The Brave browser project shows that it was ahead of the curve back in late 2023, pointing out:
So "AI projects should look for more accessible, better curated, high-quality data sets that fit their needs", hence Brave's approach:
A key factor (for me at least) is that "Brave’s index is much more representative of the Web people actually care about" because of their related "Web Discovery Project mechanism (which allows real users to contribute anonymous data about the pages they’re actually visiting)... [so our] index represent the 99% of the Web that people actually want to visit... a filtered search index ... curated by its millions of actual users".
The article then looks at Brave's (early) "AI Summarizer, which returns AI-powered, contextual answers at the top of the search results page...[and] cites its sources".
The article then explores how the Brave Search index can be used both for "assembling a data set to train AI models ... [and] To help at the time of inference... via RAG (basically when a model retrieves new information that it wasn’t originally trained on)".
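To make the RAG idea concrete, here is a minimal sketch of the retrieve-then-prompt step in Python. Everything in it is an assumption for illustration: `TOY_INDEX`, `retrieve` and `build_prompt` are hypothetical names, the word-overlap scoring stands in for a real search index, and none of it reflects Brave's actual API or ranking.

```python
import re
from collections import Counter

# Hypothetical stand-in for a search index: three hand-written snippets.
TOY_INDEX = [
    {"url": "https://example.org/a",
     "text": "Retrieval augmented generation lets a model cite fresh web pages."},
    {"url": "https://example.org/b",
     "text": "Sourdough bread needs a long, slow fermentation."},
    {"url": "https://example.org/c",
     "text": "A search index can supply fresh pages to a model at inference time."},
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query, index, k=2):
    """Return the k documents sharing the most words with the query (toy scoring)."""
    query_words = Counter(tokenize(query))
    scored = []
    for doc in index:
        doc_words = Counter(tokenize(doc["text"]))
        score = sum(min(count, doc_words[word]) for word, count in query_words.items())
        if score > 0:
            scored.append((score, doc))
    # Highest overlap first; ties keep their original index order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


def build_prompt(question, docs):
    """Put the retrieved snippets in front of the question, numbered for citation."""
    sources = "\n".join(
        f"[{i}] {doc['url']}: {doc['text']}" for i, doc in enumerate(docs, 1)
    )
    return (
        "Answer using only the sources below and cite them by number.\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {question}"
    )


question = "How can a search index help a model at inference time?"
docs = retrieve(question, TOY_INDEX)
prompt = build_prompt(question, docs)
print(prompt)
```

The resulting `prompt` is what would be handed to a language model: the model never saw these pages in training, but it can answer from them and point back to the numbered sources.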