What happens to generative AI models once they themselves "contribute much of the text found online"? The paper finds that "indiscriminate use of model-generated content in training causes irreversible defects ... tails of the original content distribution disappear. We refer to this effect as ‘model collapse’ ... a degenerative process ... models forget the true underlying data distribution".
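The mechanism is easy to see in miniature. Here's a toy sketch (my own illustration, not the paper's experimental setup): a categorical "language model" is repeatedly refit on a corpus generated by the previous generation. The vocabulary and probabilities below are hypothetical; the point is that rare tokens - the tails - eventually draw zero samples, and once a maximum-likelihood refit assigns them zero probability they never come back.

```python
# Toy illustration of model collapse (a sketch, not the paper's code):
# a categorical "language model" repeatedly retrained on its own output.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical true distribution over 10 "tokens": heavy head, thin tail.
true_p = np.array([0.5, 0.2, 0.1, 0.08, 0.05, 0.03, 0.02, 0.01, 0.007, 0.003])
vocab, corpus_size = len(true_p), 500

p = true_p.copy()
for gen in range(1, 21):
    corpus = rng.choice(vocab, size=corpus_size, p=p)  # generate from the current model
    counts = np.bincount(corpus, minlength=vocab)
    p = counts / counts.sum()                          # "retrain": max-likelihood refit
    if gen % 5 == 0:
        print(f"gen {gen:2d}: surviving tokens={np.count_nonzero(p)}, "
              f"tail mass={p[5:].sum():.4f}")
```

Any token sampled zero times in one generation gets probability zero in every later one, so the fitted distribution's support can only shrink - a discrete analogue of the disappearing tails the authors describe.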
Implications?
The full scientific paper is of course long and dense. Some key points from a high-level scan:
Interestingly, in a sense this is not new: for example, "click, content and troll farms [are] a form of human ‘language models’ ... to misguide social networks and search algorithms." What's new is the scale enabled by LLM-driven poisoning attacks on LLMs - a self-poisoning Ouroboros.
And because it's difficult to differentiate LLM-generated content from human-written content on the web, today's genAI companies enjoy a ‘first-mover advantage’: a newcomer would not have access to "pure" training content, unless OpenAI et al. were forced to share theirs.