AI trained on AI churns out gibberish garbage

Eventually, it collapses—'poisoned with its own projection of reality.'
[Image: Historical artistic interpretation of an ouroboros, a dragon eating its own tail. Credit: API/Gamma-Rapho via Getty Images]


Large language models like those offered by OpenAI and Google famously require vast troves of training data to work. The latest versions of these models have already scoured much of the existing internet, which has led some to fear there may not be enough new data left to train future iterations. Some prominent voices in the industry, like Meta CEO Mark Zuckerberg, have posited a solution to that data dilemma: simply train new AI systems on old AI outputs.

But new research suggests that cannibalizing past model outputs would quickly result in strings of babbling AI gibberish and could eventually lead to what's being called "model collapse." In one example, researchers fed an AI a benign paragraph about church architecture only to have it rapidly degrade over generations. The final, most "advanced" model simply repeated the phrase "black@tailed jackrabbits" continuously.

A study published in Nature this week put that AI-trained-on-AI scenario to the test. The researchers made their own language model, which they initially fed original, human-generated text. They then made nine more generations of models, each trained on the text output generated by the model before it. The end result in the final generation was nonsensical, surrealist-sounding gibberish that had essentially nothing to do with the original text. Over successive generations, the researchers say, their model "becomes poisoned with its own projection of reality."
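To get a feel for that generational loop, here is a minimal sketch, not the researchers' actual setup (which used a full neural language model), that stands in a simple word-bigram Markov chain for the LLM. Each generation is fit only on text sampled from the generation before it; the corpus filename is a hypothetical placeholder.

```python
# Toy simulation of recursive "AI trained on AI" generations.
# NOT the Nature study's method: a word-bigram chain stands in for an LLM.
import random
from collections import defaultdict

def fit_bigram_model(text):
    """Count word-to-next-word transitions in the training text."""
    words = text.split()
    model = defaultdict(list)
    for current, following in zip(words, words[1:]):
        model[current].append(following)
    return model

def sample_text(model, length=500, seed=0):
    """Generate text by walking the bigram chain."""
    rng = random.Random(seed)
    word = rng.choice(list(model.keys()))
    output = [word]
    for _ in range(length - 1):
        choices = model.get(word)
        if not choices:  # dead end: restart from a random word
            word = rng.choice(list(model.keys()))
        else:
            word = rng.choice(choices)
        output.append(word)
    return " ".join(output)

# Generation 0 trains on human-written text; every later generation
# trains only on the previous generation's synthetic output.
text = open("human_corpus.txt").read()  # hypothetical human-written corpus
for generation in range(10):
    model = fit_bigram_model(text)
    text = sample_text(model, length=500, seed=generation)
    print(f"gen {generation}: {len(set(text.split()))} distinct words")
```

Even in this crude stand-in, the vocabulary the model can produce tends to shrink generation after generation, because each new model only ever sees whatever its predecessor happened to output.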

AI models forget meaning the more they train on themselves

The researchers refer to this odd case of AI seemingly imploding on itself as "model collapse," a degenerative process that can present itself in early and late-stage forms. On the early side of things, collapse begins to occur when AI models several generations removed from the original training data seemingly forget outliers, or rarities in the original text. This has the effect of making the most likely outputs more and more common. That would be an issue in the real world, because it could result in a whittling down of minority views and forms of expression. An LLM showing signs of early collapse could present a version of reality that lacks diversity and suffers from an overwhelming sameness.
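The tail-forgetting effect can be illustrated with an even simpler toy sketch, again an assumption-laden stand-in rather than the paper's experiment: a distribution over made-up "phrases" is re-estimated each generation from a finite sample of the previous generation's estimate. Rare phrases that happen not to be sampled drop to zero probability and can never come back.

```python
# Toy illustration of early-stage collapse: rare items vanish when a
# distribution is repeatedly re-estimated from its own samples.
import random
from collections import Counter

random.seed(0)

# Generation 0: 1000 "phrases" with a Zipf-like tail, so most probability
# mass sits on a few common phrases and the rest are rare.
phrases = [f"phrase_{i}" for i in range(1000)]
weights = [1.0 / (rank + 1) for rank in range(1000)]

for generation in range(10):
    surviving = sum(1 for w in weights if w > 0)
    print(f"gen {generation}: {surviving} phrases still have nonzero probability")
    # "Train" the next generation: sample a finite corpus from the current
    # model, then re-estimate phrase frequencies from those counts alone.
    corpus = random.choices(phrases, weights=weights, k=2000)
    counts = Counter(corpus)
    weights = [counts[p] for p in phrases]
```

The count of surviving phrases can only go down, which is the statistical shape of the "overwhelming sameness" the researchers describe.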

Things get weirder in the later stages of collapse. In those last generations, the models trained on models are so far removed from the original training data that they begin to forget key aspects of the initial training and lose the plot entirely. It's at this stage that models begin generating completely meaningless gibberish. When this happens, the researchers say, the model's "indiscriminate" self-cannibalizing of its own previous outputs "causes irreversible defects in the resulting model."

The researchers claim this cascading effect and eventual model collapse are inevitable for large models trained on their own data. It's important to note this research focused specifically on language models and does not weigh in on what could happen if multimodal models, like image and video generators, were trained on themselves. It also zeroes in on what happens when a model trains on its own data; it's unclear exactly what would happen if one model, say from Meta, were to train on output generated by OpenAI's models.

Preserving original human text could stave off collapse 

The prospect of real-world model collapse isn't an unthinkable hypothetical. Right now, countless websites are up and running that feature articles and blog posts entirely generated by LLMs. In the race to build new models as fast as possible, it's quite plausible that much of that AI-generated slop could wind up seeping into training sets.

One possible solution to inadvertently including AI-generated content in training sets would be to encourage a watermarking standard across platforms that clearly marks the authenticity of content and whether or not it was produced by a machine. Google, Adobe, and other big tech players are trying to do just that with a special "content credential" badge they are working to standardize as part of the Coalition for Content Provenance and Authenticity (C2PA).

But that would only apply to images. AI-generated text is much more difficult to feasibly watermark, or even to accurately identify using available detection software. A more realistic approach may require AI developers to scrupulously vet material for signs of AI manipulation, and potentially pay reputable human sources for access to train on their high-quality data. Without those safeguards for human training data, the internet risks being flooded by a wave of AI vomit. Nobody wants that.