Training AI on AI output makes the rare cases disappear

fact

In the 2024 Nature paper “AI models collapse when trained on recursively generated data,” Ilia Shumailov and colleagues showed that when generative models are trained on data produced by earlier models, “indiscriminate use of model-generated content in training causes irreversible defects in the resulting models, in which tails of the original content distribution disappear.” Rare patterns get sampled less with each generation until the model converges toward a narrow, low-variety output. The authors demonstrated the effect across large language models, variational autoencoders, and Gaussian mixture models, suggesting it is a general hazard of recursive training rather than a quirk of any one architecture.

Sources

PRIMARY https://www.nature.com/articles/s41586-024-07566-y

Last verified June 7, 2026

<- Back to the AI Library

Training AI on AI output makes the rare cases disappear

Sources

Related