Model Collapse
A phenomenon where models trained on AI-generated data progressively lose quality and diversity over generations, eventually producing degenerate outputs.
How It Happens
When AI-generated content makes up a significant share of the training data for the next generation of models, rare patterns and minority representations are gradually lost. Each model is trained on a finite sample of its predecessor's outputs, and the tails of the distribution are underrepresented in any finite sample, so the new model assigns them even less probability. Over repeated generations this sampling error compounds: common patterns are amplified and rare ones are suppressed until they vanish.
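This compounding loss can be sketched with a toy simulation. Each "generation" fits the empirical distribution of the previous corpus and samples a new corpus from it; the token names and corpus sizes below are invented purely for illustration, not drawn from any real training setup.

```python
import random
from collections import Counter

def train_generation(data, sample_size):
    # "Train" by estimating the empirical token distribution,
    # then "generate" the next corpus by sampling from that estimate.
    counts = Counter(data)
    tokens = list(counts)
    weights = [counts[t] for t in tokens]
    return random.choices(tokens, weights=weights, k=sample_size)

random.seed(0)
# Generation 0: a corpus with one common token and 50 distinct rare tokens.
data = ["common"] * 950 + [f"rare_{i}" for i in range(50)]
print("generation 0 distinct tokens:", len(set(data)))

for gen in range(10):
    data = train_generation(data, len(data))

print("generation 10 distinct tokens:", len(set(data)))
```

Running this, the number of distinct tokens shrinks across generations: rare tokens that miss one round of sampling are gone forever, while the common token persists.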
Implications
As the internet fills with AI-generated text, future LLMs risk training on their predecessors' outputs. This creates a feedback loop that degrades quality. Mitigation requires maintaining access to high-quality human-generated data.
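The value of retaining human data can be shown by extending the same toy simulation. Below, one lineage trains purely on its own outputs while another mixes in a fixed fraction of the original "human" corpus each generation; the 30% mixing ratio and corpus sizes are arbitrary choices for illustration.

```python
import random
from collections import Counter

def resample(data, k):
    # Fit the empirical distribution of `data` and draw k new samples from it.
    counts = Counter(data)
    tokens = list(counts)
    return random.choices(tokens, weights=[counts[t] for t in tokens], k=k)

random.seed(0)
# A fixed pool of human-generated data, never regenerated.
human = ["common"] * 950 + [f"rare_{i}" for i in range(50)]

pure, mixed = list(human), list(human)
for _ in range(10):
    pure = resample(pure, 1000)                               # train only on model output
    mixed = resample(mixed, 700) + random.sample(human, 300)  # re-inject 30% human data

print("self-trained distinct tokens:", len(set(pure)))
print("human-mixed distinct tokens: ", len(set(mixed)))
```

Because the human pool still contains every rare token, each re-injection restores tail mass that pure self-training steadily loses.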