unearth.wiki

Model Collapse by Contamination

/ˈmɒd.əl kəˈlæps baɪ kənˌtæm.ɪˈneɪ.ʃən/
Computer Science / Data Ecology. First theorized in the early 2020s.
Definition: A recursive failure mode in generative AI systems in which models are repeatedly trained on synthetic artifacts generated by earlier models. This closed feedback loop progressively excludes novel human input, so each successive generation loses rare patterns present in the original data and degrades in structural coherence, reasoning, and output quality.

The Ouroboros of Synthetic Data

In the first era of generative AI deployment, models were trained overwhelmingly on the accumulated record of human culture, the "Organic Baseline." As synthetic content flooded the internet during the onset of the Synthetocene, this pure baseline became increasingly inaccessible. Newer models, trained on fresh scrapes of the web, therefore ingested data that earlier models had already statistically generated.

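Model Collapse by Contamination occurs when the "tails" of a statistical distribution (the quirks, the profound insights, the human errors, and the true novelties) are smoothed out by successive synthetic generations. Because each generation learns from a finite sample of its predecessor's output, rare events are under-sampled, then under-weighted again in the next round, so estimation errors compound rather than cancel. The AI forgets what low-probability events look like and converges on a hyper-average sludge. The model, effectively, eats its own tail.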
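The tail-thinning mechanism can be illustrated with a common toy model, sketched here under loud assumptions: a one-dimensional Gaussian stands in for a generative model, the standard normal N(0, 1) stands in for the Organic Baseline, and "training" a generation means fitting a mean and standard deviation to samples drawn from its predecessor. The function `simulate_collapse` and its `human_fraction` parameter are hypothetical names introduced for illustration only; real generative models are vastly more complex, and this captures only the statistical core of the effect.

```python
import random
import statistics


def simulate_collapse(generations: int = 1000,
                      samples_per_gen: int = 100,
                      human_fraction: float = 0.0,
                      seed: int = 0) -> list[float]:
    """Hypothetical toy model of recursive training on synthetic data.

    Assumption: the "organic baseline" is a standard normal N(0, 1) and each
    "model" is just a fitted Gaussian. Each generation fits mean and std to
    its training set; the next generation trains on samples drawn from that
    fit, optionally mixed with a fraction of fresh organic samples.
    Returns the fitted standard deviation after each generation.
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # start from a model that matches the baseline exactly
    n_human = round(samples_per_gen * human_fraction)
    history = []
    for _ in range(generations):
        # Fresh organic data (hypothetical mitigation knob).
        organic = [rng.gauss(0.0, 1.0) for _ in range(n_human)]
        # Synthetic data produced by the previous generation's model.
        synthetic = [rng.gauss(mu, sigma)
                     for _ in range(samples_per_gen - n_human)]
        data = organic + synthetic
        # Maximum-likelihood fit. The MLE variance is biased low by (n-1)/n,
        # and random estimation error compounds multiplicatively, so without
        # organic data the fitted spread drifts toward zero.
        mu = statistics.fmean(data)
        sigma = statistics.pstdev(data)
        history.append(sigma)
    return history


if __name__ == "__main__":
    pure = simulate_collapse(human_fraction=0.0)
    anchored = simulate_collapse(human_fraction=0.5)
    print(f"pure synthetic: final sigma = {pure[-1]:.3g}")
    print(f"half organic:   final sigma = {anchored[-1]:.3g}")
```

With no organic data the fitted spread typically shrinks far below 1, so values that were merely rare under the baseline become effectively impossible. With half of each training set drawn fresh from the baseline, the spread stays close to 1, mirroring the mitigation strategy of re-anchoring on uncontaminated human data. Exact numbers depend on the seed.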

Field Note: To mitigate Model Collapse, organizations must seek out "uncontaminated" human data. Analog physical media, pre-2022 digital archives, and verified human interaction (Autogravitas) suddenly acquire immense value as the only remaining sources of organic ground truth capable of stabilizing degrading models.
Primary Source: Jefferson, J., & Velasco, F. (2025). The Slow Sedimentation: On the Beginning of the End of the Feed Economy. Unearth Heritage Foundry.
Stratigraphy (Related Concepts)
Synthetocene, Organic Baseline, Autogravitas, Adjacent Possible