Model Collapse

Researchers recently showed that adding even a single external data point, or a piece of prior knowledge, to training reliably prevented collapse in the models they tested.

Model collapse occurs when AI models are trained on data that includes content generated by earlier versions of themselves, known as synthetic or model-generated data.
Over time, this recursive process causes the models to drift away from the original data distribution, so they lose the ability to represent the world as it really is.
In practice, large language models (LLMs) and other complex AI systems are increasingly ingesting generated data that is statistically simpler than the human-generated data on which they were originally built, which can lead to irreversible defects in future models.
Instead of improving, the models make mistakes that compound over generations, producing increasingly distorted and unreliable outputs.
This happens because errors introduced when one model is fitted end up in its output, which then becomes part of its successor’s training data.
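
The recursive loop described above can be illustrated with a toy simulation. This is only a sketch under strong assumptions, not a model of how LLMs are trained: the "model" is a one-dimensional Gaussian fitted by maximum likelihood, each generation is trained solely on samples drawn from its predecessor, and the function name `simulate_collapse` and its parameters are hypothetical choices made for illustration.

```python
import random
import statistics


def simulate_collapse(generations=200, sample_size=10, seed=0):
    """Fit a Gaussian, sample from it, refit on those samples, and repeat.

    Returns the fitted standard deviation of every generation.
    """
    rng = random.Random(seed)
    # Generation 0 is trained on "real" data: draws from a standard normal.
    data = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
    stds = []
    for _ in range(generations):
        # "Training": estimate the mean and (population) standard deviation.
        mu = statistics.fmean(data)
        sigma = statistics.pstdev(data)
        stds.append(sigma)
        # The next generation only ever sees this model's own output.
        data = [rng.gauss(mu, sigma) for _ in range(sample_size)]
    return stds


stds = simulate_collapse()
for generation in (0, 50, 100, 199):
    print(f"generation {generation:3d}: fitted std = {stds[generation]:.3g}")
```

With a small sample per generation, the fitted spread tends to shrink over many generations, because each fit slightly misestimates the distribution and the next generation inherits that error. This mirrors, in miniature, the loss of the tails of the original distribution.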
AI Model Collapse Can Cause:
- Limited creativity: Collapsed models can’t truly innovate or push boundaries in their respective fields.
- Stagnation of AI development: If models consistently default to “safe” responses, it can hinder meaningful progress in AI capabilities.
- Missed opportunities: Model collapse could make AIs less capable of tackling real-world problems that require nuanced understanding and flexible solutions.
- Perpetuation of biases: Since model collapse often results from biases in training data, it risks reinforcing existing stereotypes and unfairness.

Some solutions include tracking data provenance, preserving access to original data sources, and combining accumulated AI-generated data with real data to train AI models.
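
The idea of combining accumulated generated data with real data can be sketched in a toy setting. The assumptions here are illustrative only: the "model" is a one-dimensional Gaussian, the hypothetical function `final_spread` and its `accumulate` flag are not taken from any library, and "accumulating" simply means the original real samples and all earlier synthetic samples stay in the training pool.

```python
import random
import statistics


def final_spread(accumulate, generations=200, sample_size=10, seed=0):
    """Return the spread of the training pool after many generations.

    accumulate=False: each generation trains only on its predecessor's output.
    accumulate=True:  real data and all earlier synthetic data are kept.
    """
    rng = random.Random(seed)
    # The original "real" data: draws from a standard normal.
    pool = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
    for _ in range(generations):
        # "Training": fit a Gaussian to whatever is in the pool.
        mu = statistics.fmean(pool)
        sigma = statistics.pstdev(pool)
        synthetic = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        if accumulate:
            pool.extend(synthetic)  # real data is never discarded
        else:
            pool = synthetic  # real data is lost after the first generation
    return statistics.pstdev(pool)


print("replace   :", final_spread(accumulate=False))
print("accumulate:", final_spread(accumulate=True))
```

In this toy setup, keeping the real samples in the pool anchors every later fit to the original data, so the spread does not shrink toward zero the way it does when each generation replaces the previous data entirely. Whether the same holds for a given large model depends on details this sketch does not capture.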
