Model Collapse

Researchers recently showed that adding just one outside data point, or prior knowledge, to training reliably prevented collapse in the models they tested.

  • Model collapse is what happens when AI models are trained on data that includes content generated by earlier versions of themselves, known as synthetic data or model-generated data.
  • Over time, this recursive process causes the models to drift further away from the original data distribution, losing the ability to accurately represent the world as it really is.
  • In practice, large language models (LLMs) and other complex AI systems are increasingly ingesting generated data that is statistically simpler than the human-generated data on which they were originally built, which can lead to irreversible defects in future models.
  • Instead of improving, the AI starts to make mistakes that compound over generations, leading to outputs that are increasingly distorted and unreliable.
  • This happens because the errors one model picks up during fitting show up in its output, and that output then becomes part of its successor’s training data.
  • AI Model Collapse Can Cause:
    • Limited creativity: Collapsed models can’t truly innovate or push boundaries in their respective fields.
    • Stagnation of AI development: If models consistently default to “safe” responses, it can hinder meaningful progress in AI capabilities.
    • Missed opportunities: Model collapse could make AIs less capable of tackling real-world problems that require nuanced understanding and flexible solutions.
    • Perpetuation of biases: Collapsed models tend to over-represent the most common patterns in their training data and lose the rarer ones, which risks reinforcing existing stereotypes and unfairness.
  • Some solutions include tracking data provenance, preserving access to original data sources, and combining accumulated AI-generated data with real data to train AI models.
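
The recursive process described above, and the idea of combining accumulated generated data with real data, can be illustrated with a small toy simulation. This is only a sketch under strong assumptions, not how LLMs are trained: the "model" is a single Gaussian fitted to numbers, the "real data" is drawn from a standard normal distribution, and the names fit_gaussian and simulate, the sample size, and the generation count are all hypothetical choices made for illustration.

```python
import math
import random


def fit_gaussian(samples):
    """Estimate the mean and (maximum-likelihood) standard deviation."""
    n = len(samples)
    mean = sum(samples) / n
    variance = sum((x - mean) ** 2 for x in samples) / n
    return mean, math.sqrt(variance)


def simulate(generations, n_samples, accumulate, seed=0):
    """Return the fitted standard deviation after each generation.

    accumulate=False: each model is trained only on its predecessor's output.
    accumulate=True: generated data is added to a pool that keeps the real data.
    """
    rng = random.Random(seed)
    # "Real" data: a toy stand-in drawn from a standard normal distribution.
    pool = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    mean, std = fit_gaussian(pool)
    history = [std]
    for _ in range(generations):
        # The current model generates the data its successor will see.
        synthetic = [rng.gauss(mean, std) for _ in range(n_samples)]
        if accumulate:
            pool.extend(synthetic)   # keep real data and all earlier output
        else:
            pool = synthetic         # discard everything but the latest output
        mean, std = fit_gaussian(pool)
        history.append(std)
    return history


if __name__ == "__main__":
    replaced = simulate(generations=300, n_samples=20, accumulate=False)
    accumulated = simulate(generations=300, n_samples=20, accumulate=True)
    print("std after 300 generations, generated data only:", replaced[-1])
    print("std after 300 generations, real + accumulated: ", accumulated[-1])
```

In this toy setting, the chain trained only on its predecessor's output tends to see its spread shrink toward zero, a simple analogue of drifting away from the original distribution, while the pool that retains the real data tends to stay close to the original spread. The exact numbers depend on the seed and sample size.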