
LLM Web Dynamics: Tracing Model Collapse in a Network of LLMs (2506.15690v2)

Published 26 May 2025 in cs.LG, cs.AI, cs.SI, and stat.ME

Abstract: The increasing use of synthetic data from the public Internet has enhanced data usage efficiency in LLM training. However, the potential threat of model collapse remains insufficiently explored. Existing studies primarily examine model collapse in a single model setting or rely solely on statistical surrogates. In this work, we introduce LLM Web Dynamics (LWD), an efficient framework for investigating model collapse at the network level. By simulating the Internet with a retrieval-augmented generation (RAG) database, we analyze the convergence pattern of model outputs. Furthermore, we provide theoretical guarantees for this convergence by drawing an analogy to interacting Gaussian Mixture Models.

Explain it Like I'm 14

Overview

This paper studies a problem called “model collapse” in LLMs. Model collapse happens when AI systems keep learning from text that they (or other AIs) wrote before, instead of from fresh human-written text. Over time, their answers become more and more similar, less diverse, and can lose important details. The authors build a simple, low-cost way to watch this happen in a network of different LLMs that “talk” to each other, and they also provide a mathematical model to explain why this happens.

Objectives

Here’s what the paper tries to do:

  1. Create a realistic mini-Internet where several LLMs chat, post answers, and then read those posts later.
  2. Measure how similar the models’ answers become over time, using a simple number that represents “how far apart” their answers are.
  3. Use a math-based toy model (called a Gaussian Mixture Model, or GMM) to explain and predict the collapse, giving more solid theoretical support.

Methods and Approach

The authors use two main setups: a network of LLMs and a math-based simulator.

The LLM network (like a group chat that shapes the Internet)

  • Imagine 3 different AI models (from different companies and countries) discussing the same question over and over.
  • There is a shared “Internet” — a text database — that starts with human-written posts on a topic (in the paper: Bitcoin’s future).
  • Each time the models answer the question, they first “look up” information from this Internet using RAG (Retrieval-Augmented Generation).
    • RAG is like checking a library before answering. Instead of changing the model’s inner settings, we simply give it relevant notes and sources to read.
  • After answering, each model “posts” one of its answers back to the Internet.
  • Over time, the Internet fills up with AI-written posts, so the models end up reading mostly their own or other models’ past answers. This creates a feedback loop, sketched in code after this list.
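
Below is a minimal Python sketch of this kind of shared-database feedback loop, written for intuition rather than taken from the paper: `generate_answer` is a hypothetical stand-in for calling a real LLM on retrieved context, and the 5 retrieved posts and 50 rounds are illustrative numbers (only the 20 seed posts match the paper's setup).

```python
# Minimal sketch of the LWD-style feedback loop (illustration only, not the authors' code).
import random

# Seed "Internet": 20 human-written posts, as in the paper's Bitcoin experiment.
internet = [f"human-written post {i} about Bitcoin's future" for i in range(20)]
models = ["model_A", "model_B", "model_C"]  # stand-ins for three different LLMs

def generate_answer(model, retrieved_posts):
    # Hypothetical stand-in for calling a real LLM with the retrieved context (RAG).
    return f"{model}'s answer, conditioned on {len(retrieved_posts)} retrieved posts"

for round_idx in range(50):                                       # repeated discussion rounds
    new_posts = []
    for model in models:
        retrieved = random.sample(internet, k=min(5, len(internet)))  # "look up" the Internet
        new_posts.append(generate_answer(model, retrieved))           # answer the question
    internet.extend(new_posts)                                     # each model "posts" back
```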

How similarity is measured:

  • Each answer is turned into a list of numbers (an “embedding”), which captures its meaning. Think of it like mapping each sentence to a point on a big coordinate grid.
  • The authors compute how far apart these points are for different models. If the points get closer, it means the models’ answers are getting more similar.
  • They summarize all pairwise distances into a single number (the “Frobenius norm” of a distance matrix). You can think of it as the overall “spread” of differences. Smaller means more similar, larger means more diverse; a small worked example follows this list.
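
As a concrete, made-up example of the measurement, the snippet below embeds three answers as short vectors, fills a pairwise Euclidean-distance matrix, and takes its Frobenius norm. The vectors and model names are fabricated placeholders, and the paper's actual embedding model and distance may differ.

```python
# Illustration of the "spread" metric: Frobenius norm of a pairwise-distance matrix.
import numpy as np

# Pretend embeddings of one answer per model (made-up 3-dimensional vectors).
embeddings = {
    "model_A": np.array([0.9, 0.1, 0.0]),
    "model_B": np.array([0.2, 0.8, 0.1]),
    "model_C": np.array([0.1, 0.2, 0.9]),
}

names = list(embeddings)
dist = np.zeros((len(names), len(names)))
for i, a in enumerate(names):
    for j, b in enumerate(names):
        # Euclidean distance between the two answers' embeddings.
        dist[i, j] = np.linalg.norm(embeddings[a] - embeddings[b])

# One summary number: smaller means the models' answers are more similar overall.
spread = np.linalg.norm(dist, ord="fro")
print(round(float(spread), 3))
```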

The math simulator (GMM, like mixing flavors)

  • A Gaussian Mixture Model (GMM) is a simple way to represent a distribution as a mix of a few “basic shapes” (Gaussians).
  • Imagine making a smoothie with several flavors; the “mixture weights” say how much of each flavor you used.
  • The authors set up several GMMs (standing in for different LLMs) that:
    • Sample points (answers) from their current mix,
    • Add those points to a shared pool (like the Internet),
    • Then update their mix based on what they sample from the shared pool next time.
  • Because they all keep sampling from the same growing pool, which fills up with everyone’s past outputs, the GMMs’ mixture weights gradually become the same.
  • In math terms, they show that the overall “distance” between models (based on these weights) tends toward zero. A toy simulation of this dynamic is sketched after this list.
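
The toy simulation below captures this shared-pool dynamic in a simplified form; it is our own illustration, not the authors' exact update rule. The fixed one-dimensional component means, the nearest-component weight re-estimation, and the draw sizes are all assumptions made for brevity.

```python
# Toy shared-pool GMM dynamic: mixture weights of all "models" drift together.
import numpy as np

rng = np.random.default_rng(0)
component_means = np.array([-2.0, 0.0, 2.0])           # shared "flavors" (1-D Gaussians)
weights = [np.array([0.8, 0.1, 0.1]),                   # three models with distinct mixes
           np.array([0.1, 0.8, 0.1]),
           np.array([0.1, 0.1, 0.8])]
pool = list(rng.normal(0.0, 1.0, size=20))              # seed pool (the shared "Internet")

def nearest_component(x):
    # Crude responsibility: assign a point to its closest component mean.
    return int(np.argmin(np.abs(component_means - x)))

for t in range(200):
    # Each model samples one point from its current mixture and "posts" it.
    for w in weights:
        comp = rng.choice(3, p=w)
        pool.append(rng.normal(component_means[comp], 1.0))
    # Each model then re-estimates its weights from a draw of the shared pool.
    for i in range(len(weights)):
        draw = rng.choice(pool, size=50)
        counts = np.bincount([nearest_component(x) for x in draw], minlength=3)
        weights[i] = (counts + 1e-6) / (counts.sum() + 3e-6)

# Pairwise distances between the models' weight vectors end up close to zero.
print([round(float(np.linalg.norm(weights[i] - weights[j])), 3)
       for i in range(3) for j in range(i + 1, 3)])
```

Running this sketch shows the three weight vectors converging toward one another, mirroring the qualitative result that the between-model distance tends toward zero.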

Why use GMMs?

  • They’re much cheaper to simulate than real LLMs.
  • They help prove, in a clean way, that the collapse really should happen under these conditions.

Main Findings and Why They Matter

What they observed with real LLMs:

  • They ran the LLM setup for many rounds using three different models (Llama, DeepSeek, and Mistral).
  • The shared Internet started with 20 human-written posts about Bitcoin.
  • Over time, the models’ answers became clearly more similar. At the beginning, each model had a distinct style and opinion. By the end, they were nearly identical in what they said.
  • The single “similarity number” (the norm) dropped steeply as time went on, showing reduced diversity across models.

What they observed with the GMMs:

  • The same pattern appeared: the “distance” between models shrank toward zero in repeated simulations.
  • With more components (more “flavors”), the curve looked more like the LLM results: flatter at first, then a steeper drop, hinting at phases of faster convergence.

Why this is important:

  • It shows that when AIs keep learning from AI data (instead of human data), they start sounding the same.
  • This reduces creativity and can hide unusual or rare facts (low-density information).
  • It also provides a simple, measurable way to track collapse without retraining massive models, which is very expensive.

Implications and Potential Impact

What this means for the future:

  • The framework (LLM Web Dynamics, or LWD) acts like a safe, low-cost testbed to study how AIs might evolve on an Internet filled with AI-written text.
  • It can help researchers and developers tune how much real data vs. synthetic data to use so models stay diverse and accurate.
  • The GMM proxy offers theoretical guarantees and fast simulations, so teams can explore policies (like how much to retrieve, how often to mix in real text) before spending lots of money on big experiments.
  • The paper suggests caution: collapse isn’t always “bad” in itself, since it just means the system stabilizes. But if your goal is rich, varied answers or preserving rare information, collapse can hurt performance.
  • The authors also point out limitations: the approach measures patterns well but needs stronger statistical testing and controls (like exact ratios of real-to-synthetic data) to guide practical training rules.

In short, this work provides an easy-to-run, quantitative way to see how a network of LLMs can drift into sameness when they learn mostly from their own past outputs, and it offers a math-backed tool to understand and help prevent that drift.
