Mechanistic Basis of Junk Data–Induced Cognitive Decline in LLMs
Determine how continual pre-training with the next-token-prediction objective on junk data changes the learning mechanism of large language models in a way that produces persistent declines in cognitive performance. The junk data in question is of two kinds: popular Twitter/X posts selected for high engagement and short length (metric M1), and other junk web text characterized by low semantic quality and a sensationalist style (metric M2).
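To make the setup concrete, the sketch below (not the authors' code) illustrates the intervention the question refers to: select "junk" tweets with an M1-style engagement-and-length heuristic, then continually pre-train a causal LM on them with the standard next-token-prediction loss via the HuggingFace `transformers` API. The field names (`likes`, `retweets`, `text`), the thresholds, and the base model (`gpt2`) are illustrative assumptions, not values from the paper; an M2 split would analogously filter by a semantic-quality score (e.g., a classifier or LLM judge) and is omitted here.

```python
import torch
from torch.utils.data import DataLoader
from transformers import AutoModelForCausalLM, AutoTokenizer


def m1_is_junk(post, min_engagement=500, max_words=30):
    """M1-style heuristic: high-engagement, short posts count as junk.

    `likes`, `retweets`, and `text` are assumed field names; the
    thresholds are illustrative, not taken from the paper.
    """
    engagement = post["likes"] + post["retweets"]
    return engagement >= min_engagement and len(post["text"].split()) <= max_words


def continual_pretrain(model_name, texts, epochs=1, lr=1e-5, batch_size=8):
    """Continue next-token-prediction training on the selected texts."""
    tok = AutoTokenizer.from_pretrained(model_name)
    if tok.pad_token is None:               # e.g. GPT-2 ships without one
        tok.pad_token = tok.eos_token
    model = AutoModelForCausalLM.from_pretrained(model_name)
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    loader = DataLoader(texts, batch_size=batch_size, shuffle=True)

    model.train()
    for _ in range(epochs):
        for batch in loader:
            enc = tok(list(batch), return_tensors="pt",
                      padding=True, truncation=True, max_length=128)
            labels = enc["input_ids"].clone()
            labels[enc["attention_mask"] == 0] = -100  # ignore pad positions
            # Passing labels makes the model compute the shifted
            # next-token cross-entropy loss internally.
            loss = model(**enc, labels=labels).loss
            loss.backward()
            opt.step()
            opt.zero_grad()
    return model


# Usage sketch: split a corpus by M1 and continually pre-train on the junk half.
posts = [{"text": "you won't BELIEVE this", "likes": 9000, "retweets": 1200},
         {"text": "a long, careful thread about measurement error ...",
          "likes": 12, "retweets": 1}]
junk = [p["text"] for p in posts if m1_is_junk(p)]
model = continual_pretrain("gpt2", junk)
```

The open question is why such training induces persistent declines: for example, whether the junk distribution reshapes representations or attention patterns in ways that ordinary clean-data fine-tuning cannot fully reverse.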
References
Limited by the scope of the paper, we leave it as an open question how popular tweets or other junk data change the learning mechanism, resulting in cognitive declines. Answering the question is essential for building stronger defense methods in the future.
— LLMs Can Get "Brain Rot"!
(arXiv:2510.13928, Xing et al., 15 Oct 2025), Conclusion, final paragraph