Mechanistic Basis of Junk Data–Induced Cognitive Decline in LLMs
Determine how continual pre-training with the next-token-prediction objective on junk data changes the learning mechanism of large language models in a way that produces persistent declines in cognitive performance. The junk data in question is of two kinds: popular Twitter/X posts selected for high engagement and short length (metric M1), and other junk web text characterized by low semantic quality and a sensationalist style (metric M2).
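To make the setup concrete, the sketch below (not the authors' code) illustrates the intervention the question refers to: select "junk" tweets with an M1-style engagement-and-length heuristic, then continually pre-train a causal LM on them with the standard next-token-prediction loss via the HuggingFace `transformers` API. The field names (`likes`, `retweets`, `text`), the thresholds, and the base model (`gpt2`) are illustrative assumptions, not values from the paper; an M2 split would analogously filter by a semantic-quality score (e.g., a classifier or LLM judge) and is omitted here.

```python
import torch
from torch.utils.data import DataLoader
from transformers import AutoModelForCausalLM, AutoTokenizer


def m1_is_junk(post, min_engagement=500, max_words=30):
    """M1-style heuristic: high-engagement, short posts count as junk.

    `likes`, `retweets`, and `text` are assumed field names; the
    thresholds are illustrative, not taken from the paper.
    """
    engagement = post["likes"] + post["retweets"]
    return engagement >= min_engagement and len(post["text"].split()) <= max_words


def continual_pretrain(model_name, texts, epochs=1, lr=1e-5, batch_size=8):
    """Continue next-token-prediction training on the selected texts."""
    tok = AutoTokenizer.from_pretrained(model_name)
    if tok.pad_token is None:               # e.g. GPT-2 ships without one
        tok.pad_token = tok.eos_token
    model = AutoModelForCausalLM.from_pretrained(model_name)
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    loader = DataLoader(texts, batch_size=batch_size, shuffle=True)

    model.train()
    for _ in range(epochs):
        for batch in loader:
            enc = tok(list(batch), return_tensors="pt",
                      padding=True, truncation=True, max_length=128)
            labels = enc["input_ids"].clone()
            labels[enc["attention_mask"] == 0] = -100  # ignore pad positions
            # Passing labels makes the model compute the shifted
            # next-token cross-entropy loss internally.
            loss = model(**enc, labels=labels).loss
            loss.backward()
            opt.step()
            opt.zero_grad()
    return model


# Usage sketch: split a corpus by M1 and continually pre-train on the junk half.
posts = [{"text": "you won't BELIEVE this", "likes": 9000, "retweets": 1200},
         {"text": "a long, careful thread about measurement error ...",
          "likes": 12, "retweets": 1}]
junk = [p["text"] for p in posts if m1_is_junk(p)]
model = continual_pretrain("gpt2", junk)
```

The open question is why such training induces persistent declines: for example, whether the junk distribution reshapes representations or attention patterns in ways that ordinary clean-data fine-tuning cannot fully reverse.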
References
Limited by the scope of the paper, we leave it as an open question how popular tweets or other junk data change the learning mechanism, resulting in cognitive declines. Answering the question is essential for building stronger defense methods in the future.
— LLMs Can Get "Brain Rot"!
(arXiv:2510.13928, Xing et al., 15 Oct 2025), Conclusion, final paragraph