Collapse of Self-trained Language Models (2404.02305v1)
Published 2 Apr 2024 in cs.CL and cs.AI
Abstract: In various fields of knowledge creation, including science, new ideas often build on pre-existing information. In this work, we explore this concept within the context of LLMs. Specifically, we examine the potential of self-training models on their own outputs, akin to how humans learn and build on their previous thoughts and actions. While this approach is intuitively appealing, our research reveals its practical limitations. We find that extended self-training of the GPT-2 model leads to significant degradation in performance, resulting in repetitive and collapsed token output.
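The self-training procedure the abstract describes amounts to a simple loop: sample text from the current model, then fine-tune the model on those samples, and repeat. Below is a minimal sketch of such a loop for GPT-2 using the Hugging Face `transformers` API; the prompt, sample counts, learning rate, and number of rounds are illustrative assumptions, not the paper's exact experimental setup.

```python
# Minimal sketch of iterative self-training: GPT-2 is repeatedly fine-tuned on
# text it generated itself. All hyperparameters here are assumed for illustration.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

prompt = tokenizer("The history of science", return_tensors="pt").to(device)

for generation in range(5):  # assumed number of self-training rounds
    # 1) Sample text from the current model.
    model.eval()
    with torch.no_grad():
        samples = model.generate(
            **prompt,
            do_sample=True,
            max_new_tokens=128,
            num_return_sequences=8,
            pad_token_id=tokenizer.eos_token_id,
        )
    texts = tokenizer.batch_decode(samples, skip_special_tokens=True)

    # 2) Fine-tune the model on its own samples with the standard LM objective.
    model.train()
    for text in texts:
        batch = tokenizer(text, return_tensors="pt", truncation=True).to(device)
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
    print(f"round {generation}: loss on own samples = {loss.item():.3f}")
```

Under repeated rounds of this kind, the paper reports that generation quality degrades and the model's outputs become increasingly repetitive, i.e., the model collapses onto a narrow set of tokens.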