Sudden Drops in the Loss: Syntax Acquisition, Phase Transitions, and Simplicity Bias in MLMs (2309.07311v6)
Abstract: Most interpretability research in NLP focuses on understanding the behavior and features of a fully trained model. However, certain insights into model behavior may only be accessible by observing the trajectory of the training process. We present a case study of syntax acquisition in masked language models (MLMs) that demonstrates how analyzing the evolution of interpretable artifacts throughout training deepens our understanding of emergent behavior. In particular, we study Syntactic Attention Structure (SAS), a naturally emerging property of MLMs wherein specific Transformer heads tend to focus on specific syntactic relations. We identify a brief window in pretraining when models abruptly acquire SAS, concurrent with a steep drop in loss. This breakthrough precipitates the subsequent acquisition of linguistic capabilities. We then examine the causal role of SAS by manipulating it during training, and demonstrate that SAS is necessary for the development of grammatical capabilities. We further find that SAS competes with other beneficial traits during training, and that briefly suppressing SAS improves model quality. These findings offer an interpretation of a real-world example of both simplicity bias and breakthrough training dynamics.
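The notion of SAS described above can be made concrete with a simple head-scoring sketch. The following is a minimal illustration, not the paper's exact metric: it scores each attention head by the average attention mass a token assigns to its syntactic parent under a given dependency parse. The attention tensor, the toy parse in `parents`, and the function name `sas_scores` are all hypothetical.

```python
# Hypothetical sketch of an SAS-style head score (not the paper's exact metric):
# for each head, average the attention weight each token places on its
# dependency parent. `attn[h][i][j]` is head h's attention from token i to
# token j; `parents[i]` is the index of token i's parent in a dependency parse.

def sas_scores(attn, parents):
    """Return one score per head: mean attention weight on the parent token."""
    seq_len = len(parents)
    return [
        sum(head[i][parents[i]] for i in range(seq_len)) / seq_len
        for head in attn
    ]

# Toy example with 2 heads over 3 tokens. Head 0 attends uniformly;
# head 1 concentrates its mass on each token's parent (token 1 here).
parents = [1, 1, 1]
uniform = [[1 / 3] * 3 for _ in range(3)]
syntactic = [[0.1, 0.8, 0.1] for _ in range(3)]

scores = sas_scores([uniform, syntactic], parents)
print(scores)  # head 1 scores far higher than head 0: [0.333..., 0.8]
```

A head whose score stays near the uniform baseline shows no syntactic specialization, while a head whose score jumps well above it behaves like the specialized heads the abstract describes; tracking such scores across checkpoints is one way to observe the abrupt acquisition window.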