Unveiling Linguistic Regions in Large Language Models (2402.14700v3)
Abstract: Large language models (LLMs) have demonstrated considerable cross-lingual alignment and generalization ability. Current research focuses primarily on improving LLMs' cross-lingual generalization capabilities, yet the intrinsic mechanisms by which LLMs achieve cross-lingual alignment remain under-explored. From the perspective of region partitioning, this paper investigates the linguistic competence of LLMs. We discover a core region in LLMs that corresponds to linguistic competence, accounting for approximately 1% of the total model parameters. Removing this core region by setting its parameters to zero results in a significant performance decrease across 30 different languages. Furthermore, this core region exhibits significant dimensional dependence: perturbing even a single parameter along specific dimensions leads to a loss of linguistic competence. Moreover, we find that distinct monolingual regions exist for different languages, and disrupting these specific regions substantially reduces the LLMs' proficiency in the corresponding languages. Our research also indicates that freezing the core linguistic region during further pre-training can mitigate catastrophic forgetting (CF), a phenomenon commonly observed during further pre-training of LLMs. Overall, exploring the LLMs' functional regions provides insights into the foundation of their intelligence.
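The two interventions described in the abstract, ablating the core region by zeroing its parameters and freezing it during further pre-training, lend themselves to a brief illustration. Below is a minimal PyTorch sketch, assuming a precomputed per-parameter boolean mask (a hypothetical `core_masks` dictionary) marking the roughly 1% of weights identified as the core region; the paper's actual procedure for locating that region is not reproduced here.

```python
# Illustrative sketch only: `core_masks` is assumed to be precomputed by some
# importance criterion; the paper's exact region-identification method is not
# shown here.
import torch
import torch.nn as nn


def ablate_core_region(model: nn.Module, core_masks: dict) -> None:
    """Zero out the parameters marked as the core region (region-removal experiment)."""
    with torch.no_grad():
        for name, param in model.named_parameters():
            if name in core_masks:
                param.masked_fill_(core_masks[name], 0.0)


def freeze_core_region(model: nn.Module, core_masks: dict) -> None:
    """Hold core-region weights fixed during further pre-training by
    zeroing their gradients after every backward pass."""
    for name, param in model.named_parameters():
        if name in core_masks:
            mask = core_masks[name]
            # The hook runs on each backward pass; masked entries get zero gradient,
            # so the optimizer never updates them.
            param.register_hook(lambda grad, m=mask: grad.masked_fill(m, 0.0))


if __name__ == "__main__":
    # Tiny stand-in model; a real LLM would be loaded from a checkpoint instead.
    model = nn.Linear(8, 8, bias=False)
    # Hypothetical mask selecting roughly 1% of weights as the "core region".
    core_masks = {"weight": torch.rand_like(model.weight) < 0.01}

    ablate_core_region(model, core_masks)   # performance-drop ablation
    freeze_core_region(model, core_masks)   # CF-mitigation setup
    model(torch.randn(4, 8)).sum().backward()
    assert model.weight.grad[core_masks["weight"]].abs().sum() == 0
```

A gradient hook is used rather than `requires_grad = False` because the latter can only freeze whole tensors, whereas the element-level masking above matches the finer granularity that region partitioning implies; this is a design assumption of the sketch, not a claim about the authors' implementation.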
Authors: Zhihao Zhang, Jun Zhao, Qi Zhang, Tao Gui, Xuanjing Huang