
Are Structural Concepts Universal in Transformer Language Models? Towards Interpretable Cross-Lingual Generalization (2310.12794v2)

Published 19 Oct 2023 in cs.CL

Abstract: Large language models (LLMs) have exhibited considerable cross-lingual generalization abilities, whereby they implicitly transfer knowledge across languages. However, the transfer is not equally successful for all languages, especially for low-resource ones, which poses an ongoing challenge. It is unclear whether we have reached the limits of implicit cross-lingual generalization and if explicit knowledge transfer is viable. In this paper, we investigate the potential for explicitly aligning conceptual correspondence between languages to enhance cross-lingual generalization. Using the syntactic aspect of language as a testbed, our analyses of 43 languages reveal a high degree of alignability among the spaces of structural concepts within each language for both encoder-only and decoder-only LLMs. We then propose a meta-learning-based method to learn to align conceptual spaces of different languages, which facilitates zero-shot and few-shot generalization in concept classification and also offers insights into the cross-lingual in-context learning phenomenon. Experiments on syntactic analysis tasks show that our approach achieves competitive results with state-of-the-art methods and narrows the performance gap between languages, particularly benefiting those with limited resources.
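The abstract's two technical components can be made concrete with short sketches. First, a minimal sketch of an alignability measurement, under assumptions not taken from the paper: each language's structural-concept space is summarized as one mean hidden-state vector per concept (e.g., per Universal Dependencies POS tag), and alignability is scored by solving the orthogonal Procrustes problem between two languages' concept matrices and checking nearest-neighbor retrieval after rotation. All names and shapes here are illustrative.

```python
import numpy as np
from scipy.linalg import orthogonal_procrustes

def alignability(src: np.ndarray, tgt: np.ndarray) -> float:
    """src, tgt: (n_concepts, hidden_dim) matrices whose rows follow the
    same concept order (e.g., both indexed by the UD POS inventory)."""
    # Center each space so the rotation is not dominated by a shared offset.
    src = src - src.mean(axis=0, keepdims=True)
    tgt = tgt - tgt.mean(axis=0, keepdims=True)
    # Best orthogonal map R minimizing ||src @ R - tgt||_F.
    R, _ = orthogonal_procrustes(src, tgt)
    mapped = src @ R
    # Score: fraction of concepts whose cosine nearest neighbor in the
    # target space is the matching concept.
    mapped = mapped / np.linalg.norm(mapped, axis=1, keepdims=True)
    tgt_n = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    sims = mapped @ tgt_n.T
    return float((sims.argmax(axis=1) == np.arange(len(src))).mean())
```

Second, the "learning to align conceptual spaces" idea can be sketched with a first-order, Reptile-style meta-learning loop (the paper's exact algorithm may differ): learn an initialization of a linear concept classifier over LM hidden states that adapts to a new language's concept-classification task in a few gradient steps. The function and parameter names below are hypothetical.

```python
import torch

def meta_learn_alignment(tasks, hidden_dim, n_concepts,
                         inner_steps=5, inner_lr=1e-2,
                         meta_lr=0.1, epochs=100):
    """tasks: list of (X, y) pairs per language; X: float tensor of shape
    (n_tokens, hidden_dim) of LM states, y: long tensor of concept ids."""
    init = torch.nn.Linear(hidden_dim, n_concepts)  # shared initialization
    for _ in range(epochs):
        for X, y in tasks:
            # Clone the initialization and adapt it to this language.
            fast = torch.nn.Linear(hidden_dim, n_concepts)
            fast.load_state_dict(init.state_dict())
            opt = torch.optim.SGD(fast.parameters(), lr=inner_lr)
            for _ in range(inner_steps):
                loss = torch.nn.functional.cross_entropy(fast(X), y)
                opt.zero_grad()
                loss.backward()
                opt.step()
            # Reptile meta-update: move the init toward the adapted weights.
            with torch.no_grad():
                for p, q in zip(init.parameters(), fast.parameters()):
                    p += meta_lr * (q - p)
    return init
```

At test time, the returned initialization would be fine-tuned on a handful of labeled examples from an unseen language (few-shot) or applied directly (zero-shot), which matches the evaluation setting the abstract describes.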

Authors (5)
  1. Ningyu Xu (4 papers)
  2. Qi Zhang (784 papers)
  3. Jingting Ye (3 papers)
  4. Menghan Zhang (7 papers)
  5. Xuanjing Huang (287 papers)
Citations (4)