
Word Order and World Knowledge (2403.00876v1)

Published 1 Mar 2024 in cs.CL and cs.AI

Abstract: Word order is an important concept in natural language, and in this work we study how word order affects the induction of world knowledge from raw text using LLMs. We probe for such knowledge with word analogies. Specifically, in addition to the natural word order, we extract texts in six fixed word orders from each of five languages and then pretrain LLMs on these texts. Finally, we analyze the word-analogy results across the fixed word orders and show that i) certain fixed word orders consistently outperform or underperform others, though the specifics vary across languages, and ii) the Wov2Lex hypothesis does not hold in pre-trained LLMs, and the natural word order typically yields mediocre results. The source code will be made publicly available at https://github.com/lshowway/probing_by_analogy.
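The abstract describes probing world knowledge with word analogies of the form "a is to b as c is to ?". A minimal sketch of how such a probe is commonly scored over embeddings, using the standard vector-offset (3CosAdd) method, is shown below. The function names and the embedding-loading step are illustrative assumptions, not the paper's released implementation.

```python
# Sketch of a word-analogy probe (3CosAdd): given a : b :: c : ?,
# return the vocabulary word closest to b - a + c in cosine space.
# Names and the loading step are illustrative, not from the paper's code.
import numpy as np

def normalize(vectors: np.ndarray) -> np.ndarray:
    """L2-normalize each row so dot products equal cosine similarities."""
    return vectors / np.linalg.norm(vectors, axis=1, keepdims=True)

def solve_analogy(a: str, b: str, c: str, vocab: list[str], vectors: np.ndarray) -> str:
    """Return the word d maximizing cos(d, b - a + c), excluding a, b, c."""
    idx = {w: i for i, w in enumerate(vocab)}
    vecs = normalize(vectors)
    query = vecs[idx[b]] - vecs[idx[a]] + vecs[idx[c]]
    query /= np.linalg.norm(query)
    scores = vecs @ query
    for w in (a, b, c):  # the query words themselves are not valid answers
        scores[idx[w]] = -np.inf
    return vocab[int(np.argmax(scores))]

# Example usage (embeddings would come from a model pretrained on reordered text):
# vocab, vectors = load_embeddings(...)   # hypothetical loader
# print(solve_analogy("man", "king", "woman", vocab, vectors))  # expected: "queen"
```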

