Less is More: Pre-Training Cross-Lingual Small-Scale Language Models with Cognitively-Plausible Curriculum Learning Strategies (2410.22886v1)

Published 30 Oct 2024 in cs.CL and cs.AI

Abstract: Curriculum Learning has been a popular strategy to improve the cognitive plausibility of Small-Scale Language Models (SSLMs) in the BabyLM Challenge, but it has not led to considerable improvements over non-curriculum models. We assess whether linguistic acquisition theories can be used to specify more fine-grained curriculum learning strategies, creating age-ordered corpora of Child-Directed Speech for four typologically distant language families to implement SSLMs and acquisition-inspired curricula cross-lingually. Comparing three objective curricula (Growing, Inwards and MMM) that precisely replicate the predictions of acquisition theories on a standard SSLM architecture, we find that fine-grained acquisition-inspired curricula can outperform non-curriculum baselines, and that the performance benefits of curriculum strategies in SSLMs can be derived by specifying fine-grained, language-specific curricula that precisely replicate language acquisition theories.
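
The abstract describes ordering Child-Directed Speech by age to form a training curriculum. The following is a minimal sketch of that general idea, not the authors' implementation of the Growing, Inwards, or MMM curricula: the `Utterance` record, its `child_age_months` field, and the staging logic are all assumptions made for illustration.

```python
# Hypothetical sketch of an age-ordered curriculum: child-directed speech is
# bucketed by the target child's age and yielded stage by stage, from the
# earliest ages to the latest. All names and fields here are invented for
# illustration; they do not come from the paper's codebase.
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class Utterance:
    text: str
    child_age_months: int  # age of the target child when the utterance was recorded


def age_ordered_curriculum(
    utterances: List[Utterance],
    stage_months: int = 6,
) -> Iterator[List[str]]:
    """Yield successive training stages, each spanning `stage_months` of age."""
    ordered = sorted(utterances, key=lambda u: u.child_age_months)
    if not ordered:
        return
    stage: List[str] = []
    stage_end = ordered[0].child_age_months + stage_months
    for utt in ordered:
        # Close the current stage once we pass its age boundary.
        if utt.child_age_months >= stage_end and stage:
            yield stage
            stage = []
            stage_end = utt.child_age_months + stage_months
        stage.append(utt.text)
    if stage:
        yield stage


if __name__ == "__main__":
    corpus = [
        Utterance("look at the ball", 14),
        Utterance("where did the ball go?", 20),
        Utterance("shall we read a story before bed?", 32),
    ]
    for i, stage in enumerate(age_ordered_curriculum(corpus), start=1):
        print(f"stage {i}: {stage}")
```

In a pretraining loop, each yielded stage would be tokenized and passed to the model before the next, so earlier-acquired input is seen first; the paper's actual curricula order data according to specific acquisition theories rather than raw age buckets.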

Authors (4)
  1. Suchir Salhan (2 papers)
  2. Richard Diehl Martinez (13 papers)
  3. Paula Buttery (15 papers)
  4. Zébulon Goriely (5 papers)
