Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
80 tokens/sec
GPT-4o
59 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
7 tokens/sec
GPT-4.1 Pro
50 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

CLIMB: Curriculum Learning for Infant-inspired Model Building (2311.08886v1)

Published 15 Nov 2023 in cs.CL

Abstract: We describe our team's contribution to the STRICT-SMALL track of the BabyLM Challenge. The challenge requires training a LLM from scratch using only a relatively small training dataset of ten million words. We experiment with three variants of cognitively-motivated curriculum learning and analyze their effect on the performance of the model on linguistic evaluation tasks. In the vocabulary curriculum, we analyze methods for constraining the vocabulary in the early stages of training to simulate cognitively more plausible learning curves. In the data curriculum experiments, we vary the order of the training instances based on i) infant-inspired expectations and ii) the learning behavior of the model. In the objective curriculum, we explore different variations of combining the conventional masked LLMing task with a more coarse-grained word class prediction task to reinforce linguistic generalization capabilities. Our results did not yield consistent improvements over our own non-curriculum learning baseline across a range of linguistic benchmarks; however, we do find marginal gains on select tasks. Our analysis highlights key takeaways for specific combinations of tasks and settings which benefit from our proposed curricula. We moreover determine that careful selection of model architecture, and training hyper-parameters yield substantial improvements over the default baselines provided by the BabyLM challenge.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (7)
  1. Richard Diehl Martinez (13 papers)
  2. Zebulon Goriely (3 papers)
  3. Hope McGovern (5 papers)
  4. Christopher Davis (27 papers)
  5. Andrew Caines (13 papers)
  6. Paula Buttery (15 papers)
  7. Lisa Beinborn (17 papers)
Citations (6)