Increasing The Performance of Cognitively Inspired Data-Efficient Language Models via Implicit Structure Building (2310.20589v1)

Published 31 Oct 2023 in cs.CL

Abstract: In this paper, we describe our submission to the BabyLM Challenge 2023 shared task on data-efficient language model (LM) pretraining (Warstadt et al., 2023). We train transformer-based masked language models that incorporate unsupervised predictions about hierarchical sentence structure into the model architecture. Concretely, we use the StructFormer architecture (Shen et al., 2021) and variants thereof. StructFormer models have been shown to perform well on unsupervised syntactic induction based on limited pretraining data, and to yield performance improvements over a vanilla transformer architecture (Shen et al., 2021). Evaluation of our models on 39 tasks provided by the BabyLM challenge shows promising improvements on some particular tasks for models that integrate a hierarchical bias into the architecture, even though they fail to consistently outperform the RoBERTa baseline model provided by the shared task organizers on all tasks.
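To make the core idea concrete, below is a minimal conceptual sketch (not the authors' code) of a masked LM whose attention is modulated by unsupervised "syntactic distance" predictions, in the spirit of StructFormer (Shen et al., 2021). The module names, layer sizes, soft attention-bias scheme, and the hypothetical [MASK] token id are all simplifying assumptions; the actual StructFormer gating mechanism is more involved.

```python
# Conceptual sketch only: a structure-biased masked LM, assuming a simple
# additive attention bias derived from predicted syntactic distances.
import torch
import torch.nn as nn


class DistanceParser(nn.Module):
    """Predicts a scalar 'syntactic distance' score per token position."""

    def __init__(self, d_model: int):
        super().__init__()
        self.conv = nn.Conv1d(d_model, d_model, kernel_size=3, padding=1)
        self.score = nn.Linear(d_model, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, T, d)
        h = torch.relu(self.conv(x.transpose(1, 2))).transpose(1, 2)
        return self.score(h).squeeze(-1)  # (B, T)


class StructureBiasedMLM(nn.Module):
    def __init__(self, vocab_size: int, d_model: int = 128, n_heads: int = 4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.parser = DistanceParser(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        x = self.embed(input_ids)            # (B, T, d)
        dist = self.parser(x)                # (B, T)
        # Soft hierarchical bias: token pairs whose predicted distances
        # differ more attend to each other less (one simple assumption,
        # not the StructFormer formulation).
        bias = -(dist.unsqueeze(1) - dist.unsqueeze(2)).abs()  # (B, T, T)
        bias = bias.repeat_interleave(self.attn.num_heads, dim=0)
        h, _ = self.attn(x, x, x, attn_mask=bias)
        return self.lm_head(h)               # (B, T, vocab)


# Usage: a standard masked-LM objective on top of the biased transformer.
model = StructureBiasedMLM(vocab_size=1000)
ids = torch.randint(0, 1000, (2, 16))
labels = ids.clone()
mask = torch.rand(ids.shape) < 0.15          # mask ~15% of tokens
ids[mask] = 3                                # hypothetical [MASK] token id
logits = model(ids)
loss = nn.functional.cross_entropy(
    logits.view(-1, logits.size(-1))[mask.view(-1)],
    labels.view(-1)[mask.view(-1)],
)
loss.backward()
```

The point of the sketch is that the structure predictor is trained jointly with the masked-LM objective, so the hierarchical bias is induced without supervised parse trees, which is what makes the approach attractive in the data-limited BabyLM setting.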

Authors (3)
  1. Omar Momen (3 papers)
  2. David Arps (5 papers)
  3. Laura Kallmeyer (9 papers)
Citations (1)
