ToddlerBERTa: Exploiting BabyBERTa for Grammar Learning and Language Understanding

Published 30 Aug 2023 in cs.CL and cs.LG | (2308.16336v2)

Abstract: We present ToddlerBERTa, a BabyBERTa-like language model, exploring its capabilities through five different models with varied hyperparameters. Evaluating on BLiMP, SuperGLUE, MSGS, and a Supplement benchmark from the BabyLM challenge, we find that smaller models can excel in specific tasks, while larger models perform well with substantial data. Despite training on a smaller dataset, ToddlerBERTa demonstrates commendable performance, rivalling the state-of-the-art RoBERTa-base. The model showcases robust language understanding, even with single-sentence pretraining, and competes with baselines that leverage broader contextual information. Our work provides insights into hyperparameter choices and data utilization, contributing to the advancement of language models.