Human-scale data sufficiency for emergent compositional abilities
Determine whether the in-context compositional generalization capabilities observed in large language models pretrained on internet-scale corpora can also emerge in models trained on developmentally plausible, human-lifetime amounts of data.
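To make the scale gap concrete, below is a minimal back-of-envelope sketch in Python. Every figure in it (per-year word exposure, pretraining token count, tokens-per-word conversion) is an illustrative assumption for the purpose of the estimate, not a number taken from the paper:

```python
import math

# Back-of-envelope comparison of LLM pretraining data vs. human lifetime
# linguistic exposure. All figures are rough, illustrative assumptions.

human_words_per_year = 10e6   # ~10M words heard/read per year (rough estimate)
human_years = 20              # cumulative exposure by early adulthood
human_lifetime_words = human_words_per_year * human_years  # ~2e8 words

llm_pretraining_tokens = 1e13  # order of a recent frontier pretraining run (assumed)
tokens_per_word = 1.3          # common rule-of-thumb conversion (assumed)
llm_words = llm_pretraining_tokens / tokens_per_word

gap = llm_words / human_lifetime_words
print(f"Human-scale exposure: ~{human_lifetime_words:.1e} words")
print(f"LLM pretraining:      ~{llm_words:.1e} words")
print(f"Gap: ~{gap:.0e}x, i.e., about {math.log10(gap):.0f} orders of magnitude")
```

Under these assumptions the gap comes out to roughly four to five orders of magnitude, which is the disparity the question asks training regimes to close.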
References
However, current LLMs are trained on orders of magnitude more data than humans experience in an entire lifetime, making it unclear whether similar capabilities could emerge in models trained on a more realistic scale.
— From Frege to chatGPT: Compositionality in language, cognition, and deep neural networks
(arXiv:2405.15164, Russin et al., 24 May 2024), Section 6.2, "Implications for Human Development"