Developmentally-plausible Working Memory Shapes a Critical Period for Language Acquisition

Published 7 Feb 2025 in cs.CL | (2502.04795v3)

Abstract: LLMs possess general linguistic abilities but acquire language less efficiently than humans. This study proposes a method for integrating the developmental characteristics of working memory during the critical period, a stage when human language acquisition is particularly efficient, into the training process of LLMs. The proposed method introduces a mechanism that initially constrains working memory during the early stages of training and gradually relaxes this constraint in an exponential manner as learning progresses. Targeted syntactic evaluation shows that the proposed method outperforms conventional methods without memory constraints or with static memory constraints. These findings not only provide new directions for designing data-efficient LLMs but also offer indirect evidence supporting the role of the developmental characteristics of working memory as the underlying mechanism of the critical period in language acquisition.

Abstract PDF Upgrade to Chat

Summary

The paper introduces DynamicLimit-Exp, a novel method modeling human working memory growth to enhance syntactic learning in language models.
It employs dynamic constraints via ALiBi to simulate critical language acquisition, evaluated using AO-CHILDES and Wikipedia data.
Experimental results validate the Less-is-More hypothesis, demonstrating improved grammatical learning and suggesting future multilingual applications.

Developmentally-plausible Working Memory Shapes a Critical Period for Language Acquisition

Introduction

The paper entitled "Developmentally-plausible Working Memory Shapes a Critical Period for Language Acquisition" (2502.04795) explores integrating the developmental characteristics of human working memory into LLMs to enhance their language acquisition efficiency. The research proposes a novel mechanism inspired by the Critical Period Hypothesis, particularly emphasizing the exponential growth of working memory during early language learning stages, and evaluates its impact through targeted syntactic benchmarks.

Human language acquisition is theorized to occur most efficiently during a biological "critical period," supported by psycholinguistic and neurobiological evidence [lenneberg1967biological]. The Less-is-More Hypothesis further suggests that cognitive limitations in early childhood, such as restricted working memory capacity, facilitate the extraction of fundamental language patterns, providing a learning advantage over adults with more developed cognitive faculties [NEWPORT199011].

Computational models have become indispensable in testing such hypotheses, offering controlled environments to simulate language learning processes. However, traditional models like GPT-2 often fail to naturally exhibit critical period phenomena, necessitating the incorporation of tailored mechanisms such as Elastic Weight Consolidation to artificially induce these effects [constantinescu2024investigatingcriticalperiodeffects].

Proposed Method: Dynamic Growing of Working Memory

The proposed method, DynamicLimit-Exp, modifies neural LLMs by introducing dynamic constraints on working memory capacity that evolve throughout the training process, mirroring the developmental trajectory observed in human cognition (Figure 1). Using Attention with Linear Biases (ALiBi) as a foundational technique, this approach penalizes attention scores for distant query-key pairs, initially enforcing a recency bias that gradually diminishes over time, allowing the model to expand its contextual processing capabilities.

Experimental Setup and Results

Experiments were conducted using a modified GPT-2 architecture trained on Child-Directed Speech (AO-CHILDES) and Wikipedia datasets to evaluate the efficacy of the developmental-inspired learning schedule across different linguistic stimuli [aochildes2021, huebner-etal-2021-babyberta]. Targeted syntactic evaluations using the Zorro benchmark revealed that DynamicLimit-Exp models consistently outperformed static and unconstrained baselines, demonstrating enhanced efficiency in grammatical learning tasks (Table 1).

Performance improvements were particularly evident in LLMs trained with AO-CHILDES data, emphasizing that the gradual escalation of working memory constraints effectively facilitated pattern extraction irrespective of the linguistic input type (Table 2). This indicates that improved learning efficiency derives not solely from exposure to child-centric language but from the model's intrinsic developmental tuning capabilities.

Analysis of Cognitive Constraints

To substantiate the impact of developmental working memory characteristics, a cognitively implausible baseline was introduced, reversing the direction of memory growth constraints. The analysis confirmed that DynamicLimit-Exp's success is rooted in the Less-is-More hypothesis rather than input variability [NEWPORT199011], with the gradual expansion proving vital for extracting and generalizing complex grammatical rules.

Further evaluation of feature extraction capabilities revealed sustained improvements in representation learning across training epochs. DynamicLimit-Exp maintained diverse embedding spaces with high entropy and isotropy, crucial for facilitating contextually adaptive learning (Figure 2, Table 3).

Implications and Future Directions

The incorporation of developmentally plausible cognitive constraints into LLMs offers new avenues for enhancing data efficiency and understanding language acquisition mechanisms. Future research should explore scalability to larger models and multilingual applications, leveraging benchmarks like Zorro across diverse linguistic contexts.

The proposed cognitive-inspired approach may also complement existing techniques that target syntactic smoothing and frequency bias reduction in pretraining models, potentially augmenting generalization capabilities [diehl-martinez-etal-2024-mitigating].

Conclusion

This study advances our understanding of language acquisition by integrating a developmental growth curve of human working memory into LLMs, improving efficiency in syntactic learning tasks. These findings suggest that adopting cognitive developmental patterns can enrich model architectures, providing more efficient learning strategies and insights into the unique attributes of human language processing.