Language models learn rare syntactic phenomena, but it has been argued that they rely on rote memorization rather than grammatical generalization. Using a human-scale corpus (100M words), we iteratively trained transformer language models on systematically manipulated corpora and then evaluated their learning of a particular rare grammatical phenomenon: the English Article+Adjective+Numeral+Noun (AANN) construction ("a beautiful five days"). We first compared how well this construction was learned on the default corpus relative to a counterfactual corpus in which the AANN sentences were removed. Even on the counterfactual corpus, AANNs were still learned better than systematically perturbed variants of the construction. Using additional counterfactual corpora, we suggest that this learning occurs through generalization from related constructions (e.g., "a few days"). An additional experiment showed that this learning is enhanced when there is more variability in the input. Taken together, our results provide an existence proof that models learn rare grammatical phenomena by generalization from less rare phenomena. Code available at https://github.com/kanishkamisra/aannalysis
The study investigates transformer-based language models' ability to learn the rare AANN (Article+Adjective+Numeral+Noun) construction using a 100-million-word corpus.
Models generalized the AANN construction from related, more common constructions: performance dropped when AANN instances were removed from the training data, but the construction was still learned better than perturbed variants.
Variability in training data and exposure to a broad range of AANN instances enhanced the models’ learning and generalization capabilities.
Findings suggest language models’ learning of grammatical constructions reflects generalization from related, more frequent constructions rather than rote memorization, with implications for both machine learning and linguistic theory.
Recent developments in the field of computational linguistics have highlighted the capabilities of language models to learn and generalize from linguistic input. This article discusses a study that investigates the ability of transformer-based language models to learn a specific rare grammatical phenomenon, the English Article+Adjective+Numeral+Noun (AANN) construction, through systematic manipulation of the training data.
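As a rough illustration of what this manipulation involves, the sketch below flags sentences containing an Article+Adjective+Numeral+Noun sequence so that they can be withheld when constructing a counterfactual training corpus. It is a hypothetical example using spaCy part-of-speech tags, not the released aannalysis pipeline; the contains_aann function, the matching pattern, and the example sentences are assumptions made purely for illustration.

```python
# Hypothetical sketch (not the paper's actual filtering code): flag sentences
# containing an Article+Adjective+Numeral+Noun sequence, e.g. "a beautiful
# five days", so they can be withheld to build a counterfactual corpus.
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes this model is installed

def contains_aann(sentence: str) -> bool:
    doc = nlp(sentence)
    for i, tok in enumerate(doc):
        # article "a"/"an" ...
        if tok.lower_ not in {"a", "an"}:
            continue
        j = i + 1
        # ... followed by one or more adjectives ...
        n_adj = 0
        while j < len(doc) and doc[j].pos_ == "ADJ":
            n_adj += 1
            j += 1
        # ... then a numeral ...
        if n_adj == 0 or j >= len(doc) or doc[j].pos_ != "NUM":
            continue
        j += 1
        # ... then a plural noun (tag NNS), as in "days"
        if j < len(doc) and doc[j].tag_ == "NNS":
            return True
    return False

corpus = [
    "She spent a beautiful five days in Rome.",
    "She spent five beautiful days in Rome.",
]
counterfactual = [s for s in corpus if not contains_aann(s)]
print(counterfactual)  # only the second sentence survives
```

Any sentence matching the pattern is simply dropped, leaving a corpus whose only evidence about AANNs comes indirectly from related constructions.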
The core of the study involves training language models on a corpus approximating human-scale linguistic input (100 million words), with and without exposure to instances of the AANN construction. Training was followed by evaluating the models' performance on AANNs as well as on purposefully perturbed variants of the construction, to assess the generality of the learning. The findings lend credence to the hypothesis that models can abstract grammatical principles from related, more common constructions, thereby demonstrating an ability to generalize beyond direct experience.
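To make the evaluation concrete, the sketch below compares a language model's log-probability for an attested AANN string against a word-order perturbation of the same string (numeral before adjective). It uses an off-the-shelf GPT-2 from the HuggingFace transformers library purely for illustration; the study evaluates models trained on its own manipulated corpora, and the sentence_logprob helper and example sentences are assumptions of this sketch, not the paper's exact protocol.

```python
# Hypothetical evaluation sketch: score an attested AANN sentence against a
# perturbed (Adjective/Numeral swapped) variant under a causal language model.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def sentence_logprob(sentence: str) -> float:
    """Sum of token log-probabilities of the sentence under the model."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels=ids, the returned loss is the mean negative
        # log-likelihood over the predicted tokens.
        loss = model(ids, labels=ids).loss
    return -loss.item() * (ids.size(1) - 1)

aann      = "She spent a beautiful five days in Rome."
perturbed = "She spent a five beautiful days in Rome."  # Adj/Num order swapped

print(sentence_logprob(aann), sentence_logprob(perturbed))
# A model that has generalized the construction should assign the attested
# AANN order a higher (less negative) log-probability than the perturbation.
```

Aggregating such contrasts over many test items gives a measure of how well the construction was learned under each counterfactual training condition.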
The study presents compelling evidence that language models, even when trained on data of a scale comparable to that encountered by human learners, can generalize and learn rare grammatical constructions. This has significant implications for both machine learning research and linguistic theory.
In future work, extending this approach to a broader set of rare constructions could provide deeper insights into the learning capacities of language models and the linguistic principles that underpin language acquisition. Moreover, studies that bridge the gap between grammatical form learning and semantic understanding could offer a more holistic view of language comprehension in machine learning models.
In conclusion, this research contributes to the ongoing exploration of how language models learn and generalize, demonstrating that with the right exposure and data manipulation, even rare syntactic phenomena are within the grasp of current language technologies.