Language Models Learn Rare Phenomena from Less Rare Phenomena: The Case of the Missing AANNs (2403.19827v3)

Published 28 Mar 2024 in cs.CL

Abstract: LLMs learn rare syntactic phenomena, but the extent to which this is attributable to generalization vs. memorization is a major open question. To that end, we iteratively trained transformer LLMs on systematically manipulated corpora which were human-scale in size, and then evaluated their learning of a rare grammatical phenomenon: the English Article+Adjective+Numeral+Noun (AANN) construction ("a beautiful five days"). We compared how well this construction was learned on the default corpus relative to a counterfactual corpus in which AANN sentences were removed. We found that AANNs were still learned better than systematically perturbed variants of the construction. Using additional counterfactual corpora, we suggest that this learning occurs through generalization from related constructions (e.g., "a few days"). An additional experiment showed that this learning is enhanced when there is more variability in the input. Taken together, our results provide an existence proof that LMs can learn rare grammatical phenomena by generalization from less rare phenomena. Data and code: https://github.com/kanishkamisra/aannalysis.

Summary

The paper demonstrates that transformer models can learn the rare AANN construction even when direct examples are absent from training data.
It reveals that exposure to related, more common constructions significantly enhances grammatical generalization.
Results underscore that data variability, through diverse linguistic examples, strengthens the models' ability to abstract grammatical rules.

Learning Rare Syntactic Phenomena in LLMs: Insights from the AANN Construction

Insights from Systematic Manipulation of Training Data

Recent work in computational linguistics has highlighted the capacity of LLMs to learn and generalize from linguistic input. This blog post discusses a paper that investigates whether transformer-based LLMs can learn a specific rare grammatical phenomenon, the English Article+Adjective+Numeral+Noun (AANN) construction (e.g., "a beautiful five days"), by systematically manipulating the training data.

The Study at a Glance

The core of the paper involves training LLMs on a corpus that approximates a human-scale linguistic input (100 million words), with and without exposure to instances of the AANN construction. The training was followed by evaluating the models' performance on AANN as well as on purposefully perturbed variants of the construction, to assess the generality of the learning. The findings lend credence to the hypothesis that models can abstract grammatical principles from related, more common constructions, thereby demonstrating an ability to generalize beyond direct experience.
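The two data manipulations described above, removing AANN sentences from the corpus and building perturbed variants for evaluation, can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the word lists, the function names (`find_aann`, `remove_aanns`, `perturbed_variants`) and the particular perturbations are all assumptions made for this example, and a realistic detector would rely on part-of-speech tagging rather than fixed vocabularies.

```python
import re

# Hypothetical, deliberately tiny word lists (assumption for illustration only).
ADJECTIVES = {"beautiful", "mere", "whopping", "good"}
NUMERALS = {"two", "three", "five", "ten", "twenty"}
ARTICLES = {"a", "an"}


def find_aann(sentence):
    """Return (article, adjective, numeral, noun) for the first AANN-like
    four-word window in the sentence, or None if there is no such window."""
    tokens = re.findall(r"[A-Za-z]+", sentence.lower())
    for i in range(len(tokens) - 3):
        art, adj, num, noun = tokens[i:i + 4]
        if art in ARTICLES and adj in ADJECTIVES and num in NUMERALS:
            return art, adj, num, noun
    return None


def remove_aanns(corpus):
    """Build a counterfactual corpus by dropping sentences with an AANN-like match."""
    return [s for s in corpus if find_aann(s) is None]


def perturbed_variants(art, adj, num, noun):
    """Illustrative ill-formed variants of one AANN instance (assumed set)."""
    return {
        "order_swap": f"{art} {num} {adj} {noun}",
        "no_article": f"{adj} {num} {noun}",
        "no_adjective": f"{art} {num} {noun}",
        "no_numeral": f"{art} {adj} {noun}",
    }


corpus = [
    "We spent a beautiful five days in Rome.",
    "She bought a few days of supplies.",
    "The cat sat on the mat.",
]
filtered = remove_aanns(corpus)
variants = perturbed_variants(*find_aann(corpus[0]))
```

Note that the related construction "a few days" survives the filter here, which mirrors the setup in which AANN sentences are removed while related constructions remain available to the model.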

Key Findings

Generalization from Less Rare Phenomena: The paper found that models were able to learn the AANN construction even when explicit instances were removed from the training data, albeit with reduced performance. This suggests that learning leveraged generalization from related constructions encountered in training.
Influence of Related Constructions: Further manipulations of the training data, which removed related constructions (e.g., “a few days”), resulted in a diminished ability to learn AANN, reinforcing the idea that models abstract grammatical rules from them.
Variability Enhances Learning: When models were exposed to a variety of AANN instances in training, showcasing a broad range of adjectives, numerals, and nouns, they were more successful at generalizing the construction compared to models trained on more limited samples. This underscores the role of variability in learning linguistic constructions.
Statistical Learning vs. Memorization: Results indicate that the models' learning of the AANN construction is rooted in statistical generalization rather than rote memorization, since the construction was still learned when no AANN instances were available to memorize. This runs counter to the common criticism that LLMs are merely "stochastic parrots."
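One simple way to turn the comparison between AANNs and their perturbed variants into a number is an accuracy score: the fraction of test items for which a model scores the well-formed string higher than every perturbed variant. The sketch below assumes this metric for illustration; this summary does not specify the paper's exact scoring procedure. The function `aann_accuracy`, the item format, and the scores in `TOY_LOGPROBS` are hypothetical, and the toy scorer stands in for a real language model's log-probability.

```python
from typing import Callable, Dict, List


def aann_accuracy(items: List[Dict], score: Callable[[str], float]) -> float:
    """Fraction of items whose well-formed AANN outscores all of its variants."""
    if not items:
        return 0.0
    hits = 0
    for item in items:
        good = score(item["aann"])
        if all(good > score(variant) for variant in item["variants"]):
            hits += 1
    return hits / len(items)


# Made-up log-probabilities (assumption): NOT outputs of any real model.
TOY_LOGPROBS = {
    "a beautiful five days": -20.0,
    "a five beautiful days": -27.5,
    "beautiful five days": -24.0,
    "a whopping ten pages": -22.0,
    "a ten whopping pages": -21.0,
    "whopping ten pages": -25.0,
}


def toy_score(sentence: str) -> float:
    """Stand-in scorer that looks up a hand-written score for the sentence."""
    return TOY_LOGPROBS[sentence]


items = [
    {"aann": "a beautiful five days",
     "variants": ["a five beautiful days", "beautiful five days"]},
    {"aann": "a whopping ten pages",
     "variants": ["a ten whopping pages", "whopping ten pages"]},
]
accuracy = aann_accuracy(items, toy_score)
```

With these invented scores the first item counts as a hit and the second does not, because its order-swapped variant receives a higher score. Comparing such accuracies across models trained on the default and counterfactual corpora is the kind of contrast the findings above describe.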

Implications and Future Directions

The paper presents compelling evidence that LLMs, even when trained on data of a scale comparable to that encountered by human learners, can generalize and learn rare grammatical constructions. This has significant implications:

Machine Learning: The findings highlight the potential of current statistical learning mechanisms in LLMs to capture complex linguistic phenomena, suggesting avenues for further refining these models’ grammatical generalization capabilities.
Linguistic Theory: The ability to learn a rare construction from more common, related ones is consistent with theories that posit the human linguistic capability stems from generalization over input, rather than innate grammatical knowledge.
Teaching Machines Language: From a practical standpoint, understanding the conditions under which LLMs can generalize rare constructions could inform strategies for training more efficient, linguistically nuanced models.

In future work, extending this approach to a broader set of rare constructions could provide deeper insights into the learning capacities of LLMs and the linguistic principles that underpin language acquisition. Moreover, studies that bridge the gap between grammatical form learning and semantic understanding could offer a more holistic view of language comprehension in machine learning models.

In conclusion, this research contributes to the ongoing exploration of how LLMs learn and generalize, demonstrating that with the right exposure and data manipulation, even rare syntactic phenomena are within the grasp of current language technologies.
