An Analysis of "Language Models as Models of Language" by Raphaël Millière
The paper "Language Models as Models of Language" by Raphaël Millière examines the implications that modern neural language models, such as those based on the Transformer architecture, may have for theoretical linguistics. The work navigates the intersection of empirical results from computational linguistics and long-standing questions in linguistic theory, particularly concerning syntactic knowledge and language acquisition.
Overview
Millière's central thesis is that language models (LMs), despite being developed primarily for engineering purposes, acquire sophisticated linguistic knowledge from exposure to large corpora, and that this acquisition makes them newly relevant to linguistic theory. The paper covers the history of statistical language modeling, surveys the empirical evidence for syntactic knowledge in LMs, and considers their implications for theories of language acquisition and competence.
Historical Background
The paper begins by tracing the evolution of statistical language modeling from its roots in information theory to the contemporary dominance of deep learning. Millière highlights pivotal moments, such as Claude Shannon's probabilistic models of English text and Noam Chomsky's critique of stochastic approaches, which for decades cast statistical models as mere approximations of surface patterns, devoid of deeper linguistic competence. With the advent of deep neural networks, particularly those built on the Transformer architecture, this historical skepticism warrants reconsideration.
Empirical Evidence for Syntactic Knowledge
A core section of the paper reviews empirical investigations of the syntactic capabilities of LMs. Three methodologies are discussed:
- Behavioural Studies: These studies evaluate model outputs on controlled tasks targeting specific syntactic phenomena, such as subject-verb agreement and reflexive anaphora, typically by comparing minimal pairs of grammatical and ungrammatical sentences (see the first sketch after this list). Findings indicate that Transformer-based models often achieve near-human accuracy across a broad range of syntactic tasks, suggesting that they acquire not just surface patterns but deeper syntactic generalizations.
- Probing Studies: These experiments train diagnostic classifiers to decode linguistic features from the internal representations of LMs (see the second sketch below). For instance, various studies show that models like BERT encode information about syntactic dependencies, hierarchical constituency, and part-of-speech categories in their learned representations.
- Interventional Studies: These studies causally manipulate model activations to test whether encoded information is actually used. They provide stronger evidence than probing alone that the syntactic information found in a model's representations plays a causal role in its predictions (one such technique, activation patching, is sketched near the end of this analysis).
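To make the behavioural paradigm concrete, here is a minimal sketch of a minimal-pair evaluation, assuming GPT-2 via the Hugging Face transformers library; the agreement pair and the pass criterion are illustrative choices, not materials from the paper.

```python
# Minimal-pair evaluation: does the model assign a higher probability to the
# grammatical sentence than to its ungrammatical counterpart?
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def sentence_log_prob(sentence):
    """Sum of token log-probabilities under the model (higher = more likely)."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # Logits at position t predict the token at position t + 1.
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    targets = ids[:, 1:]
    return log_probs.gather(2, targets.unsqueeze(-1)).sum().item()

# Illustrative long-distance agreement pair ("cabinets" is an attractor noun).
grammatical   = "The key to the cabinets is on the table."
ungrammatical = "The key to the cabinets are on the table."

print(sentence_log_prob(grammatical) > sentence_log_prob(ungrammatical))
# Expected: True, if the model tracks agreement across the intervening noun.
```

Large-scale behavioural benchmarks such as BLiMP apply this same forced-choice comparison across thousands of automatically generated pairs.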
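In the same spirit, a probing study can be sketched as a simple classifier trained on frozen representations. The toy dataset, the noun-versus-verb label, and the mean-pooling of subword pieces below are illustrative simplifications; real probing studies use large annotated corpora and held-out evaluation.

```python
# Probing sketch: train a linear classifier to decode a linguistic feature
# (here, a toy noun-vs-verb distinction) from frozen BERT representations.
import torch
from transformers import BertModel, BertTokenizerFast
from sklearn.linear_model import LogisticRegression

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")
bert.eval()

# Tiny illustrative dataset: (sentence, index of target word, label).
data = [
    ("the dog runs fast", 1, "NOUN"), ("the dog runs fast", 2, "VERB"),
    ("a cat sleeps here", 1, "NOUN"), ("a cat sleeps here", 2, "VERB"),
    ("the bird sings loudly", 1, "NOUN"), ("the bird sings loudly", 2, "VERB"),
]

def word_vector(sentence, word_idx):
    """Mean-pool the final-layer states of the subword pieces of one word."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        states = bert(**enc).last_hidden_state[0]
    piece_ids = [i for i, w in enumerate(enc.word_ids()) if w == word_idx]
    return states[piece_ids].mean(dim=0)

X = torch.stack([word_vector(s, i) for s, i, _ in data]).numpy()
y = [label for _, _, label in data]

probe = LogisticRegression(max_iter=1000).fit(X, y)
print(probe.score(X, y))  # Training accuracy; real studies use held-out data.
```

A recurring methodological worry, which interventional work addresses, is that a probe may decode information the model itself never actually uses.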
Implications for Linguistics
The empirical evidence that LMs encode and use syntactic structure fuels a broader discussion of their relevance to classical linguistic theories. The paper addresses this through three distinct yet interconnected modeling targets: linguistic performance, competence, and acquisition.
- Linguistic Performance: Given the success of LMs in mimicking human language use, it is argued that they serve as reasonably good models of linguistic performance. This aligns with their design objective: predicting the next word in a sequence given the prior context (formalized after this list).
- Linguistic Competence: Millière critically evaluates whether LMs can be considered models of linguistic competence. While traditional linguistic theories posit a significant gap between observable performance and underlying competence, the syntactic abilities exhibited by LMs challenge this dichotomy. The extent to which models trained on performance data can reveal human-like competence remains a subject of extensive debate.
- Language Acquisition: The most contentious domain is language acquisition. The paper reviews recent studies that train LMs on developmentally plausible datasets, finding that models can acquire a range of linguistic phenomena even with relatively small amounts of input data. This challenges strong nativist claims such as the argument from the Poverty of the Stimulus, suggesting that statistical learners can induce complex linguistic structure without built-in grammatical knowledge.
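For concreteness, the performance-modeling view in the first bullet rests on the standard autoregressive training objective, which minimizes the negative log-likelihood of each word given its preceding context (a textbook formulation, not notation from the paper):

$$\mathcal{L}(\theta) = -\sum_{t=1}^{T} \log p_\theta\left(w_t \mid w_1, \dots, w_{t-1}\right)$$

Everything a model learns about syntax must be induced in service of minimizing this single predictive loss, which is what makes the acquisition results in the third bullet theoretically striking.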
Future Directions and Theoretical Integration
Millière advocates for a paradigm in which computational linguistics and theoretical linguistics jointly contribute to a richer understanding of human language. He suggests that LMs can serve as testbeds for controlled experiments that probe specific linguistic hypotheses under varied learning conditions. The paper calls for integrating insights from deep neural network research into broader linguistic theory and for developing methodologies to interpret these complex models more formally.
Furthermore, the paper stresses the need for more rigorous evaluation practices, with conditions and controls analogous to those of cognitive psychology and neuroscience. Mechanistic interpretability techniques, such as causal probing and the identification of computational circuits within models, are posited as essential tools for advancing this interdisciplinary dialogue; a minimal sketch of one such technique follows.
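As an illustration of this interventional, mechanistic style of analysis, here is a minimal sketch of activation patching, assuming GPT-2 and plain PyTorch forward hooks. The layer index, the prompt pair, and the logit-gap readout are all illustrative choices, not taken from the paper.

```python
# Activation patching: copy one layer's hidden states from a "source" run
# into a "base" run and observe how the next-token prediction shifts.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

LAYER = 5  # Which transformer block to patch (illustrative choice).

def run(prompt, patch=None):
    """Run the model; if `patch` is given, replace LAYER's output with it."""
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    captured = {}

    def hook(module, inputs, output):
        if patch is None:
            captured["h"] = output[0].detach()  # Store hidden states.
            return None
        return (patch,) + output[1:]  # Swap in the stored activations.

    handle = model.transformer.h[LAYER].register_forward_hook(hook)
    with torch.no_grad():
        logits = model(ids).logits[0, -1]  # Logits for the next token.
    handle.remove()
    return logits, captured.get("h")

# Prompts chosen to tokenize to the same length, so tensor shapes line up.
base   = "The dogs near the car"   # Plural subject: model should favour "are".
source = "The dog near the cars"   # Singular subject: model should favour "is".

_, source_h = run(source)
patched_logits, _ = run(base, patch=source_h)

is_id, are_id = tokenizer(" is").input_ids[0], tokenizer(" are").input_ids[0]
print("logit(is) - logit(are) after patching:",
      (patched_logits[is_id] - patched_logits[are_id]).item())
```

If patching the singular-subject run's activations into the plural-subject run pushes the model toward "is", then that layer's representations carry causally efficacious number information, which is precisely the kind of evidence interventional studies seek.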
Conclusion
"LLMs as Models of Language" presents a compelling argument for the utility of neural LLMs in informing linguistic theory. By systematically presenting empirical evidence and addressing theoretical implications, Millière paves the way for a reconciliatory approach that bridges the predictive prowess of machine learning with the explanatory depth of theoretical linguistics. While challenges remain—such as the inherent opacity of neural networks and the scalability of human-like learning—this paper sets a promising agenda for future research at the confluence of AI and linguistics.