- The paper reveals that transformer models combine genuinely syntactic features with surface-level heuristics when processing ambiguous sentences.
- The paper demonstrates that models activate multiple simultaneous interpretations for garden path sentences instead of committing to one parse.
- The paper shows that transformers discard initial syntactic features rather than fully reanalyzing them upon receiving disambiguating input.
The paper "Incremental Sentence Processing Mechanisms in Autoregressive Transformer LLMs" investigates the mechanisms by which autoregressive transformer LLMs process sentences incrementally, focusing on their handling of temporary syntactic ambiguities known as garden path sentences. Despite the established syntactic capabilities of LLMs (LMs), the specific features and processes these models employ when encountering syntactic ambiguities are not well understood. This paper addresses this gap by analyzing how LMs process garden path sentences, aiming to elucidate whether these models rely on syntactic features or shallow heuristics, whether they represent one or multiple potential interpretations, and how they manage to reanalyze or repair initial representations upon encountering disambiguating information.
Methodological Approach
The authors use sparse autoencoders (SAEs) to identify interpretable features within LMs that influence a model's preference for one reading of an ambiguous sentence over another. They apply a linear approximation technique, Attribution Patching with Integrated Gradients (AtP-IG), to estimate the causal contribution of each feature, with the goal of constructing the feature circuits that underlie the models' sentence-processing behavior.
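To make the attribution step concrete, below is a minimal sketch of attribution patching with integrated gradients over SAE feature activations. The metric, the feature vectors, and all names here are illustrative assumptions rather than the paper's actual code; in a real application, `f_clean` and `f_corrupt` would come from forward passes on a garden path sentence and a minimally different unambiguous control, and `metric` would be something like the logit difference between two candidate continuations.

```python
import torch

def atp_ig(metric, f_clean, f_corrupt, steps=10):
    """First-order estimate of each feature's causal effect on `metric`
    when its corrupt activation is patched to its clean value, averaging
    gradients along the straight line between the two activation vectors."""
    grads = torch.zeros_like(f_clean)
    for k in range(1, steps + 1):
        alpha = k / steps
        f = (f_corrupt + alpha * (f_clean - f_corrupt)).requires_grad_(True)
        metric(f).backward()
        grads += f.grad
    return (f_clean - f_corrupt) * grads / steps

# Toy usage: a linear "metric" (standing in for a logit difference)
# over 8 hypothetical SAE features.
w = torch.randn(8)
f_clean, f_corrupt = torch.rand(8), torch.rand(8)
effects = atp_ig(lambda f: f @ w, f_clean, f_corrupt)
print(effects.topk(3))  # features with the largest estimated causal effect
```

The appeal of this approximation is that it estimates the effect of patching every feature from a handful of backward passes, rather than requiring one intervention run per feature.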
Key Findings
- Syntactic vs. Heuristic Features: The paper reveals that while many high-importance features are genuinely syntactic, some are purely heuristic. LMs therefore combine deep syntactic structure with surface-level shortcuts when processing sentences.
- Simultaneous Representation of Ambiguity: Features encoding both potential readings of an ambiguous sentence are active at the same time, suggesting that LMs do not commit exclusively to one parse but maintain multiple interpretations concurrently (see the sketch after this list).
- Reanalysis versus Repair: When disambiguating information arrives, LMs do not fully repair or reanalyze their initial representations. Instead, the models appear to simply discard the features associated with the initially preferred parse, a process fundamentally different from human reanalysis mechanisms.
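To make the last two findings concrete, the toy illustration below shows the qualitative pattern they describe in token-level SAE feature activations on a classic garden path sentence. All numbers are invented for illustration and do not come from the paper.

```python
import numpy as np

# Hypothetical activations of two SAE features over the tokens of
# "The horse raced past the barn fell." Feature 0 encodes the initially
# preferred "raced = main verb" reading; feature 1 encodes the
# "raced = reduced relative clause" reading.
tokens = ["The", "horse", "raced", "past", "the", "barn", "fell", "."]
acts = np.array([
    [0.0, 0.0], [0.0, 0.0], [0.9, 0.4], [0.8, 0.4],
    [0.8, 0.4], [0.7, 0.5], [0.1, 0.6], [0.0, 0.5],
])
d = tokens.index("fell")  # position of the disambiguating verb

# Simultaneous representation: both reading features are active at once
# before the disambiguating word.
both_active = bool(np.all(acts[2:d] > 0.1))

# Discarding rather than repair: the main-verb feature collapses after
# "fell" instead of being transformed into a corrected-parse representation.
drop = acts[:d, 0].mean() - acts[d:, 0].mean()
print(f"co-active before disambiguation: {both_active}; "
      f"main-verb feature drop: {drop:.2f}")
```

A human-like repair would rework the committed parse into the correct one; the pattern sketched here, where the losing reading's features are simply switched off, corresponds to the discarding behavior the paper reports.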
Implications and Future Directions
The insights from this paper have significant implications for understanding LM capabilities and their cognitive analogs. The observation that LMs do not engage in human-like syntactic repair or reanalysis suggests that their incremental processing diverges from human sentence processing. This distinction matters for future LM development aimed at more human-like syntactic understanding and reasoning.
How LMs distinguish meaningful ambiguity from noise remains an open question, and answering it matters for applications where linguistic subtleties carry real weight. This research might also inform training strategies that emphasize syntactic coherence and ambiguity resolution in increasingly sophisticated LMs.
Furthermore, advances in SAE interpretability could enable more accurate attribution of model decisions to specific internal processes, improving transparency and supporting model debugging. Given ongoing improvements in model architectures and interpretability techniques, future work could extend this analysis to a broader range of syntactic phenomena and to larger, more capable LLMs.
Conclusion
This paper contributes substantially to our understanding of how autoregressive LMs process syntactically ambiguous inputs, revealing a complex interplay of syntactic and heuristic elements within their computations. It advances the conversation on the limitations of current LMs in mimicking human-like sentence comprehension, providing a foundation for research aimed at bridging this cognitive gap.