Do LLMs build syntactic structure via attention weights?

Determine whether large language models such as GPT-4 and LLaMA-3 construct syntactic structure through attention-weight-based mechanisms, as suggested by their variable acceptability judgments on Norwegian parasitic gap items with semantically anomalous fillers, and characterize how attention patterns would account for the observed graded judgments in these cases.

Background

Within the evaluation of parasitic gap constructions, the paper reports strong alignment with expected judgments in English but more variable performance in Norwegian, particularly for items with semantically anomalous fillers reminiscent of "colorless green ideas" sentences. This variability leads the authors to hypothesize that the differences might reflect how the models build syntactic structure.

The authors explicitly flag as an open question whether the observed behavior points to structure-building grounded in attention weights. Clarifying whether attention mechanisms implement the models' structural representations would illuminate why certain semantically anomalous Norwegian items depart from otherwise robust performance on parasitic gaps.
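Investigating this question directly requires inspecting attention maps, which is only possible for open-weights models (GPT-4's attention weights are not publicly accessible). The following is a minimal, hypothetical sketch of one way such a probe could look: it extracts per-layer, per-head attention weights for a single filler-gap sentence and measures how much attention a gap-adjacent position allocates to the filler. The model name, example sentence, and token indices are illustrative placeholders, not the paper's materials or method.

```python
# Minimal sketch: probing whether attention heads track a filler-gap dependency.
# All specifics (model, sentence, token positions) are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder open-weights model with exposed attentions
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

# Illustrative parasitic-gap-style sentence (filler: "article"; gap after "reading").
sentence = "Which article did you file without reading?"
inputs = tokenizer(sentence, return_tensors="pt")
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

with torch.no_grad():
    outputs = model(**inputs, output_attentions=True)

# outputs.attentions is a tuple with one tensor per layer,
# each of shape (batch, num_heads, seq_len, seq_len).
attn = torch.stack(outputs.attentions).squeeze(1)  # (layers, heads, seq, seq)

# Illustrative positions: the filler token and a token adjacent to the parasitic gap.
filler_idx = tokens.index("Ġarticle") if "Ġarticle" in tokens else 1
gap_idx = len(tokens) - 2  # roughly the "reading" token, just before "?"

# Attention mass flowing from the gap-adjacent position back to the filler.
# Consistently high values in some heads would be one signature of
# attention-based structure building across the dependency.
filler_attention = attn[:, :, gap_idx, filler_idx]  # (layers, heads)
layer, head = divmod(int(filler_attention.argmax()), filler_attention.size(1))
print(f"max attention to filler: {filler_attention.max():.3f} (layer {layer}, head {head})")
```

Comparing such a score across normal and semantically anomalous fillers, and across English and Norwegian items, would be one way to test whether the graded judgments track attention-based structure building.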

References

This may point to the way LLMs build structure based on attention weights, but we leave this as an open question.

Grammaticality Judgments in Humans and Language Models: Revisiting Generative Grammar with LLMs (2512.10453 - Johnsen, 11 Dec 2025) in Subsection "Parasitic Gaps," paragraph "Parasitic gap sensitivity," in Section "Results"