
Idiom Heads in Transformers

Updated 27 November 2025
  • Idiom Heads are specialized attention heads in transformer models that recover an idiom's figurative meaning by acting as functional bottlenecks for non-compositional semantics.
  • They are identified using quantitative ablation and causal tracing protocols that measure performance drops and highlight their recurrence across various idiomatic expressions.
  • This mechanism supports idiom disambiguation by routing figurative interpretations separately from literal meanings, and points toward a scalable, mechanistically interpretable account of non-compositional language processing.

An Idiom Head is a specialized attention head within transformer-based LLMs that is causally necessary for recovering an idiom’s figurative meaning in an intermediate layer and recurs across the mechanistic circuits for multiple idiomatic expressions (Gomes, 20 Nov 2025). Idiom processing in transformers is characterized by non-compositional semantics, with figurative and literal meanings running in parallel computational paths. Idiom Heads serve as functional bottlenecks that inject, amplify, and route figurative interpretation while allowing literal meanings to bypass this specialized processing. These heads are discovered through quantitative ablation and causal tracing protocols, and they underpin the transformer’s ability to competently represent and disambiguate idioms.

1. Formal Definition and Identification

Idiom Heads are rigorously defined by their causal role in the transformer’s computational graph. For an idiomatic string $I$ at layer $L$, the idiom circuit $H_I$ is the subgraph of nodes (attention heads and post-MLP residual streams) responsible for encoding the figurative meaning. The impact of a head $h$ is measured by the performance drop:

$$d_h = \cos \theta_{M,H} - \cos \theta_{M,H \setminus \{h\}}$$

where $\cos \theta_{M,H}$ is the cosine similarity between the meaning embedding $x_m$ and the circuit-patched hidden state $x_H$ (Gomes, 20 Nov 2025). A head is classified as an Idiom Head if it recurs ($\geq 3$ times) in the $H_I$ circuits across multiple idioms and exhibits $|d_h| > \tau_{\text{idiom}}$, with $\tau_{\text{idiom}}$ empirically set in the range $0.004$ to $0.008$.
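
As a concrete illustration, the sketch below computes $d_h$ from a meaning embedding and two circuit-patched hidden states, and applies the threshold check. The random tensors and the $\tau_{\text{idiom}}$ value of 0.006 are placeholders (a midpoint of the reported range); producing the real patched states requires the path-patching protocol described in Section 2.

```python
import torch
import torch.nn.functional as F

def head_drop(x_m: torch.Tensor, x_H: torch.Tensor, x_H_minus_h: torch.Tensor) -> float:
    """d_h = cos(theta_{M,H}) - cos(theta_{M,H\{h}}): how much figurative-meaning
    alignment is lost when head h is removed from the patched circuit."""
    cos_full = F.cosine_similarity(x_m, x_H, dim=-1)
    cos_ablated = F.cosine_similarity(x_m, x_H_minus_h, dim=-1)
    return (cos_full - cos_ablated).item()

# Toy usage: random vectors stand in for the meaning embedding and the two
# circuit-patched hidden states (the real ones come from path patching).
x_m, x_H, x_H_minus_h = torch.randn(3, 768)
d_h = head_drop(x_m, x_H, x_H_minus_h)
significant = abs(d_h) > 0.006   # tau_idiom: illustrative midpoint of 0.004-0.008
```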

2. Circuit Discovery via Path Patching

Idiom Head identification relies on modified path patching methodologies:

  • Graph construction: Nodes represent per-token attention heads and post-MLP residuals; edges represent Q, K, V connections, as well as intra-layer residual-to-attention associations.
  • Single-corruption tracing: Activations are iteratively patched and edges ablated, with the resulting drop $d_e$ quantified for each edge to isolate causal components.
  • Threshold sweep and merging: Circuits are aggregated across corrupted variants, and heads are pruned if lacking substantive upstream causal connections (Gomes, 20 Nov 2025).

This protocol delineates idiom-specific computational circuits, revealing that certain heads act as cross-token routers for figurative cues.
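
The full protocol (graph construction over Q/K/V edges, single-corruption tracing, threshold sweeps, circuit merging) is specific to the cited work. As a rough, runnable stand-in, the sketch below performs single-head activation patching in TransformerLens; the model, prompts, and patched head are illustrative assumptions, and the logit difference is only a crude proxy for the cosine-based $d_h$.

```python
import torch
from transformer_lens import HookedTransformer, utils

model = HookedTransformer.from_pretrained("gpt2-small")   # stand-in model, not the paper's

clean = model.to_tokens("He finally kicked the bucket")    # idiomatic prompt
corrupt = model.to_tokens("He finally kicked the ball")    # literal control (same length)
assert clean.shape == corrupt.shape

_, corrupt_cache = model.run_with_cache(corrupt)

def patch_head(layer: int, head: int) -> torch.Tensor:
    """Run the clean prompt while overwriting one head's output (hook_z)
    with its activation from the corrupted run."""
    name = utils.get_act_name("z", layer)        # "blocks.{layer}.attn.hook_z"
    def hook(z, hook):                           # z: [batch, pos, head, d_head]
        z[:, :, head, :] = corrupt_cache[name][:, :, head, :]
        return z
    return model.run_with_hooks(clean, fwd_hooks=[(name, hook)])

# Example: how much does corrupting head (layer 1, head 5) move the final-token logits?
clean_logits = model(clean)
patched_logits = patch_head(layer=1, head=5)
delta = (clean_logits[0, -1] - patched_logits[0, -1]).abs().max().item()
```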

3. Quantitative Profiling of Idiom Heads

Empirical results on canonical idioms show strong performance drops for a highly constrained set of heads:

| Idiom | Layer 1 Heads | Layer 2 Heads | Max $d$ ($\times 10^{-2}$) |
| --- | --- | --- | --- |
| kicked the bucket | 5*, 14 | 20* | 0.20 |
| piece of cake | 15 | 2, 2 | 0.15 |
| hit the sack | 2* | 15 | 0.15 |
| pulling your leg | 3, 3 | 11 | 0.11 |

Heads (2,0), (1,2), (1,3), and (1,5) repeatedly serve as Idiom Heads across diverse idioms, displaying large $d_h$ values and a near absence of antagonistic (negative-drop) heads (Gomes, 20 Nov 2025). Early MLPs and MHSA sublayers (Layers 0–3) also play critical roles in boosting figurative signals and suppressing literal interpretations (Oh et al., 2 Jun 2025).
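
A minimal sketch of the recurrence half of the Idiom Head criterion, assuming the per-idiom circuits $H_I$ have already been discovered; the example circuits below are hypothetical and merely reuse head indices mentioned above.

```python
from collections import Counter

def recurring_heads(circuits: dict[str, set[tuple[int, int]]],
                    min_recurrence: int = 3) -> dict[tuple[int, int], int]:
    """Count how often each (layer, head) node appears across the per-idiom
    circuits H_I and keep those recurring in >= min_recurrence idioms (the
    recurrence condition, applied alongside the |d_h| > tau_idiom check)."""
    counts = Counter(head for heads in circuits.values() for head in heads)
    return {head: n for head, n in counts.items() if n >= min_recurrence}

# Hypothetical circuits keyed by idiom; real ones come from the path-patching runs.
example = {
    "kicked the bucket": {(1, 5), (2, 0)},
    "piece of cake":     {(1, 5), (1, 2)},
    "hit the sack":      {(1, 5), (2, 0)},
    "pulling your leg":  {(1, 5), (1, 3)},
}
print(recurring_heads(example))   # {(1, 5): 4} under these made-up sets
```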

4. Mechanistic Role: Figurative versus Literal Routing

Transformer architectures encode idiom semantics via dual-path routing:

  • Intermediate (figurative) path: Early MHSA heads bind idiom tokens; subsequent MLP transformations create high-dimensional figurative slots. From layer $\ell \approx 4$ onward, positions like the "because" token shift toward alignment with figurative paraphrase embeddings (Oh et al., 2 Jun 2025).
  • Bypass (literal) route: Literal compositional interpretations are preserved via direct attention edges from the idiom span to the final token, effectively bypassing figurative gating. Selective edge masking can block either route, isolating the flow of meaning (e.g., blocking the idiom → "because" edge at $\ell = 5$ kills the figurative boost, while blocking the idiom → last-token edge at $\ell \approx 11, 13$ eliminates the literal bypass) (Oh et al., 2 Jun 2025).
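
A hedged sketch of this kind of selective edge masking in TransformerLens follows; the model, prompt, layer, and token positions are illustrative assumptions rather than the cited experimental setup.

```python
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2-small")       # stand-in model
tokens = model.to_tokens("Tom kicked the bucket because")      # idiom span, then "because"

def block_edges(layer: int, query_pos: int, key_positions: list[int]):
    """Return a (hook_name, hook_fn) pair that sets the pre-softmax attention
    scores from query_pos to key_positions to -inf at the given layer, so those
    edges carry no attention in the patched forward pass."""
    name = f"blocks.{layer}.attn.hook_attn_scores"   # shape [batch, head, query, key]
    def hook(scores, hook):
        scores[:, :, query_pos, key_positions] = float("-inf")
        return scores
    return (name, hook)

# Illustrative positions only (with the default BOS token, the idiom span sits
# roughly at positions 2-4 and "because" at 5): cut idiom -> "because" at layer 5.
logits = model.run_with_hooks(
    tokens, fwd_hooks=[block_edges(layer=5, query_pos=5, key_positions=[2, 3, 4])]
)
```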

5. Augmented Reception and Cross-Token Dynamics

"Augmented reception" describes the phenomenon where early token representations (even function words) causally influence the Query or Key vectors of an idiom’s final token via upstream attention edges. For a head hh at layer \ell targeting token jj, the attention weight is

$$\alpha_{j \to i}^{(\ell,h)} \propto \exp\!\left(Q_j^{(\ell,h)} \cdot K_i^{(\ell,h)}\right)$$

Incoming Q-edges from $k$ to $j$ mean that $Q_j$ depends on $x_k^{(\ell-1)}$. This mechanism sharpens cross-token coupling, enabling the final idiom token to "receive" context-influenced attention, which is quantifiably necessary for figurative resolution (Gomes, 20 Nov 2025).
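
A toy computation of these attention weights, with the standard $1/\sqrt{d_{\text{head}}}$ scaling added (the proportionality above leaves it implicit); shapes and values are illustrative.

```python
import torch

def attention_weights(Q_j: torch.Tensor, K: torch.Tensor, d_head: int) -> torch.Tensor:
    """alpha_{j->i} proportional to exp(Q_j . K_i): softmax over key positions i
    of token j's scaled query-key dot products (causal masking omitted here)."""
    scores = (K @ Q_j) / d_head ** 0.5     # one score per key position i
    return torch.softmax(scores, dim=-1)    # normalised alpha_{j->i}

d_head = 64
Q_j = torch.randn(d_head)        # query of the idiom's final token j; under augmented
                                 # reception it already depends on earlier tokens x_k^{(l-1)}
K = torch.randn(6, d_head)       # keys of the preceding tokens i
alpha = attention_weights(Q_j, K, d_head)   # sums to 1 over i
```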

6. Computational Efficiency, Redundancy, and Implications

Transformers deploy Idiom Heads with functional specialization analogous to Induction or Mover heads, but for non-compositional semantics. Redundancy—multiple heads handling similar cues—balances computational efficiency and generalization robustness. Idiom-specific directions in Q–K space allow shared heads to serve multiple idioms without semantic collapse (Gomes, 20 Nov 2025).

This specialization suggests a scalable mechanism for tackling figurative, metaphorical, or sarcastic language via mechanistic interpretability. Monitoring Idiom Head activation or intermediate-path alignment (e.g., at the "because" token) has diagnostic potential for real-time disambiguation or model auditing (Oh et al., 2 Jun 2025).
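
One possible (purely illustrative) monitoring sketch, assuming a TransformerLens-style model and treating head (1, 5) as the candidate Idiom Head; the prompts, head choice, and norm-based score are assumptions, not a validated diagnostic.

```python
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2-small")   # stand-in model

def idiom_head_activity(prompt: str, layer: int = 1, head: int = 5) -> float:
    """Cheap per-prompt signal: the norm of one head's output at the final token."""
    tokens = model.to_tokens(prompt)
    _, cache = model.run_with_cache(tokens)
    z = cache[f"blocks.{layer}.attn.hook_z"]      # [batch, pos, head, d_head]
    return z[0, -1, head].norm().item()

figurative = idiom_head_activity("He finally kicked the bucket")
literal = idiom_head_activity("He finally kicked the ball")
# A consistent gap between such scores across idioms would be in line with
# idiom-specific routing; a single pair of prompts proves nothing by itself.
```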

7. Limitations and Pathways for Extension

Current findings are established for 1B-parameter autoregressive transformers and short-context idioms. A plausible implication is that extending the analysis to larger models, richer discourse contexts, and continuous-time state probes (CCA, RSA) could resolve even more granular mechanisms involved in non-compositional language processing (Oh et al., 2 Jun 2025). Further research is expected to adapt these mechanistic interpretability tools to more complex grammatical phenomena, additional idiom types, and linguistic ambiguity beyond idioms.
