Recursive K-Hop Attribution
- The cited work shows that recursive K-hop attribution improves multi-hop causal attribution by accurately aggregating intermediate token contributions.
- It leverages iterative span and Q–V neuron weighting to efficiently trace reasoning chains from output back to input tokens.
- Empirical results demonstrate significant gains in faithfulness, efficiency, and knowledge editing performance using one-hop recursion over traditional single-hop methods.
Recursive K-Hop Attribution is a formal mechanism for tracing and quantifying the flow of causal importance through multi-step reasoning chains in transformer LLMs, particularly for tasks involving interpretability and knowledge editing. By recursively attributing model outputs through intermediate reasoning tokens (or neurons) back to their true input sources, it provides a faithful and efficient explanation of how information propagates across several inferential “hops.” This paradigm addresses fundamental limitations of single-hop attribution, which frequently fails to recover multi-hop causal dependencies due to the “information absorption” problem at reasoning intermediates.
1. Formalism and Core Definitions
Recursive K-Hop Attribution is built upon the decomposition of transformer model processing into input, reasoning, and output spans:
- The context is partitioned into three spans: input tokens, intermediate (“thinking”) tokens, and output tokens.
- In multi-hop reasoning, a chain links an initial subject through a sequence of (possibly implicit) intermediate subjects to the target answer (Yang et al., 9 Oct 2025).
K-hop recursive attribution constructs a sequence of attribution distributions over the context, one per hop:
- Hop 0: attributes the output span back to all tokens in the context (input and intermediate).
- Hop k (for k = 1, …, K): re-attributes the intermediate span, weighting each intermediate token by the importance it received at the previous hop.
The final input-only attribution is computed by aggregating the input-token mass across hops using hop-specific flow ratios, where each flow ratio measures the attribution mass remaining on intermediates after that hop (Pan et al., 2 Feb 2026).
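One way to write this scheme down is sketched below. The notation is assumed here for illustration and is not taken from the paper: $I$, $T$, and $O$ stand for the input, intermediate, and output spans, $\mathrm{Attr}(\cdot \to \cdot)$ for any normalized one-hop attribution method, $a^{(k)}$ for the hop-$k$ distribution, and $\rho_k$ for the hop-$k$ flow ratio.

```latex
\begin{aligned}
a^{(0)} &= \mathrm{Attr}\big(O \to I \cup T\big), \\
\rho_k  &= \sum_{t \in T} a^{(k)}_t, \\
a^{(k)} &= \sum_{t \in T} \frac{a^{(k-1)}_t}{\rho_{k-1}} \,
           \mathrm{Attr}\big(t \to I \cup T\big),
           \qquad k = 1, \dots, K, \\
A_I     &= \sum_{k=0}^{K} \Big( \prod_{j<k} \rho_j \Big) \, a^{(k)}\big|_{I}.
\end{aligned}
```

Under this reading, if every $a^{(k)}$ sums to one, the total mass assigned to input tokens is $1 - \prod_{j=0}^{K} \rho_j$, so whatever is left unexplained after $K$ hops is exactly the mass still sitting on intermediates.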
In the neuron-level setting of multi-hop factual recall and editing, neurons are categorized for each hop into:
- Query neuron set: subkey activations responsive to the relevant subject in a given layer.
- Value neuron set: subvalues encoding the information passed on to downstream hops.
The 1-hop importance of a neuron is measured as its logit-difference impact, and K-hop attributions are composed by chaining these 1-hop importances across hops. This models chained Q–V activations throughout the stack (Yang et al., 9 Oct 2025).
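As a rough illustration of a logit-difference impact score, the sketch below zeroes one neuron at a time in a toy linear readout and records how much the answer logit drops. The function names, the linear readout, and the zero-ablation choice are all assumptions made for this example; they are not ACE's actual scoring procedure.

```python
def answer_logit(activations, readout):
    """Toy linear readout (assumed): answer logit as a weighted sum of neuron activations."""
    return sum(a * w for a, w in zip(activations, readout))


def logit_difference_impact(activations, readout):
    """Per-neuron importance: drop in the answer logit when that neuron is zeroed."""
    full = answer_logit(activations, readout)
    impacts = []
    for i in range(len(activations)):
        ablated = list(activations)
        ablated[i] = 0.0  # zero-ablation is an illustrative choice
        impacts.append(full - answer_logit(ablated, readout))
    return impacts


# Hypothetical activations and readout weights for three neurons.
activations = [0.5, 2.0, 1.0]
readout = [1.0, 0.25, -1.0]
impacts = logit_difference_impact(activations, readout)
```

A positive score means the neuron pushes the answer logit up; a negative score means removing it would raise the logit.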
2. Recursive Algorithms and Implementation
The recursive K-hop mechanism in FlashTrace (Pan et al., 2 Feb 2026) and ACE (Yang et al., 9 Oct 2025) follows a streamlined iterative process:
- Initialization (Hop 0): Attribute the model’s output back to the entire context.
- Recursive Attribution (Hops 1 to K):
  - For tokens or neurons in reasoning spans, propagate and reweight their attribution from the preceding hop.
  - Aggregate source contributions via span-wise weighting or matrix multiplication of Q–V importances across layers.
- Final Projection: Attribution mass is recursively funneled to input tokens, discounting surviving importance on intermediates with flow ratios at each step.
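The steps above can be sketched at token level as follows. This is a minimal illustration under stated assumptions, not FlashTrace's implementation: `recursive_k_hop`, `hop0`, and `thinking_attr` are hypothetical names, and the one-hop attributions are supplied as precomputed dictionaries rather than derived from a model.

```python
def recursive_k_hop(hop0, thinking_attr, input_ids, thinking_ids, num_hops):
    """Funnel attribution mass from intermediate tokens back to input tokens.

    hop0: dict token -> mass attributing the output span (assumed to sum to 1).
    thinking_attr: dict thinking token -> dict token -> mass (each assumed to
        sum to 1); a stand-in for any one-hop attribution method.
    Returns (input_attribution, flow_ratios).
    """
    current = dict(hop0)
    carried = 1.0  # share of the original mass still routed through intermediates
    input_attr = {i: 0.0 for i in input_ids}
    flow_ratios = []
    for hop in range(num_hops + 1):
        # Bank whatever landed directly on input tokens at this hop.
        for i in input_ids:
            input_attr[i] += carried * current.get(i, 0.0)
        # Flow ratio: mass left on intermediate tokens after this hop.
        rho = sum(current.get(t, 0.0) for t in thinking_ids)
        flow_ratios.append(rho)
        if hop == num_hops or rho == 0.0:
            break
        # Re-attribute intermediates, weighted by their normalized importance.
        nxt = {}
        for t in thinking_ids:
            weight = current.get(t, 0.0) / rho
            for src, mass in thinking_attr[t].items():
                nxt[src] = nxt.get(src, 0.0) + weight * mass
        carried *= rho
        current = nxt
    return input_attr, flow_ratios


# Toy example: half of the hop-0 mass is absorbed by one thinking token.
hop0 = {"x1": 0.25, "x2": 0.25, "t1": 0.5}
thinking_attr = {"t1": {"x1": 0.75, "x2": 0.25}}
no_recursion, _ = recursive_k_hop(hop0, thinking_attr, ["x1", "x2"], ["t1"], 0)
one_hop, ratios = recursive_k_hop(hop0, thinking_attr, ["x1", "x2"], ["t1"], 1)
```

In the toy data, stopping at hop 0 leaves half of the mass stranded on `t1`, while one recursive hop redistributes it to the inputs, which is the absorption effect the recursion is meant to undo.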
FlashTrace implements span-wise aggregation for each attribution hop, so the per-hop cost is governed by the layer count, context length, target span length, and embedding dimension. This sidesteps the unfavorable scaling of naive token-by-token multi-hop attribution, enabling practical application even on 10K-token sequences (Pan et al., 2 Feb 2026).
In the ACE framework, neuron-level attribution scores are computed for each layer’s Q and V populations, and aggregate hop weights are used to compose recursive flow using matrix multiplication (Yang et al., 9 Oct 2025).
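Composition by matrix multiplication can be illustrated with a small sketch. It assumes each hop is summarized by a weight matrix whose entry (i, j) scores how strongly unit i at that hop drives unit j at the next; the matrices and the names `matmul` and `compose_hops` are hypothetical and not drawn from ACE.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def compose_hops(hop_weights):
    """Chain per-hop importance matrices into one multi-hop importance matrix."""
    composed = hop_weights[0]
    for weights in hop_weights[1:]:
        composed = matmul(composed, weights)
    return composed


# Hypothetical per-hop importances for two units at each hop.
hop1 = [[1.0, 0.0], [0.5, 0.5]]
hop2 = [[0.0, 1.0], [1.0, 0.0]]
two_hop = compose_hops([hop1, hop2])
```

Entry (i, j) of the product sums over every intermediate unit, so a path counts only if both of its links carry weight.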
3. Empirical Findings and Metrics
Extensive benchmarking of recursive K-hop attribution reveals that:
- Faithfulness: The primary benefit of introducing K-hop recursion is improved faithfulness, measured by metrics such as MAS (the reported figures indicate that lower MAS is better). In the “MoreHopQA” ablation:
  - K=0 (no recursion): MAS = 0.209
  - K=1 (one recursive hop): MAS = 0.205 (largest observed improvement)
  - K=2,3: Diminishing or negative returns (MAS = 0.206, 0.209) (Pan et al., 2 Feb 2026)
- Efficiency: FlashTrace achieves over 130x speedup relative to baseline multi-token attribution methods.
- Hop Bound: Most causal chains are captured after the first hop; additional hops produce marginal or adverse effects due to noise accumulation (Pan et al., 2 Feb 2026).
In multi-hop factual recall (as tested on MQuAKE-3K), attribution-controlled knowledge editing leveraging K-hop recursion in ACE achieves large empirical gains:
- On GPT-J, 4-edit accuracy: PMET = 17.01%, ACE = 43.29% (+26.28 percentage points)
- On Qwen3-8B, 4-edit accuracy: PMET = 11.20%, ACE = 47.61% (+36.41 percentage points)
Mean improvements are also reported for both GPT-J and Qwen3-8B (Yang et al., 9 Oct 2025).
4. Mechanistic Interpretation in Transformer Models
Recursive K-hop attribution provides a mechanistically grounded account of multi-hop reasoning and intervention in transformer LLMs:
- Intermediate tokens (implicit or explicit) are responsible for activating specialized query neurons, which in turn gate value neurons that encode further reasoning content.
- The Q–V pathway composition recursively traces the activation chain from final output to initial input, across both token positions and layers, mirroring the architecture’s reasoning logic.
This framework corrects major deficiencies of earlier single-hop attribution and editing methods, which typically obscure the transmission of multi-step factual dependencies and fail to capture the composite flow of information across hidden states and layers (Yang et al., 9 Oct 2025).
5. Practical Considerations and Limitations
Guidelines for deployment of recursive K-hop attribution in practice include:
- Number of Hops: One recursive hop (K = 1) suffices for nearly all tasks and achieves the principal gains in attribution faithfulness. Additional hops introduce both extra compute overhead and the risk of noise, with negligible further benefit.
- Span-wise Aggregation: Ensures that each recursive attribution costs roughly a single forward-like pass, practical for very long contexts.
- Flow Ratio Monitoring: If deep reasoning chains are anticipated, monitoring the flow ratio enables dynamic stopping: further recursion is unnecessary once the ratio approaches zero, indicating that negligible intermediate mass remains (Pan et al., 2 Feb 2026).
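A dynamic stopping rule of this kind might look like the sketch below. The threshold of 0.05, the cap of three hops, and the name `hops_to_run` are illustrative assumptions; the source does not specify concrete values.

```python
def hops_to_run(flow_ratios, threshold=0.05, max_hops=3):
    """Return how many recursive hops to perform given per-hop flow ratios.

    flow_ratios: iterable whose k-th item is the mass left on intermediates
        after hop k (hop 0 is the initial, non-recursive attribution).
    """
    hops_done = 0
    for hop, rho in enumerate(flow_ratios):
        hops_done = hop
        # Stop once intermediates hold negligible mass or the cap is reached.
        if rho < threshold or hop >= max_hops:
            break
    return hops_done


shallow = hops_to_run([0.5, 0.02, 0.0])  # stops after one recursive hop
direct = hops_to_run([0.01])             # hop 0 already reaches the inputs
capped = hops_to_run([0.9] * 10)         # never converges, so the cap applies
```

Because `flow_ratios` may be a lazy iterator, each hop can be computed only when the previous ratio shows that recursion is still worthwhile.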
For knowledge editing, it is essential to intervene in both the query and value neuron populations implicated by K-hop attribution, as omitting either population leads to substantial degradation in multi-hop factual recall (Yang et al., 9 Oct 2025).
6. Implications for Interpretability and Knowledge Editing
The recursive K-hop attribution paradigm fundamentally advances model interpretability and knowledge localization:
- Enables precise diagnosis of multi-hop reasoning chains, exposing the full causal path from answer to original input.
- Supports robust, mechanistically informed knowledge editing by directly targeting the Q–V pathways implicated in chained inference, instead of only local edits.
- Suggests future research on path-integral attributions or dynamic Q–V tracing to enable even finer-grained and more reliable model intervention, particularly as reasoning depths increase.
The structural insight that intermediate subjects function as query neurons orchestrating distributed recall underscores the inseparability of interpretability and editing in highly compositional models (Yang et al., 9 Oct 2025).
References:
- “Towards Long-Horizon Interpretability: Efficient and Faithful Multi-Token Attribution for Reasoning LLMs” (Pan et al., 2 Feb 2026).
- “ACE: Attribution-Controlled Knowledge Editing for Multi-hop Factual Recall” (Yang et al., 9 Oct 2025).