
Jacobian Scopes: Token-Level Causal Attribution

Updated 30 January 2026
  • Jacobian Scopes are a framework for token-level causal attribution, quantifying input influence using first-order gradients.
  • The method uses Semantic, Fisher, and Temperature variants to measure contributions to different output quantities (a single logit, the full predictive distribution, and model confidence) in neural models.
  • It leverages automatic differentiation for efficient computation, balancing precision and cost in large-scale interpretability tasks.

Jacobian Scopes are a framework for token-level causal attribution in neural models, particularly LLMs. The concept comes from interpretability research, where the goal is to quantify how strongly each input token influences a specific model output. Jacobian Scopes take first-order derivatives of the final hidden state with respect to the input token embeddings to produce local, model-agnostic influence scores. By projecting these gradients onto different directions, the method generalizes to several output quantities, including a specific logit, the full predictive distribution, and model confidence. This article summarizes Jacobian Scopes, covering formal definitions, algorithms, empirical applications, and computational trade-offs (Liu et al., 23 Jan 2026).

1. Formal Definition and Preliminaries

Jacobian Scopes measure the sensitivity of a model's output, via its post-norm final hidden state $y$, to infinitesimal perturbations in each input token embedding $x_t \in \mathbb{R}^{d_\mathrm{model}}$. For an autoregressive LLM with vocabulary $V$, final-layer hidden representation $y = h_L(x_{1:T})$, and output logits $z = W y$ with unembedding matrix $W \in \mathbb{R}^{|V| \times d_\mathrm{model}}$, the token-wise Jacobian is

$$J_t := \frac{\partial y}{\partial x_t} \in \mathbb{R}^{d_\mathrm{model} \times d_\mathrm{model}},$$

where $t$ indexes input tokens.

A projection vector $v \in \mathbb{R}^{d_\mathrm{model}}$ encodes the output quantity of interest (e.g., a specific logit, model confidence, or the geometry of the full distribution). The influence score for token $t$ is then

$$\mathrm{Influence}_t := \| v^\top J_t \|_2,$$

interpreted as the maximal first-order change in $v^\top y$ induced by a unit-norm perturbation to $x_t$.
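The "maximal first-order change" reading follows from a Taylor expansion and the Cauchy–Schwarz inequality. A short derivation, using only the definitions of $y$, $x_t$, $J_t$, and $v$ above (the perturbation direction $\delta$ and step $\epsilon$ are introduced here for illustration):

```latex
v^\top y(x_t + \epsilon\,\delta) = v^\top y(x_t) + \epsilon\, v^\top J_t\,\delta + O(\epsilon^2),
\qquad
\max_{\|\delta\|_2 = 1} v^\top J_t\,\delta = \|J_t^\top v\|_2 = \|v^\top J_t\|_2,
\quad \text{attained at } \delta^\star = \frac{J_t^\top v}{\|J_t^\top v\|_2}.
```

Because $J_t^\top v = \nabla_{x_t}(v^\top y)$, the score is the norm of an ordinary gradient of the scalar $v^\top y$, which reverse-mode automatic differentiation returns for every $t$ in one backward pass.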

This formulation is attractive because it yields exact local (first-order) attributions without relying on internal model decompositions such as attention heads or circuit pathways.

2. Variants of Jacobian Scopes: Semantic, Fisher, and Temperature

Jacobian Scopes come in three principal variants, which differ in the choice of projection direction $v$ and hence in the output quantity being attributed.

Semantic Scope: Attributes influence to a specific target token by selecting $w_{\text{target}}$, the row of $W$ corresponding to that token, so that $w_{\text{target}}^\top y = z_{\text{target}}$ is the target logit. Thus $v = w_{\text{target}}$ and $\mathrm{Influence}_t = \| w_{\text{target}}^\top J_t \|_2 = \| \nabla_{x_t} z_{\text{target}} \|_2$. This variant isolates causal contributions to a single logit and requires only one backward pass.
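The Semantic Scope can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: it assumes a hypothetical toy model $y = \tanh(\sum_t A_t x_t)$ in place of an LLM, so that $J_t = \mathrm{diag}(1 - y^2)\,A_t$ is available in closed form, and the names `forward`, `vjp_all_tokens`, and `semantic_scope`, as well as all numbers, are made up for illustration. It shows that one shared backward vector gives $w_{\text{target}}^\top J_t$ for every token at once, which is what a single reverse-mode backward pass provides in an autodiff framework.

```python
import math


def forward(A, xs):
    """Hypothetical toy stand-in for h_L: y = tanh(sum_t A_t x_t)."""
    d = len(A[0])
    pre = [0.0] * d
    for A_t, x_t in zip(A, xs):
        for i in range(d):
            pre[i] += sum(a * x for a, x in zip(A_t[i], x_t))
    return [math.tanh(p) for p in pre]


def vjp_all_tokens(A, xs, v):
    """Return v^T J_t for every token t.

    Since dy/dpre = diag(1 - y^2), v^T J_t = g^T A_t with g = v * (1 - y^2).
    g is computed once and reused for all tokens, mirroring one backward pass.
    """
    y = forward(A, xs)
    g = [vi * (1.0 - yi * yi) for vi, yi in zip(v, y)]
    return [[sum(g[i] * A_t[i][j] for i in range(len(g)))
             for j in range(len(A_t[0]))]
            for A_t in A]


def semantic_scope(A, xs, W, target):
    """Influence_t = ||w_target^T J_t||_2, with w_target the target row of W."""
    v = W[target]
    return [math.sqrt(sum(c * c for c in row)) for row in vjp_all_tokens(A, xs, v)]


# Hypothetical toy setup: 3 input tokens, d_model = 2, vocabulary of 3 tokens.
A = [
    [[1.0, 0.0], [0.0, 1.0]],    # token 0
    [[0.5, -0.5], [0.2, 0.1]],   # token 1
    [[0.0, 0.0], [0.0, 0.0]],    # token 2: no path to the output
]
xs = [[0.3, -0.1], [0.2, 0.4], [1.0, 1.0]]
W = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]

scores = semantic_scope(A, xs, W, target=2)
print(scores)  # token 2 scores exactly 0.0 because A_2 is zero
```

In a real model, the same scores would come from backpropagating the target logit once and taking the norm of the gradient at each input embedding.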

Fisher Scope: Attributes influence to the entire predictive distribution $p = \mathrm{softmax}(z)$. The Fisher information metric of the softmax with respect to the logits, $F_z = \mathrm{diag}(p) - p p^\top$, is pulled back through $W$ as $F_y = W^\top F_z W$, and influence is measured by $\mathrm{Influence}_t = \operatorname{tr}(J_t^\top F_y J_t)$. Computing the full $J_t$ exactly requires $d_\mathrm{model}$ backward passes (one per coordinate of $y$); efficient approximations are possible for scalability.
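A corresponding sketch of the Fisher Scope, on the same hypothetical toy model $y = \tanh(\sum_t A_t x_t)$ (not an LLM; the function names and numbers are illustrative). It assumes the trace contraction $\mathrm{Influence}_t = \operatorname{tr}(J_t^\top F_y J_t)$ as the way to reduce the pulled-back metric to one scalar per token; the exact reduction used by Liu et al. may differ. The full $J_t$ is written in closed form here instead of being assembled from $d_\mathrm{model}$ backward passes.

```python
import math


def forward(A, xs):
    """Hypothetical toy stand-in for h_L: y = tanh(sum_t A_t x_t)."""
    d = len(A[0])
    pre = [0.0] * d
    for A_t, x_t in zip(A, xs):
        for i in range(d):
            pre[i] += sum(a * x for a, x in zip(A_t[i], x_t))
    return [math.tanh(p) for p in pre]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(zi - m) for zi in z]
    s = sum(e)
    return [ei / s for ei in e]


def fisher_scope(A, xs, W):
    """Return (scores, F_z), assuming Influence_t = tr(J_t^T F_y J_t)."""
    y = forward(A, xs)
    d = len(y)
    p = softmax([sum(w * yi for w, yi in zip(row, y)) for row in W])  # p = softmax(W y)
    V = len(p)
    # Fisher information of the softmax w.r.t. the logits: F_z = diag(p) - p p^T.
    F_z = [[(p[a] if a == b else 0.0) - p[a] * p[b] for b in range(V)] for a in range(V)]
    # Pull back through z = W y: F_y = W^T F_z W (d_model x d_model).
    F_y = [[sum(W[a][i] * F_z[a][b] * W[b][j] for a in range(V) for b in range(V))
            for j in range(d)] for i in range(d)]
    scores = []
    for A_t in A:
        # Full Jacobian J_t = diag(1 - y^2) A_t. With autodiff this would cost
        # d_model backward passes, one per row e_i^T J_t.
        J = [[(1.0 - y[i] * y[i]) * A_t[i][j] for j in range(len(A_t[0]))]
             for i in range(d)]
        # Assumed contraction: tr(J_t^T F_y J_t), summed column by column.
        scores.append(sum(J[a][j] * F_y[a][b] * J[b][j]
                          for j in range(len(J[0]))
                          for a in range(d) for b in range(d)))
    return scores, F_z


# Same hypothetical toy setup as for the Semantic Scope.
A = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, -0.5], [0.2, 0.1]],
    [[0.0, 0.0], [0.0, 0.0]],
]
xs = [[0.3, -0.1], [0.2, 0.4], [1.0, 1.0]]
W = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]

scores, F_z = fisher_scope(A, xs, W)
print(scores)
```

Under this assumed trace form, the score equals the sum over input coordinates $j$ of $2\,\mathrm{KL}(p \,\|\, p')/\epsilon^2$ for small perturbations $\epsilon e_j$ of $x_t$, so it measures the sensitivity of the whole distribution rather than of one logit.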

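Temperature Scope: Attributes influence to model confidence. Writing $y = \beta \hat{y}$ with $\beta = \|y\|_2$ and normalized direction $\hat{y} = y / \|y\|_2$, the logits become $z = \beta\, W \hat{y}$, so $\beta$ acts as an inverse effective temperature. Then $v = \hat{y}$ and $\mathrm{Influence}_t = \|\hat{y}^\top J_t\|_2 = \|\nabla_{x_t} \beta\|_2$. This variant approximates distribution-wide attribution at the computational cost of a single backward pass.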
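A sketch of the Temperature Scope on the same hypothetical toy model $y = \tanh(\sum_t A_t x_t)$ (not an LLM; names and numbers are illustrative). Because $\nabla_y \|y\|_2 = \hat{y}$, backpropagating $\beta = \|y\|_2$ once yields $\hat{y}^\top J_t$ for every token, so each score is the gradient norm of the inverse temperature with respect to $x_t$.

```python
import math


def forward(A, xs):
    """Hypothetical toy stand-in for h_L: y = tanh(sum_t A_t x_t)."""
    d = len(A[0])
    pre = [0.0] * d
    for A_t, x_t in zip(A, xs):
        for i in range(d):
            pre[i] += sum(a * x for a, x in zip(A_t[i], x_t))
    return [math.tanh(p) for p in pre]


def temperature_scope(A, xs):
    """Return (beta, y_hat, scores) with Influence_t = ||y_hat^T J_t||_2."""
    y = forward(A, xs)
    beta = math.sqrt(sum(yi * yi for yi in y))  # inverse effective temperature ||y||_2
    y_hat = [yi / beta for yi in y]             # normalized direction, used as v
    # One shared backward vector: y_hat^T diag(1 - y^2).
    g = [vi * (1.0 - yi * yi) for vi, yi in zip(y_hat, y)]
    scores = []
    for A_t in A:
        row = [sum(g[i] * A_t[i][j] for i in range(len(g)))
               for j in range(len(A_t[0]))]
        scores.append(math.sqrt(sum(c * c for c in row)))
    return beta, y_hat, scores


# Same hypothetical toy setup as for the other scopes.
A = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, -0.5], [0.2, 0.1]],
    [[0.0, 0.0], [0.0, 0.0]],
]
xs = [[0.3, -0.1], [0.2, 0.4], [1.0, 1.0]]

beta, y_hat, scores = temperature_scope(A, xs)
print(beta, scores)
```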

All variants are agnostic to architectural details; they require only end-to-end differentiability from outputs to inputs.

3. Algorithmic Procedures and Computational Complexity

Jacobian Scopes are implemented via automatic differentiation, typically in frameworks supporting full gradient backpropagation. The procedural template is as follows:

  • For Semantic and Temperature Scopes, compute a scalar objective (e.g., $z_{\text{target}} = w_{\text{target}}^\top y$ or $\beta = \|y\|_2$), then backpropagate to $x_t$ for all $t$ in a single pass.
  • For Fisher Scope, backpropagate once per orthogonal output direction (one per coordinate of $y$) to assemble the full $J_t$ for every token, then contract with $F_y$ to obtain $\mathrm{Influence}_t$ per token. This scales linearly with $d_\mathrm{model}$.
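The two templates differ only in how many vector-Jacobian products (VJPs) they need. The sketch below makes this explicit with a hypothetical `ToyModel` whose `vjp` method stands in for one reverse-mode backward pass and counts its calls; the class, the helper names, and the numbers are assumptions for illustration, not an existing API.

```python
import math


class ToyModel:
    """Hypothetical toy model y = tanh(sum_t A_t x_t) with a VJP oracle.

    Each call to vjp() stands in for one reverse-mode backward pass.
    """

    def __init__(self, A, xs):
        self.A = A
        d = len(A[0])
        pre = [0.0] * d
        for A_t, x_t in zip(A, xs):
            for i in range(d):
                pre[i] += sum(a * x for a, x in zip(A_t[i], x_t))
        self.y = [math.tanh(p) for p in pre]
        self.backward_passes = 0

    def vjp(self, u):
        """Return [u^T J_t for every token t], using J_t = diag(1 - y^2) A_t."""
        self.backward_passes += 1
        g = [ui * (1.0 - yi * yi) for ui, yi in zip(u, self.y)]
        return [[sum(g[i] * A_t[i][j] for i in range(len(g)))
                 for j in range(len(A_t[0]))]
                for A_t in self.A]


def single_pass_scope(model, v):
    """Semantic/Temperature template: one VJP yields ||v^T J_t||_2 for all t."""
    return [math.sqrt(sum(c * c for c in row)) for row in model.vjp(v)]


def full_jacobians(model):
    """Fisher template: one VJP per basis vector e_i gives row i of every J_t."""
    d = len(model.y)
    rows = [model.vjp([1.0 if k == i else 0.0 for k in range(d)]) for i in range(d)]
    # rows[i][t] is row i of J_t; regroup into one d x d_in matrix per token.
    return [[rows[i][t] for i in range(d)] for t in range(len(model.A))]


A = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, -0.5], [0.2, 0.1]],
    [[0.0, 0.0], [0.0, 0.0]],
]
xs = [[0.3, -0.1], [0.2, 0.4], [1.0, 1.0]]

model = ToyModel(A, xs)
v = [1.0, -1.0]  # e.g. a row of W for the Semantic Scope
semantic = single_pass_scope(model, v)
print(model.backward_passes)  # 1
jacobians = full_jacobians(model)
print(model.backward_passes)  # 3, i.e. 1 + d_model with d_model = 2
```

With $d_\mathrm{model}$ in the thousands for current LLMs, the gap between one pass and $d_\mathrm{model}$ passes is what motivates approximations for the Fisher Scope.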

Ordering this matrix chain product efficiently, particularly in programs composed of many sequential differentiable subprograms, can draw on the literature on scheduled Jacobian chaining. There, a dynamic programming heuristic balances serial and parallel computation and comes close to optimal schedules in median cost ratio for realistic chain lengths and machine counts (Märtens et al., 9 May 2025).
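The underlying ordering problem can be illustrated with the textbook matrix-chain dynamic program. This is a minimal sequential sketch, not the scheduled multi-machine algorithm of Märtens et al.: it ignores parallelism and machine counts and only counts scalar multiplications. The helper names are hypothetical.

```python
from functools import lru_cache


def min_chain_cost(dims):
    """Classic matrix-chain ordering DP.

    Matrix k (1-indexed) has shape dims[k-1] x dims[k]; returns the minimum
    number of scalar multiplications needed to form the full product.
    """
    n = len(dims) - 1

    @lru_cache(maxsize=None)
    def cost(i, j):
        # Cheapest way to multiply matrices i..j (inclusive).
        if i == j:
            return 0
        return min(cost(i, k) + cost(k + 1, j) + dims[i - 1] * dims[k] * dims[j]
                   for k in range(i, j))

    return cost(1, n)


def left_to_right_cost(dims):
    """Cost of ((M_1 M_2) M_3) ...: reverse-mode order when M_1 is a row vector."""
    return sum(dims[0] * dims[k] * dims[k + 1] for k in range(1, len(dims) - 1))


def right_to_left_cost(dims):
    """Cost of M_1 (M_2 (... M_n)): forward-mode order."""
    return sum(dims[k] * dims[k + 1] * dims[-1] for k in range(len(dims) - 2))


# A 1 x 100 row vector (e.g. v^T) times three 100 x 100 Jacobians.
dims = [1, 100, 100, 100, 100]
print(min_chain_cost(dims), left_to_right_cost(dims), right_to_left_cost(dims))
# 30000 30000 2010000
```

For a single projected output, the left-to-right (reverse-mode) order is already optimal, which is why the Semantic and Temperature Scopes need only one backward pass; scheduling becomes relevant when many such products must be distributed across machines.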

4. Applications in LLMs: Attribution, Bias Detection, and Translation

Jacobian Scopes have been empirically validated in multiple contexts:

  • Instruction Understanding: Attribution scores for model predictions such as "truthful" or "deceitful" in prompts sharply peak on pivotal tokens ("deceive," "argue").
  • Political Bias: Scopes identify context tokens ("Columbia," "the South") as dominant drivers of logit predictions for "liberal" and "conservative," exposing model biases induced by the training data.
  • Machine Translation: Fisher Scopes yield token-to-token alignments, while Temperature Scopes reveal phrase-level regions of influence, consistent across semantic divergences.
  • In-Context Learning (ICL): Applied to time-series forecasting, Scopes reveal the tendency of LLMs to attend to near-cutoff motifs or shift focus downstream for stochastic processes (Brownian motion).

These findings underscore the analytic power of token-level gradient-based attribution and its role in mechanistic interpretability.

5. Methodological Comparison: Advantages and Limitations

Jacobian Scopes contrast with prior interpretability techniques as follows:

  • Attention weights: Not reliably causal; reflect distribution of computation, not influence.
  • Activation patching: Mechanistic but interventional and computationally expensive.
  • Integrated Gradients: Sensitive to out-of-distribution effects; may suffer "attention sink" artifacts.

Jacobian Scopes are efficient (a single backward pass for Semantic and Temperature Scopes, $O(d_\mathrm{model})$ backward passes for Fisher Scope), admit arbitrary projection directions, and are fully model-agnostic. However, they are local, first-order linearizations, so nonlinear or nonlocal dependencies are not captured. They do not resolve layer- or head-level mechanisms, and they are more computationally intensive than forward-only methods in large-scale analyses.

6. Extensions, Open Directions, and Impact

Potential extensions include:

  • Spectral projection directions (e.g., top-k logit explanations).
  • Scalable low-rank Fisher Scope approximations.
  • Compositional attribution by integrating Jacobian Scopes with circuit-tracing or symbolic causal analysis.

These tools can support principled bias auditing, debugging of prompt influence, and detailed mechanism analysis in LLMs. A plausible implication is wider adoption in safety-critical model evaluation, interpretability benchmarking, and the development of robust user-facing systems.

A reference implementation and empirical benchmarks are available (Märtens et al., 9 May 2025; Liu et al., 23 Jan 2026), supporting reproducibility and further exploration.
