Step-Decomposed Influence (SDI)
- Step-Decomposed Influence (SDI) is a framework that quantifies the temporal and stagewise impact of training data on model predictions.
- It employs efficient sketching methods and Bayesian techniques to capture dynamic influence trajectories across training checkpoints.
- SDI reveals phase transitions and fine-grained attribution patterns, aiding in interpretability, adaptive compute scaling, and improved model diagnostics.
Step-Decomposed Influence (SDI) is a principled framework for attributing and analyzing the temporal evolution of training data influence within learned models. SDI quantifies not only which data points affect a given prediction, but also when, and at which computational or developmental stage, this influence occurs. Recent work on looped transformers and stagewise neural learning has formalized SDI both as a length-$\tau$ trajectory over the $\tau$ recurrent steps of a network and as a dynamic function of training time, revealing fine-grained temporal patterns in model reasoning and feature formation (Kaissis et al., 10 Feb 2026, Lee et al., 14 Oct 2025).
1. Formal Definitions and Mathematical Foundations
Two principal formalisms of Step-Decomposed Influence have been established.
A. Stepwise Influence in Looped Transformers:
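Let $\ell(z;\theta)$ be the loss on example $z$ under parameters $\theta$. For a looped transformer, with recurrent body parameters $\theta$ applied over $\tau$ steps, the total gradient on a test example $z'$ decomposes as
$$\nabla_\theta \ell(z';\theta) = \sum_{t=1}^{\tau} g_t(z';\theta),$$
where $g_t(z';\theta)$ aggregates gradient contributions at loop step $t$. The SDI trajectory between train example $z$ and test example $z'$ over training checkpoints $\theta_1,\dots,\theta_K$ with learning-rate weights $\eta_1,\dots,\eta_K$ is
$$\mathrm{SDI}(z,z') = \big(I_1(z,z'),\dots,I_\tau(z,z')\big),$$
where
$$I_t(z,z') = \sum_{k=1}^{K} \eta_k \,\big\langle \nabla_\theta \ell(z;\theta_k),\; g_t(z';\theta_k) \big\rangle.$$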
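The decomposition can be illustrated with a minimal sketch. It assumes a toy looped network, `h_t = tanh(W h_{t-1})` with a squared-error loss, standing in for the looped transformer; the function names `stepwise_grads` and `sdi_trajectory`, the toy model, and the checkpoint values are all hypothetical choices for illustration and are not taken from the source.

```python
import numpy as np

def stepwise_grads(W, x, y, tau):
    """Per-step gradients g_1..g_tau of 0.5*||h_tau - y||^2 w.r.t. the shared W."""
    hs = [x]
    for _ in range(tau):                      # forward pass, keeping every state
        hs.append(np.tanh(W @ hs[-1]))
    grads = [None] * tau
    dh = hs[-1] - y                           # dloss/dh_tau
    for t in range(tau, 0, -1):               # backward pass through the loop
        delta = dh * (1.0 - hs[t] ** 2)       # signal at the pre-activation of step t
        grads[t - 1] = np.outer(delta, hs[t - 1])  # contribution of loop step t
        dh = W.T @ delta                      # propagate to the previous state
    return np.stack(grads)                    # shape (tau, d, d)

def sdi_trajectory(checkpoints, lrs, train, test, tau):
    """I_t = sum_k eta_k <full train gradient, step-t test gradient> at checkpoint k."""
    traj = np.zeros(tau)
    for W, eta in zip(checkpoints, lrs):
        g_train = stepwise_grads(W, *train, tau).sum(axis=0)
        g_test = stepwise_grads(W, *test, tau)
        traj += eta * np.einsum("ij,tij->t", g_train, g_test)
    return traj

rng = np.random.default_rng(0)
d, tau = 4, 3
checkpoints = [0.5 * rng.standard_normal((d, d)) for _ in range(2)]  # made-up checkpoints
lrs = [0.1, 0.05]                                                    # made-up learning rates
train = (rng.standard_normal(d), rng.standard_normal(d))
test = (rng.standard_normal(d), rng.standard_normal(d))

traj = sdi_trajectory(checkpoints, lrs, train, test, tau)
# Conservation: the trajectory sums to the undecomposed gradient-dot-product influence.
total = sum(
    eta * np.sum(stepwise_grads(W, *train, tau).sum(0) * stepwise_grads(W, *test, tau).sum(0))
    for W, eta in zip(checkpoints, lrs)
)
print(traj, traj.sum(), total)
```

Because the per-step gradients sum to the full gradient, the entries of `traj` add up to the aggregate influence, which is the conservation property a step-resolved attribution should satisfy.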
B. Bayesian and Stagewise Influence:
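Let $w$ denote model parameters and $p_t(w)$ be the local Bayesian posterior at training step $t$. Writing $\ell(z;w)$ for the loss on example $z$, the SDI of upweighting training sample $z_i$ on the loss of $z_j$ at $t$ is
$$\mathrm{SDI}_t(z_i, z_j) = -\,\mathrm{Cov}_{w \sim p_t}\big(\ell(z_i;w),\, \ell(z_j;w)\big).$$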
This tracks influence as a function of training time, characterizing non-monotonic transitions and revealing developmental phases (Lee et al., 14 Oct 2025).
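The link between upweighting a sample and a loss covariance can be motivated by a short calculation. The sketch below assumes the local posterior is a tempered Gibbs distribution with per-example weights $\beta_k$ and a localizing prior $\varphi$; this specific form, and the symbols $\beta_k$, $\varphi$, and $Z_\beta$, are assumptions made for illustration and are not spelled out in the source.

```latex
\begin{align*}
p_\beta(w) &= \frac{1}{Z_\beta}\,\varphi(w)\,
              \exp\!\Big(-\sum_k \beta_k\, \ell(z_k; w)\Big), \\
\frac{\partial}{\partial \beta_i}\,
  \mathbb{E}_{p_\beta}\big[\ell(z_j; w)\big]
  &= -\,\mathbb{E}_{p_\beta}\big[\ell(z_i; w)\,\ell(z_j; w)\big]
     + \mathbb{E}_{p_\beta}\big[\ell(z_i; w)\big]\,
       \mathbb{E}_{p_\beta}\big[\ell(z_j; w)\big] \\
  &= -\,\mathrm{Cov}_{p_\beta}\big(\ell(z_i; w),\, \ell(z_j; w)\big).
\end{align*}
```

The first term comes from differentiating the unnormalized density and the second from differentiating $1/Z_\beta$. Under this assumption, a positive loss covariance means that upweighting $z_i$ lowers the expected loss on $z_j$.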
2. Algorithmic Frameworks and Sketching Methods
A. SDI in Looped Transformers:
Storing per-example gradients at scale is prohibitively memory-intensive, so SDI uses sketching (specifically TensorSketch and CountSketch) to reduce the dimensionality of the per-step gradients on the fly. During training,
- Forward and backward passes are instrumented to recover activations and backpropagated signals at each loop step and token.
- For matrix parameters, sketched outer products are accumulated.
- For vector parameters, CountSketch is used.
- Per-example, per-step sketches are stored, from which dot products approximate the stepwise influences.
Each SDI estimate is unbiased, and its error decays as $O(1/\sqrt{m})$, where $m$ is the sketch dimension (Kaissis et al., 10 Feb 2026).
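The sketching step for a matrix parameter can be illustrated as follows. This is a minimal sketch of the standard TensorSketch construction (two CountSketches combined by an FFT-based circular convolution), not the source's implementation; the helper names `make_hash`, `count_sketch`, and `tensor_sketch`, the dimensions, and the random inputs are assumptions for illustration.

```python
import numpy as np

def make_hash(rng, dim, m):
    """Draw a CountSketch hash: one bucket in [0, m) and one random sign per coordinate."""
    return rng.integers(0, m, size=dim), rng.choice([-1.0, 1.0], size=dim)

def count_sketch(x, buckets, signs, m):
    """CountSketch of a vector: add each signed entry into its bucket."""
    out = np.zeros(m)
    np.add.at(out, buckets, signs * x)
    return out

def tensor_sketch(u, v, hash_u, hash_v, m):
    """Sketch of outer(u, v) without forming it: circular convolution via FFT."""
    fu = np.fft.fft(count_sketch(u, *hash_u, m))
    fv = np.fft.fft(count_sketch(v, *hash_v, m))
    return np.real(np.fft.ifft(fu * fv))

rng = np.random.default_rng(0)
d_out, d_in, m = 32, 16, 4096            # illustrative sizes only
hash_u, hash_v = make_hash(rng, d_out, m), make_hash(rng, d_in, m)

# Backpropagated signal and activation for two examples (random stand-ins).
delta_a, act_a = rng.standard_normal(d_out), rng.standard_normal(d_in)
delta_b, act_b = rng.standard_normal(d_out), rng.standard_normal(d_in)

sk_a = tensor_sketch(delta_a, act_a, hash_u, hash_v, m)
sk_b = tensor_sketch(delta_b, act_b, hash_u, hash_v, m)

# Dense gradient dot product versus its sketched estimate.
exact = np.sum(np.outer(delta_a, act_a) * np.outer(delta_b, act_b))
print(exact, sk_a @ sk_b)
```

Both examples must be sketched with the same hash functions, otherwise the dot product of the sketches does not estimate the dot product of the gradients. Storage per sketch is `m` numbers instead of `d_out * d_in`.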
B. Bayesian SDI with SGLD Sampling:
In the stagewise context, Stochastic Gradient Langevin Dynamics (SGLD) is used to sample from the local posterior at each checkpoint. For each checkpoint,
- Multiple SGLD chains run from the saved checkpoint to yield posterior draws.
- The empirical covariance of per-example losses across draws yields the influence estimate.
This pipeline is computationally intensive, with per-step cost growing with the number of posterior samples and the number of examples, but it is robust to singular geometries and directly applicable to large neural models (Lee et al., 14 Oct 2025).
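The sampling-then-covariance pipeline can be sketched on a toy problem. The example below assumes a linear-regression loss and a localized SGLD update with a quadratic pull toward the checkpoint; the update form, the hyperparameters `step`, `n_beta`, `gamma`, and `batch`, and the function names are hypothetical and are not taken from the source. The sign convention (negated covariance) is likewise an assumption.

```python
import numpy as np

def per_example_loss(w, X, y):
    """Squared-error loss of every example under parameters w."""
    return 0.5 * (X @ w - y) ** 2

def sgld_draws(w0, X, y, rng, n_draws=200, step=1e-3, n_beta=10.0, gamma=1.0, batch=8):
    """One localized SGLD chain started at checkpoint w0; returns (n_draws, dim)."""
    w, draws = w0.copy(), []
    n = len(y)
    for _ in range(n_draws):
        idx = rng.choice(n, size=batch, replace=False)
        grad = X[idx].T @ (X[idx] @ w - y[idx]) / batch   # minibatch mean gradient
        drift = n_beta * grad + gamma * (w - w0)           # loss term + pull to checkpoint
        w = w - 0.5 * step * drift + np.sqrt(step) * rng.standard_normal(w.shape)
        draws.append(w.copy())
    return np.stack(draws)

def bayesian_influence(draws, X, y):
    """Negated empirical covariance of per-example losses across posterior draws."""
    losses = np.stack([per_example_loss(w, X, y) for w in draws])  # (S, N)
    centered = losses - losses.mean(axis=0)
    return -(centered.T @ centered) / (len(draws) - 1)

rng = np.random.default_rng(0)
X = rng.standard_normal((16, 3))
y = X @ np.array([1.0, -2.0, 0.5])
w_ckpt = np.zeros(3)                       # stand-in for a saved checkpoint

chains = [sgld_draws(w_ckpt, X, y, rng) for _ in range(4)]
influence = bayesian_influence(np.concatenate(chains), X, y)
print(influence.shape)
```

Repeating this at each saved checkpoint gives one influence matrix per training step, and stacking them along the checkpoint axis gives the influence-over-time traces described above.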
3. Interpretability, Temporal Attribution, and Cancellation
SDI reveals not merely the magnitude or sign of training data influence but its stepwise or stagewise localization:
- Influence Horizon: SDI identifies the final loop iteration or training phase during which training data exerts significant effect, crucial for optimizing compute allocation and dynamic halting.
- Cancellation Detection: Aggregate influence may obscure phase-opposing contributions; SDI can separate early positive from late negative (or vice versa) effects within recurrent computations.
- Mechanistic Hypotheses: In algorithmic reasoning or parity tasks, SDI exposes periodic and state-machine-like latent dynamics, as shown by step-peaked influence aligned with discrete hidden state cycles (Kaissis et al., 10 Feb 2026).
- Phase Transition Mapping: Stagewise SDI recovers sharp peaks and sign flips in influence at theoretically predicted “phase transitions” in model development, e.g., emergence of induction heads, morphological structure discovery, or syntactic scoping rules (Lee et al., 14 Oct 2025).
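Cancellation and sign flips in a trajectory can be summarized with simple statistics. The sketch below assumes a trajectory is already available as an array of per-step influences; the `cancellation_score` definition and the helper `sign_flip_steps` are illustrative choices, not metrics defined in the source, and the example trajectories are made up.

```python
import numpy as np

def cancellation_score(traj):
    """1 - |sum_t I_t| / sum_t |I_t|: 0 if all steps share a sign, 1 if they cancel exactly."""
    traj = np.asarray(traj, dtype=float)
    mass = np.abs(traj).sum()
    return 0.0 if mass == 0 else 1.0 - abs(traj.sum()) / mass

def sign_flip_steps(traj):
    """1-based steps at which the influence takes the opposite sign of the previous step."""
    s = np.sign(np.asarray(traj, dtype=float))
    return (np.where(s[1:] * s[:-1] < 0)[0] + 2).tolist()

# Made-up trajectories: early positive then late negative, versus consistently positive.
opposing = [0.4, 0.3, -0.35, -0.3]
aligned = [0.2, 0.1, 0.05]

print(sum(opposing), cancellation_score(opposing), sign_flip_steps(opposing))
print(sum(aligned), cancellation_score(aligned), sign_flip_steps(aligned))
```

The first trajectory has a small aggregate influence despite large per-step magnitudes, which is the situation an aggregate score alone would hide.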
4. Empirical Results and Scaling Properties
Experiments conducted on looped GPT-style transformers and linguistic/cognitive tasks empirically validate the SDI methodology (Kaissis et al., 10 Feb 2026):
- Accuracy: In looped transformers (135M params, ), TensorSketch SDI (sketch size ) yields 3.9% relative error to dense baseline. Conservation of total influence is maintained to .
- Scalability: SDI allows 10× larger batch sizes within the same GPU budget by sketching, with runtime overhead dominated by the backward pass rather than sketching itself.
- Latency: Per-checkpoint overhead for SDI computation is s relative to inference alone.
- Mechanistic Probes: In a parity cycle experiment, SDI displays a four-step sawtooth pattern precisely tracking an interpretable four-state dynamical system.
- Reasoning Tasks: In Sudoku via looped transformers, more difficult puzzles yield higher self-influence and extended late-step influence persistence (slower energy curve decay), directly mirroring accuracy improvements with additional compute.
Stagewise SDI on LLMs (Pythia, 70M–12B params) reveals non-monotonic, class-specific influence traces corresponding to known model developmental stages and circuit formation (Lee et al., 14 Oct 2025).
5. Complexity, Error Guarantees, and Practical Considerations
For looped transformers, the sketching implementation requires memory that scales with the sketch dimension $m$ rather than with the full parameter dimension needed for dense gradients, and the additional per-batch computation for FFT-based TensorSketch is negligible relative to overall training. The error in outer-product and dot-product sketches is analytically characterized, with concentration bounds and variance decaying as $O(1/m)$.
In the Bayesian (SGLD) setting, each posterior sampling chain adds a per-step cost, with overall scaling driven by the number of draws and the minibatch size. SDI is Hessian-free and thus applicable at arbitrary checkpoints along the training trajectory, not just at local minima (Lee et al., 14 Oct 2025).
Notable limitations include the cost and hyperparameter tuning of SGLD at large scale, the observational (not causal) nature of SDI, and interpretive ambiguity of influence sign outside clear developmental transitions.
6. Applications and Extensions in Model Attribution
SDI enables a diverse set of data attribution and interpretability tasks:
- Depth-Targeted Data Curation: Training examples can be down-weighted or selected according to their predominant influence phase (e.g., early parsing vs late semantic refinement).
- Safe Compute Scaling: SDI “energy” (sum of stepwise influence magnitudes) informs adaptive loop truncation or halting in recurrent models, grounded in empirical attribution signal.
- Schema Discovery and Circuit Attribution: Group-averaged SDI supports discovery of structural and functional classes within models, aiding mechanistic interpretability.
- Retrieval-then-Refine and RLHF Extensions: Sketched, step-aligned influence enables efficient retrieval-based attribution and fine-grained localization of preference/feedback impact in reinforcement learning from human feedback.
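Two of these uses, energy-based loop truncation and phase-based data selection, can be sketched as follows. The code assumes SDI trajectories are stored as rows of an array; the 0.6 energy threshold, the early/late boundary, and the function names are hypothetical choices for illustration and do not come from the source.

```python
import numpy as np

def energy_fraction(trajs):
    """Cumulative share of SDI energy (sum of |I_t| over pairs) reached by each loop step."""
    energy = np.abs(np.atleast_2d(trajs)).sum(axis=0)
    return np.cumsum(energy) / energy.sum()

def truncation_step(trajs, keep=0.95):
    """Smallest 1-based loop step whose cumulative energy share is at least `keep` (keep < 1)."""
    return int(np.argmax(energy_fraction(trajs) >= keep)) + 1

def dominant_phase(trajs, boundary):
    """Label each trajectory by where most of its energy lies; steps <= boundary count as early."""
    mag = np.abs(np.atleast_2d(trajs))
    early = mag[:, :boundary].sum(axis=1)
    late = mag[:, boundary:].sum(axis=1)
    return np.where(early >= late, "early", "late")

# Made-up trajectories for two train/test pairs over four loop steps.
trajs = np.array([
    [0.5, 0.3, 0.1, 0.1],   # influence concentrated in early steps
    [0.1, 0.1, 0.2, 0.6],   # influence concentrated in late steps
])

print(energy_fraction(trajs))
print(truncation_step(trajs, keep=0.6))
print(dominant_phase(trajs, boundary=2))
```

A truncation rule of this kind only reflects where attribution mass sits; whether stopping at that step preserves accuracy would still need to be checked on held-out data.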
A plausible implication is that SDI exposes not only the static salience of training data but also when, during computation or training, that data contributes to the emergence of model capabilities. This enables a new class of temporal and mechanistic interpretability studies beyond static attribution.
Key References:
- "Step-resolved data attribution for looped transformers" (Kaissis et al., 10 Feb 2026)
- "Influence Dynamics and Stagewise Data Attribution" (Lee et al., 14 Oct 2025)