Papers
Topics
Authors
Recent
Search
2000 character limit reached

Gradient Atoms: Unsupervised Discovery, Attribution and Steering of Model Behaviors via Sparse Decomposition of Training Gradients

Published 15 Mar 2026 in cs.AI | (2603.14665v1)

Abstract: Training data attribution (TDA) methods ask which training documents are responsible for a model behavior. We argue that this per-document framing is fundamentally mismatched to how fine-tuning actually works: models often learn broad concepts shared across many examples. Existing TDA methods are supervised -- they require a query behavior, then score every training document against it -- making them both expensive and unable to surface behaviors the user did not think to ask about. We present Gradient Atoms, an unsupervised method that decomposes per-document training gradients into sparse components ("atoms") via dictionary learning in a preconditioned eigenspace. Among the 500 discovered atoms, the highest-coherence ones recover interpretable task-type behaviors -- refusal, arithmetic, yes/no classification, trivia QA -- without any behavioral labels. These atoms double as effective steering vectors: applying them as weight-space perturbations produces large, controllable shifts in model behavior (e.g., bulleted-list generation 33% to 94%; systematic refusal 50% to 0%). The method requires no query--document scoring stage, and scales independently of the number of query behaviors of interest. Code is here: https://github.com/jrosseruk/gradient_atoms

Authors (1)

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.

Tweets

Sign up for free to view the 1 tweet with 1 like about this paper.