Extend sparse attention decomposition to model diffing
Develop sparse decomposition techniques for transformer attention mechanisms that enable interpretable comparison of attention computations and parameters between two language models, thereby characterizing the differences learned during fine-tuning. Specifically, extend existing low-rank sparse attention decomposition methods to the comparative setting required by model diffing, so that attention changes between a base model and its fine-tuned variant can be analyzed in the same way that transcoder adapters capture differences in MLPs.
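As a starting point, one could decompose the per-head attention weight difference between the two models into a low-rank term plus an entrywise-sparse residual. The sketch below is a generic illustration of that idea under stated assumptions, not the decomposition of He (2025) or the adapter method of Hu et al.: it uses hypothetical PyTorch tensors standing in for one head's OV matrix before and after fine-tuning, and a simple alternating truncated-SVD / soft-thresholding heuristic.

```python
# Hypothetical sketch: split the change in one attention head's OV matrix
# (fine-tuned minus base) into a low-rank part plus a sparse residual, so the
# fine-tuning update can be inspected component by component. The alternating
# scheme below is a standard robust-PCA-style heuristic, not the method of
# He (2025) or Hu et al. (2026); all tensor names are illustrative.
import torch


def lowrank_sparse_diff(w_base: torch.Tensor,
                        w_ft: torch.Tensor,
                        rank: int = 8,
                        sparse_thresh: float = 1e-3,
                        n_iters: int = 50):
    """Decompose delta = w_ft - w_base into L (rank <= `rank`) + S (entrywise sparse)."""
    delta = w_ft - w_base
    low_rank = torch.zeros_like(delta)
    sparse = torch.zeros_like(delta)
    for _ in range(n_iters):
        # Low-rank step: truncated SVD of whatever the sparse part does not explain.
        u, s, vh = torch.linalg.svd(delta - sparse, full_matrices=False)
        low_rank = u[:, :rank] @ torch.diag(s[:rank]) @ vh[:rank, :]
        # Sparse step: soft-threshold the remaining residual entrywise.
        resid = delta - low_rank
        sparse = torch.sign(resid) * torch.clamp(resid.abs() - sparse_thresh, min=0.0)
    return low_rank, sparse


if __name__ == "__main__":
    torch.manual_seed(0)
    d_model, d_head = 64, 16
    # Stand-ins for one head's OV matrix before / after fine-tuning.
    w_base = torch.randn(d_model, d_head) / d_model ** 0.5
    # Simulate fine-tuning as a small rank-2 edit to the head.
    w_ft = w_base + 0.1 * torch.randn(d_model, 2) @ torch.randn(2, d_head)
    L, S = lowrank_sparse_diff(w_base, w_ft, rank=2)
    print("rank of L:", torch.linalg.matrix_rank(L).item())
    print("nonzeros in S:", int((S.abs() > 0).sum()))
    print("residual norm:", float((w_ft - w_base - L - S).norm()))
```

In this framing, the low-rank term summarizes coordinated, directional shifts in the head while the sparse residual isolates the few entries that change idiosyncratically, loosely mirroring how a low-rank transcoder adapter summarizes MLP changes; the open question from the paper is how to make such a decomposition interpretable and faithful for attention, not just the parameter difference shown here.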
References
While recent work has begun decomposing attention using sparse methods [He 2025], extending this to study differences between models remains an open question.
— Transcoder Adapters for Reasoning-Model Diffing
(2602.20904 - Hu et al., 24 Feb 2026) in Conclusion, Limitations and Future Work