Evaluate cross-architecture applicability and develop generalizable methods
Determine how well existing interpretability methods, including sparse dictionary learning and circuit analysis, apply to architectures beyond those for which they were developed, such as diffusion models, vision transformers, RWKV, and state space models, and develop techniques that generalize effectively across architectures.
References
Assessing how well interpretability methods apply to architectures beyond those for which they were developed, and whether we can develop techniques that generalize effectively across architectures remain open questions.
— Open Problems in Mechanistic Interpretability
(arXiv:2501.16496, Sharkey et al., 27 Jan 2025), Section 3.6: Mechanistic interpretability on a broader range of models and model families