Develop efficient and accurate causal attribution methods
Develop attribution methods for deep neural networks that efficiently and accurately measure the causal importance of inputs or upstream components for downstream activations and predictions, overcoming the limitations of first-order gradient approximations and of perturbations that shift activations away from the distribution the network was trained on.
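To make the efficiency and accuracy trade-off concrete, here is a minimal sketch that compares two ways of attributing a scalar prediction to the units of an upstream layer. The first is exact activation patching, which replaces one unit at a time with its value from a reference run and needs one forward pass per unit. The second is a first-order estimate in the style of attribution patching (gradient times activation difference), which covers all units with a single gradient. The two-layer `tanh` network, its random weights, and the helper names (`upstream`, `downstream`, `patching_effects`, `first_order_effects`) are hypothetical and chosen only for illustration. The sketch does not measure the second limitation: patching single units can produce activation combinations the network never sees on real inputs.

```python
import numpy as np

rng = np.random.default_rng(0)
D_IN, D_H1, D_H2 = 4, 8, 8

# Hypothetical toy network with random weights:
#   h1 = tanh(W1 @ x)          (upstream component we attribute to)
#   y  = w3 . tanh(W2 @ h1)    (downstream scalar prediction)
W1 = rng.normal(size=(D_H1, D_IN))
W2 = rng.normal(size=(D_H2, D_H1))
w3 = rng.normal(size=D_H2)


def upstream(x):
    """Activations of the upstream layer h1 for input x."""
    return np.tanh(W1 @ x)


def downstream(h1):
    """Scalar prediction computed from h1; nonlinear, so first-order estimates can err."""
    return float(w3 @ np.tanh(W2 @ h1))


def grad_downstream(h1):
    """Analytic gradient of downstream(h1) with respect to h1."""
    z2 = W2 @ h1
    return W2.T @ (w3 * (1.0 - np.tanh(z2) ** 2))


def patching_effects(h_clean, h_ref):
    """Exact effect of replacing each unit of h_clean with its value in h_ref.

    Costs one forward pass per unit.
    """
    base = downstream(h_clean)
    effects = np.empty_like(h_clean)
    for i in range(h_clean.size):
        h = h_clean.copy()
        h[i] = h_ref[i]
        effects[i] = downstream(h) - base
    return effects


def first_order_effects(h_clean, h_ref):
    """First-order (gradient x delta) estimate of the same per-unit effects.

    Costs a single gradient evaluation for all units.
    """
    return grad_downstream(h_clean) * (h_ref - h_clean)


# Example: attribute the prediction on a "clean" input relative to a "corrupted" one.
x_clean = rng.normal(size=D_IN)
x_corrupt = rng.normal(size=D_IN)
h_clean, h_corrupt = upstream(x_clean), upstream(x_corrupt)

exact = patching_effects(h_clean, h_corrupt)
approx = first_order_effects(h_clean, h_corrupt)
print("exact patching effects:", np.round(exact, 3))
print("first-order estimates: ", np.round(approx, 3))
print("max abs discrepancy:   ", np.max(np.abs(exact - approx)))
```

As `h_corrupt` moves closer to `h_clean`, the two estimates converge, because the error of the first-order estimate is second order in the perturbation. Large perturbations pushed through the saturating `tanh` are where the cheap estimate can disagree with the exact patching result.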
References
Developing efficient and accurate attribution methods thus remains an open problem.
— Open Problems in Mechanistic Interpretability
(2501.16496 - Sharkey et al., 27 Jan 2025) in Reverse engineering step 2: Describing the functional role of components — Attribution methods (Section 2.1.2, parasection “Attribution methods are necessary for causal explanations but are often difficult to interpret”)