Characterize causal integration of distributed latent features in neural networks
Determine how trained artificial neural networks causally integrate distributed latent features across channels and neurons to generate outputs, by characterizing how groups of hidden units jointly contribute to model predictions.
References
Accordingly, a key open problem is to characterize how networks causally integrate distributed latent features across channels and neurons to generate outputs, analogous to how biological networks produce functional effects through circuit interactions.
— Causal Interpretation of Neural Network Computations with Contribution Decomposition
(2603.06557 - Melander et al., 6 Mar 2026) in Subsubsection “Existing tools for interpreting ANNs,” Section 1 (A framework for understanding biological and artificial neural networks)