- The paper introduces Disentangled Feature Importance (DFI) to overcome the limitations of conventional feature attribution methods under correlated predictors.
- It establishes a robust theoretical framework with root-n consistency and asymptotic normality for reliable importance estimation.
- Empirical results show that DFI recovers true importance structures efficiently, even in high-dimensional and strongly correlated scenarios.
Disentangled Feature Importance: A Principled Approach to Feature Attribution under Correlation
The paper "Disentangled Feature Importance" (2507.00260) addresses a central challenge in interpretable machine learning: quantifying feature importance in the presence of correlated predictors. The authors rigorously demonstrate that widely used methods—such as LOCO, conditional permutation importance (CPI), and Shapley values—are fundamentally limited by their inability to properly account for feature dependencies. This limitation leads to systematic underestimation or misattribution of importance, particularly in high-correlation regimes, and is shown to be intrinsic to the population functionals these methods target.
Theoretical Foundations and Critique of Existing Methods
The paper establishes, via elementary probability and variance decomposition, that LOCO and CPI both estimate the same conditional variance functional under squared-error loss:
$$\psi_{X_j} = \mathbb{E}\big[\operatorname{Var}\big(\mu(X) \mid X_{-j}\big)\big]$$
where $\mu(X)$ is the regression function. This equivalence, while elegant, exposes a critical flaw: when features are perfectly or near-perfectly correlated, the conditional variance collapses, and genuinely influential features can receive zero importance. The authors give explicit examples in which, despite strong dependence of $Y$ on $X_j$, both LOCO and CPI assign $X_j$ zero importance whenever it is deterministically related to other features. The Shapley value, as a weighted average of LOCO scores across all submodels, inherits this pathology in high dimensions.
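To make this concrete, here is a minimal worked instance of the duplicated-feature pathology (our notation, following the paper's construction): with $X_2 = X_1$ and $Y = X_1 + X_2$, the regression function $\mu(X) = X_1 + X_2 = 2X_2$ is measurable with respect to $X_{-1} = X_2$, so

```latex
\psi_{X_1} = \mathbb{E}\big[\operatorname{Var}(\mu(X) \mid X_{-1})\big]
           = \mathbb{E}\big[\operatorname{Var}(2X_2 \mid X_2)\big] = 0
```

even though $Y$ depends on $X_1$ exactly as strongly as on $X_2$; by symmetry, $\psi_{X_2} = 0$ as well.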
Disentangled Feature Importance (DFI): Methodology
To overcome these limitations, the authors introduce Disentangled Feature Importance (DFI), a nonparametric generalization of the classical $R^2$ decomposition. The core idea is to transform the original feature space $X$ into a latent space $Z = T(X)$ whose coordinates are independent, using an optimal transport map $T$. Feature importance is then computed in this disentangled space, free from the confounding effects of correlation, and attributed back to the original features via the sensitivity of the inverse map.
The DFI procedure consists of three stages:
- Disentanglement: Map $X$ to $Z = T(X)$ such that $Z$ has independent coordinates (e.g., via the Bures-Wasserstein or Knothe-Rosenblatt transport).
- Latent Importance Computation: For each $Z_j$, compute
$$\phi_{Z_j} = \mathbb{E}\big[\operatorname{Var}\big(\eta(Z) \mid Z_{-j}\big)\big],$$
where $\eta(Z) = \mu(T^{-1}(Z))$.
- Attribution to Original Features: For each $X_\ell$, aggregate the latent importances, weighted by the squared sensitivity of $X_\ell$ to $Z_j$:
$$\phi_{X_\ell} = \sum_{j=1}^{d} \mathbb{E}\left[\operatorname{Var}\big(\eta(Z) \mid Z_{-j}\big)\left(\frac{\partial X_\ell}{\partial Z_j}\right)^{2}\right]$$
This approach ensures that importance scores are both parsimonious (zero for truly irrelevant features) and exhaustive (nonzero for any feature that influences the outcome, directly or via dependencies).
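A minimal sketch of this pipeline, assuming jointly Gaussian covariates so that the Bures-Wasserstein transport reduces to linear whitening with a constant, closed-form Jacobian. The function name `dfi_gaussian`, the permutation-based latent-variance estimator, and the choice of regressor are illustrative, not the authors' reference implementation:

```python
import numpy as np
from scipy.linalg import sqrtm
from sklearn.ensemble import GradientBoostingRegressor

def dfi_gaussian(X, y, n_perm=20, seed=0):
    """DFI sketch: whiten -> fit eta on Z -> latent importances -> attribute to X."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    A = np.real(sqrtm(cov))            # Sigma^{1/2}: constant Jacobian dX/dZ of the inverse map
    Z = (X - mean) @ np.linalg.inv(A)  # Bures-Wasserstein map Z = Sigma^{-1/2}(X - mean)

    eta = GradientBoostingRegressor().fit(Z, y)  # estimate eta(Z) = mu(T^{-1}(Z))
    pred = eta.predict(Z)

    # Latent importance phi_{Z_j} = E[Var(eta(Z) | Z_{-j})]
    #                             = 0.5 * E[(eta(Z) - eta(Z'))^2],
    # where Z' replaces Z_j with an independent copy; because the coordinates
    # of Z are independent, permuting column j yields such a copy.
    phi_Z = np.empty(d)
    for j in range(d):
        gaps = np.empty(n_perm)
        for b in range(n_perm):
            Zp = Z.copy()
            Zp[:, j] = rng.permutation(Zp[:, j])
            gaps[b] = np.mean((pred - eta.predict(Zp)) ** 2)
        phi_Z[j] = 0.5 * gaps.mean()

    # Attribution phi_{X_l} = sum_j phi_{Z_j} * (dX_l/dZ_j)^2; in the Gaussian
    # case the Jacobian is the constant matrix A, so this is a matrix product.
    phi_X = (A ** 2) @ phi_Z
    return phi_Z, phi_X
```

Permuting a latent column is a valid way to draw an independent copy of $Z_j$ precisely because the coordinates of $Z$ are independent after transport; applying the same trick to the raw $X_j$ is what breaks classical permutation importance under correlation.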
Statistical Theory and Inference
The authors develop a comprehensive semiparametric theory for DFI. For general transport maps, they establish root-$n$ consistency and asymptotic normality of the importance estimators in the latent space, and they extend these results to the original feature space for the Bures-Wasserstein map. The estimators incur only a second-order error, which vanishes when both the regression function and the transport map are estimated at rates faster than $n^{-1/4}$. The influence function of the attributed importance is derived explicitly, enabling valid confidence intervals and hypothesis tests.
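Schematically, the expansion behind these guarantees has the standard one-step, doubly-robust form (our paraphrase in generic semiparametric notation; $\varphi_j$ denotes the influence function derived in the paper and $O_i$ the $i$-th observation):

```latex
\hat{\phi}_{Z_j} - \phi_{Z_j}
  = \frac{1}{n}\sum_{i=1}^{n} \varphi_j(O_i)
  + O_P\!\Big(\|\hat{\eta} - \eta\|^2 + \|\hat{T} - T\|^2\Big)
  + o_P\big(n^{-1/2}\big)
```

Nuisance rates faster than $n^{-1/4}$ make the quadratic remainder $o_P(n^{-1/2})$, leaving the asymptotically normal linear term that underlies the confidence intervals.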
A key computational advantage is that DFI avoids repeated submodel refitting (as in LOCO) and the need to estimate conditional covariate distributions (as in CPI), making it scalable to high-dimensional settings.
Empirical Evaluation
Simulation studies demonstrate that DFI consistently recovers the true importance structure, even under strong feature correlations and nonlinearities. In contrast, LOCO, CPI, and Shapley-based methods systematically underestimate importance in these regimes. DFI also provides accurate inferential coverage and is computationally efficient compared to decorrelated LOCO and Shapley sampling.
A real-data application to HIV-1 resistance prediction illustrates DFI's practical utility. The method identifies biologically meaningful groups of genomic features as most important, with statistical significance, and provides a nuanced ranking that aligns with domain knowledge.
Implications and Future Directions
Practical Implications:
- DFI provides a principled, computationally efficient, and statistically robust framework for feature importance in the presence of arbitrary dependencies.
- The method is directly applicable to high-dimensional, structured, and correlated data, such as genomics, NLP, and causal inference settings.
- DFI's additive structure enables group-level importance analysis (see the snippet after this list) and is compatible with modern machine learning pipelines.
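For instance, group scores follow by summing per-feature attributions (a hypothetical grouping, reusing the `dfi_gaussian` sketch above with some feature matrix `X` and response `y`):

```python
# Group-level DFI: the additive attribution lets per-feature scores be
# summed over any user-defined grouping of the original features.
groups = {"polymerase_sites": [0, 1, 2], "accessory_sites": [3, 4]}  # hypothetical
phi_Z, phi_X = dfi_gaussian(X, y)
phi_groups = {name: phi_X[idx].sum() for name, idx in groups.items()}
```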
Theoretical Implications:
- DFI generalizes the $R^2$ decomposition to nonparametric and nonlinear settings, preserving interpretability and variance attribution.
- The framework connects feature importance to functional ANOVA and Sobol indices (see the identity below), providing a unified view of sensitivity analysis and model interpretability.
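Concretely, because the latent coordinates are independent, the latent DFI functional coincides with an unnormalized total Sobol index (standard sensitivity-analysis notation; our rendering of the connection):

```latex
\phi_{Z_j} = \mathbb{E}\big[\operatorname{Var}\big(\eta(Z) \mid Z_{-j}\big)\big]
           = S^{\mathrm{tot}}_{j}\,\operatorname{Var}\big(\eta(Z)\big)
```

where $S^{\mathrm{tot}}_{j}$ is the total Sobol index of coordinate $j$ in the functional-ANOVA decomposition of $\eta$.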
Limitations and Future Work:
- The current theory assumes absolutely continuous covariate distributions; extensions to discrete or mixed-type data are needed.
- Inference near the null remains challenging due to the quadratic nature of the functional; conservative interval expansion is suggested as a remedy.
- Efficient estimation of high-dimensional transport maps is an open computational problem; leveraging generative models or scalable optimal transport solvers is a promising direction.
- Extensions to non-squared-error losses, local (instance-level) importance, and incorporation of domain knowledge (e.g., feature hierarchies) are natural next steps.
Conclusion
Disentangled Feature Importance offers a rigorous and practical solution to the longstanding problem of feature attribution under correlation. By leveraging optimal transport and semiparametric theory, DFI enables reliable, interpretable, and computationally tractable importance estimation in complex, real-world data. The framework sets a new standard for feature importance analysis and opens avenues for further methodological and applied advances in interpretable machine learning.