Disentangled Feature Importance (DFI)

Updated 2 July 2025
  • Disentangled Feature Importance is a framework that transforms correlated predictors into an independent latent space for unbiased attribution of each feature's contribution.
  • It overcomes correlation distortion by disentangling unique, redundant, and interactive effects to fully capture the predictive signal.
  • DFI offers theoretical guarantees and computational efficiency, making it ideal for applications in genomics, NLP, and causal inference.

Disentangled Feature Importance (DFI) is a class of methodologies and a conceptual framework in machine learning and statistical modeling designed to quantify and interpret the influence of individual features on predictive outcomes in settings where predictors may be highly correlated, exhibit complex statistical dependence, or interact nontrivially. Standard feature importance methods—such as permutation, conditional permutation, refitting (LOCO), and Shapley values—often fail to attribute importance accurately under these conditions due to their inherent assumption of feature independence or their inability to separate unique, redundant, and interactive contributions. DFI rigorously addresses these limitations by transforming correlated predictors into independent latent variables, performing importance assessment in this disentangled space, and providing provably valid attributions and decompositions that retain interpretability, summing to the total predictive signal.

1. Motivation: Correlation Distortion and the Limits of Conventional Feature Importance

Quantifying the role of each predictor variable is a cornerstone of interpretable machine learning. In practice, predictors often display strong dependencies. When these dependencies are ignored by the feature importance method, the resulting importance scores systematically underestimate contributions of features whose effects manifest through correlated partners. This phenomenon, termed correlation distortion, is particularly severe in domains such as genomics, where gene expression features are highly collinear, in NLP, where tokens or n-grams co-occur due to syntax or idiom, and in causal analyses, where latent confounders induce structural dependence among observed inputs.

Analytically, recent work shows that the major feature importance estimators based on population-level functionals (specifically, those based on permutation, conditional permutation, LOCO, and Shapley values) target essentially the same object under squared-error loss: the expected conditional variance of the regression function given all predictors except the one under study, $\mathbb{E}[\mathrm{Var}(\mathbb{E}[Y \mid X] \mid X_{-j})]$. Under strong dependence, this quantity can vanish for every predictor in a perfectly correlated cluster, even when all of them are causally important, revealing a root cause of widespread underestimation and misranking in model explanation.
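
As a simple illustration of this failure mode (a stylized example, not taken from any particular dataset), suppose $X_1 = X_2$ almost surely and $Y = X_1 + X_2 + \varepsilon$ with noise $\varepsilon$ independent of $X$. Then $\mathbb{E}[Y \mid X] = 2X_2$ is already a function of $X_{-1} = X_2$, so

$$\mathbb{E}\big[\mathrm{Var}\big(\mathbb{E}[Y \mid X] \mid X_{-1}\big)\big] = 0,$$

and by symmetry the same holds for $X_2$: every conditional-variance-based method assigns both features zero importance, even though together they carry all of the predictive signal.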

2. Core Methodology: Disentanglement via Optimal Transport

Disentangled Feature Importance overcomes these limitations by mapping correlated features into an independent latent variable space. The mapping is realized through an optimal transport map $T: \mathbb{R}^d \to \mathbb{R}^d$, which pushes the observed joint feature distribution to a reference independent distribution (usually a product of independent marginals, e.g., standard normal).

Formally, given a vector of predictors $X = (X_1, \dots, X_d)$, the transport map $T$ yields latent variables $Z = T(X)$ with independent coordinates. Feature importance is then evaluated in the $Z$-space, ensuring that effects previously conflated by correlation are rendered statistically disentangled.
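
A minimal sketch of this disentangling step, assuming the Gaussian/affine case in which the transport map reduces to whitening (the data and variable names below are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Strongly correlated Gaussian predictors X ~ N(0, Sigma).
n, d = 5000, 3
Sigma = np.array([[1.0, 0.9, 0.5],
                  [0.9, 1.0, 0.4],
                  [0.5, 0.4, 1.0]])
X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)

# Whitening transport Z = (X - mu) Sigma^{-1/2}: in the Gaussian case the
# resulting coordinates are (approximately) independent standard normals.
mu = X.mean(axis=0)
evals, evecs = np.linalg.eigh(np.cov(X, rowvar=False))
Sigma_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
Z = (X - mu) @ Sigma_inv_sqrt

print(np.round(np.corrcoef(X, rowvar=False), 2))  # heavy off-diagonal correlation
print(np.round(np.corrcoef(Z, rowvar=False), 2))  # close to the identity matrix
```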

For regression with target $Y$, the importance of latent feature $Z_j$ is defined as

$$\phi_{Z_j} = \mathbb{E}\left[ \mathrm{Var}\left( \mathbb{E}[Y \mid Z] \mid Z_{-j}\right) \right],$$

which characterizes the unique contribution of $Z_j$ to the explainable variation in $Y$. Importance scores for original features are then attributed back via a chain rule involving the Jacobian sensitivities $\left(\partial X_l/\partial Z_j\right)^2$, reflecting how variation along latent directions propagates through the original feature coordinates. By this mechanism, DFI naturally yields decompositions that, for latent additive models, sum to the total predictive variability; for functions with interaction structure, DFI recovers interaction-weighted functional ANOVA variances.
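
The following sketch makes the definition and the chain-rule attribution concrete in the affine Gaussian case, using a known linear regression function for clarity. The attribution rule shown (summing latent importances weighted by squared Jacobian entries) is one natural instantiation of the description above, not necessarily the paper's exact estimator:

```python
import numpy as np

rng = np.random.default_rng(1)

# Correlated Gaussian features and a known regression function m(x) = E[Y|X=x].
n, d = 20000, 3
Sigma = np.array([[1.0, 0.9, 0.0],
                  [0.9, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
A = np.linalg.cholesky(Sigma)
X = rng.standard_normal((n, d)) @ A.T
beta = np.array([1.0, 1.0, 0.5])

def m(x):
    return x @ beta

# Whitening transport (Bures-Wasserstein map to a standard normal) and its inverse.
evals, evecs = np.linalg.eigh(Sigma)
S_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
S_sqrt = evecs @ np.diag(evals ** 0.5) @ evecs.T
Z = X @ S_inv_sqrt                      # independent latent coordinates

def m_z(z):
    return m(z @ S_sqrt)                # regression function expressed in Z-space

# phi_{Z_j} = E[Var(E[Y|Z] | Z_{-j})] = 0.5 * E[(m_z(Z) - m_z(Z'))^2], where Z'
# replaces coordinate j with an independent copy; permuting the column is valid
# because the latent coordinates are independent by construction.
base = m_z(Z)
phi_Z = np.empty(d)
for j in range(d):
    Z_tilde = Z.copy()
    Z_tilde[:, j] = rng.permutation(Z[:, j])
    phi_Z[j] = 0.5 * np.mean((base - m_z(Z_tilde)) ** 2)

# Chain-rule attribution back to the original features: weight each latent
# importance by the squared sensitivity (dX_l/dZ_j)^2; for the affine map the
# Jacobian is constant, X = Z S_sqrt, so dX_l/dZ_j = S_sqrt[j, l].
jac_sq = S_sqrt ** 2
phi_X = jac_sq.T @ phi_Z                # phi_X[l] = sum_j (dX_l/dZ_j)^2 * phi_Z[j]

print("latent importances  :", np.round(phi_Z, 3))
print("feature attributions:", np.round(phi_X, 3))
```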

Key choices for the transport map include:

  • Bures–Wasserstein map: For affine Gaussian settings, yields a closed-form linear transformation (whitening).
  • Knothe–Rosenblatt rearrangement: For arbitrary distributions, yields a triangular map inducing independence coordinatewise.
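
For intuition, in the bivariate standard Gaussian case with correlation $\rho$ the Knothe–Rosenblatt construction has a simple closed form (a standard conditional-standardization argument, given here purely as an illustration):

$$Z_1 = X_1, \qquad Z_2 = \frac{X_2 - \rho X_1}{\sqrt{1-\rho^2}},$$

which is triangular (each $Z_j$ depends only on $X_1, \dots, X_j$) and yields independent standard normal coordinates, since $X_2 \mid X_1 \sim N(\rho X_1,\, 1-\rho^2)$.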

3. Theoretical Guarantees: Semiparametric Consistency and Statistical Inference

DFI is supported by rigorous semiparametric theory, establishing:

  • Root-$n$ consistency and asymptotic normality for importance estimators in the latent space. With appropriate estimation of both the regression function $\eta(z)$ and the transport map $T$, the estimator for $\phi_{Z_j}$ converges at the parametric rate, with precise asymptotic variance characterized through the efficient influence function.
  • Second-order estimation error: The error term vanishes if both the regression estimator and the transport map estimator converge at rates $o_{\mathbb{P}}(n^{-1/4})$, ensuring negligible impact on statistical inference in large samples.
  • Functional ANOVA and total importance sum: In the independently transformed space, the DFI decomposition reproduces the full variance explained, or $R^2$ (a nonparametric generalization of the Genizi decomposition); for interacting models, DFI apportions interaction-weighted variances analogously.

For the Bures–Wasserstein case, these properties carry over directly to the original feature space, enabling the construction of influence-function-based confidence intervals for the resulting importance attributions.
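
As a rough sketch of what such interval construction can look like in practice (a hypothetical helper that uses a naive per-observation variance in place of the paper's efficient influence function, and assumes the latent coordinates and regression function are already available):

```python
import numpy as np
from scipy import stats

def latent_importance_ci(m_z, Z, j, rng, alpha=0.05):
    """Point estimate and Wald-type CI for phi_{Z_j} = E[Var(E[Y|Z] | Z_{-j})].

    Uses phi_{Z_j} = 0.5 * E[(m_z(Z) - m_z(Z'))^2], where Z' replaces column j
    with an independent copy, and the sample variance of the per-observation
    terms as a simple stand-in for the efficient influence function.
    """
    Z_tilde = Z.copy()
    Z_tilde[:, j] = rng.permutation(Z[:, j])
    terms = 0.5 * (m_z(Z) - m_z(Z_tilde)) ** 2
    phi_hat = terms.mean()
    se = terms.std(ddof=1) / np.sqrt(len(terms))
    half_width = stats.norm.ppf(1 - alpha / 2) * se
    return phi_hat, (phi_hat - half_width, phi_hat + half_width)

# Toy usage: Z already independent, m_z a known function in the latent space.
rng = np.random.default_rng(2)
Z = rng.standard_normal((5000, 3))
print(latent_importance_ci(lambda z: 2.0 * z[:, 0] + z[:, 1], Z, j=0, rng=rng))
# phi_{Z_0} is Var(2 Z_0) = 4, so the interval should cover a value near 4.
```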

4. Computational Considerations and Practical Implementation

Classic importance methods may require computationally prohibitive refitting across a combinatorial number of feature subsets (e.g., LOCO, Shapley procedures), or estimation of high-dimensional conditional distributions (e.g., CPI), which is challenging in non-linear or large-$d$ settings.

DFI achieves computational tractability by:

  • Modeling only once: The primary regression model (e.g., a random forest or neural net) is fit once, and importance is assessed by resampling in the uncorrelated latent space.
  • No need for conditional sampling: Since latent variables are independent by construction, resampling any coordinate is model- and distribution-agnostic.
  • Efficient transport computation: Linear transport (whitening) is fast for Gaussians; non-Gaussian optimal transport is accessible via advances in scalable transport map estimation.
  • Batch inference: All feature importances in the latent space are computed together from the same model and resampling pipeline, greatly reducing computational expense compared to subset-based algorithms.
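
A compact sketch of the fit-once pipeline described in this list, with scikit-learn supplying the single regression fit (the model choice, sample sizes, and whitening-based transport are illustrative assumptions):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)

# Correlated features and a noisy, partly non-linear response.
n, d = 4000, 4
Sigma = 0.8 * np.ones((d, d)) + 0.2 * np.eye(d)
X = rng.standard_normal((n, d)) @ np.linalg.cholesky(Sigma).T
y = X[:, 0] + 0.5 * X[:, 1] ** 2 + 0.1 * rng.standard_normal(n)

# Disentangle once (whitening; adequate for roughly Gaussian features) ...
evals, evecs = np.linalg.eigh(np.cov(X, rowvar=False))
S_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
Z = (X - X.mean(axis=0)) @ S_inv_sqrt

# ... fit the regression model once in the latent space ...
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(Z, y)
base = model.predict(Z)

# ... then obtain every latent importance from that single fit by permuting one
# independent coordinate at a time: no refitting and no conditional sampler.
phi_Z = np.empty(d)
for j in range(d):
    Z_perm = Z.copy()
    Z_perm[:, j] = rng.permutation(Z[:, j])
    phi_Z[j] = 0.5 * np.mean((base - model.predict(Z_perm)) ** 2)

print(np.round(phi_Z, 3))
```

For brevity the sketch evaluates the model on its own training sample; in practice sample splitting or cross-fitting, as assumed by the semiparametric theory above, would be used.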

Simulation and empirical results indicate substantial speedups—up to orders of magnitude—relative to classic refit-based or Shapley-based schemes, especially in high-dimensional, highly correlated data environments.

5. Comparative and Applied Insights

A central advance of DFI is its correct attribution of importance in the presence of strong correlation or redundant predictors. When predictors are perfectly collinear or deterministic functions of one another, DFI assigns each a non-trivial importance reflecting their joint contribution, whereas CPI and LOCO assign zero importance to every feature in the dependent group and thereby misattribute the shared effect. DFI remains effective in high-dimensional applications, such as genomics and language modeling, discovering true drivers hidden behind dense correlation networks or co-occurrence patterns.

For models with interaction terms, DFI's decomposition captures both main effects and the interaction-weighted contributions, providing a principled generalization of variance partitioning to the nonparametric, dependent case.

When contrasted to Shapley-based, marginal, or conditional permutation approaches, DFI stands out by:

  • Eliminating correlation-induced bias: Faithful importance even for features whose role is expressed via dependent partners.
  • Summing to true total predictive variance: Ensuring completeness of decomposition.
  • Providing valid statistical inference: By delivering asymptotic normality and explicit influence functions.
  • Drastic computational savings: Especially relevant for practical large-scale and high-dimensional deployments.

Table: Comparative features of DFI and prominent alternatives

| Approach | Correlation handling | Statistical inference | Computational cost | Complete attribution |
|---|---|---|---|---|
| CPI, LOCO | ✗ (distorted) | | High | |
| Marginal SHAP | ✗ (distorted) | (Limited) | Very high | |
| DFI | ✓ (disentangled) | ✓ | Low | ✓ |

6. Applications and Broader Impact

Disentangled Feature Importance is broadly applicable in any context where high-dimensional or structurally dependent predictors are prominent:

  • Genomics and molecular biology: Revealing regulatory gene sets or variants of true importance, unmasking latent mechanisms despite pervasive linkage disequilibrium.
  • NLP and sequence modeling: Accurate attribution in the presence of n-gram/inter-token dependencies.
  • Causal inference: Properly quantifying covariate, confounder, or moderator effects without distortion from correlation or mediation.
  • Complex multimodal data: Suitability for non-Gaussian, heavy-tailed, or mixture-type feature spaces, given flexible transport map choices.

The DFI framework aligns with recent advances in explainable AI, seeking rigorous, robust, and statistically principled interpretability that can underpin scientific discovery, model auditing, and high-stakes decision-making.

7. Conclusion

Disentangled Feature Importance provides a general, nonparametric, and computationally efficient method for principled feature attribution in arbitrary dependency structures. By mapping features into independent latent coordinates and computing importance therein, DFI resolves limitations of conventional methods, offering consistent estimation, correct total attribution, and meaningful interpretability aligned with both statistical theory and practical demands. Its theoretical guarantees and practical performance support its use as a foundation for reliable model explanation, especially in contemporary data science applications characterized by complex, correlated predictors.