Measure of Feature Importance (MFI)

Updated 20 April 2026

MFI is a quantitative method that assesses feature impact by adjusting surrogate split agreements using null-permutation to remove inherent biases.
It integrates individual and joint effects through Mutual Impurity Reduction (MIR), enhancing robustness in high-dimensional, correlated settings.
Empirical results demonstrate that MFI/MIR detect complex interactions and outperform traditional methods like Gini/MDI and permutation importance.

A Measure of Feature Importance (MFI) quantifies the relevance or impact of input variables on either the prediction or the internal operation of a statistical or machine learning model. MFIs can be purely data-driven, model-based, or causal, and may aim to capture both individual and joint (interaction) effects, correct for statistical artifacts, and provide both statistical significance and interpretability of the results. In the high-dimensional and correlation-rich settings typical of modern machine learning—such as random forests, ensemble models, and non-linear predictors—the choice of an appropriate feature-importance measure is crucial for robust model interpretation, reliable variable selection, and discovering complex dependencies among predictors and outcomes.

1. Mathematical Foundations of Mutual Forest Impact (MFI) and Mutual Impurity Reduction (MIR)

The Mutual Forest Impact (MFI), introduced by Voges et al. (2023), formalizes the mutual contribution of feature pairs to the predictive partitioning induced by random forests. The MFI quantifies how often a candidate feature $X_j$ serves as a surrogate for a primary splitting feature $X_i$, corrected for artifacts such as high category cardinality or minor allele frequency (MAF) by subtracting agreement rates observed on null-permuted data.

Formally, for feature matrix $X = (X_1, \ldots, X_p)$ and its column-permuted copy $Z = (Z_1, \ldots, Z_p)$, after growing random forests (RF) with $s$ surrogate splits recorded per node, the key steps are:

For each tree node $n$ with primary split $p_n^{X_i}$ on $X_i$, compute the best surrogate split $q_n^{X_j}$ on $X_j$ and their adjusted agreement:

$\text{adj}(p_n^{X_i}, q_n^{X_j}) = \frac{n_\text{surr} - n_\text{maj}}{n_\text{total} - n_\text{maj}}$

where $n_\text{total}$ is the number of samples at node $n$, and $n_\text{maj}$ is the size of the largest child.

Aggregate the mean adjusted agreement over all nodes with primary split on $X_i$:

$M_{ij} = \frac{1}{|N_i|} \sum_{n \in N_i} \text{adj}(p_n^{X_i}, q_n^{X_j})$

where $N_i$ is the set of nodes whose primary split uses $X_i$, with a corresponding null version $M_{ij}^{\text{null}}$ computed on the permuted data.

The Mutual Forest Impact is then:

$\text{MFI}_{ij} = M_{ij} - M_{ij}^{\text{null}}$

ensuring that under the null hypothesis (no association), MFI is centered at zero.

The Mutual Impurity Reduction (MIR) combines individual feature importance (actual impurity reduction, AIR) with mutual impacts: $\text{MIR}_i = \text{AIR}_i + \sum_{j \neq i} \text{MFI}_{ij} \cdot \text{AIR}_j$. This aggregates not only the individual effect of $X_i$ but also its mutual reinforcement with other important features, as quantified by their mutual impact.
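
The quantities in this section can be illustrated with a small Python sketch. It is not the authors' implementation: the function names are hypothetical, $n_\text{surr}$ is assumed to be the number of samples that the surrogate split sends to the same child as the primary split, and MIR is assumed to take the form $\text{MIR}_i = \text{AIR}_i + \sum_{j \neq i} \text{MFI}_{ij} \cdot \text{AIR}_j$.

```python
def adjusted_agreement(primary_left, surrogate_left):
    """Adjusted agreement of a surrogate split with a primary split at one node.

    Both arguments hold one boolean per sample in the node:
    True if the split sends that sample to the left child.
    """
    n_total = len(primary_left)
    n_left = sum(primary_left)
    n_maj = max(n_left, n_total - n_left)  # size of the largest child
    # Assumption: n_surr counts samples routed identically by both splits.
    n_surr = sum(p == s for p, s in zip(primary_left, surrogate_left))
    if n_total == n_maj:  # degenerate split, no minority child
        return 0.0
    return (n_surr - n_maj) / (n_total - n_maj)


def mutual_forest_impact(mean_adj, mean_adj_null):
    """Elementwise difference between original and null mean agreements."""
    p = len(mean_adj)
    return [[mean_adj[i][j] - mean_adj_null[i][j] for j in range(p)]
            for i in range(p)]


def mutual_impurity_reduction(air, mfi):
    """Assumed form: own AIR plus the AIR of other features weighted by MFI."""
    p = len(air)
    return [air[i] + sum(mfi[i][j] * air[j] for j in range(p) if j != i)
            for i in range(p)]


# Toy node: the primary split sends 3 of 8 samples left,
# and the surrogate disagrees on exactly one sample.
primary = [True, True, True, False, False, False, False, False]
surrogate = [True, True, False, False, False, False, False, False]
adj = adjusted_agreement(primary, surrogate)  # (7 - 5) / (8 - 5)

# Toy 3-feature example with made-up mean agreements and AIR values.
mean_adj = [[0.0, 0.5, 0.1], [0.4, 0.0, 0.1], [0.1, 0.1, 0.0]]
mean_adj_null = [[0.0, 0.1, 0.1], [0.1, 0.0, 0.1], [0.1, 0.1, 0.0]]
mfi = mutual_forest_impact(mean_adj, mean_adj_null)
mir = mutual_impurity_reduction([2.0, 1.0, 0.0], mfi)
```

In the toy data, the third feature has the same agreement on original and null data, so its MFI entries vanish and it contributes nothing to the MIR of the other features.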

2. Algorithmic Workflow and Statistical Testing of MFI

The practical computation of MFI in random forests follows a structured pipeline:

Generate null-permuted data $Z$ by permuting each column of $X$.
For the original data $X$ and for $Z$, train random forests, recording the surrogate split agreements at every node.
Accumulate, for each ordered feature pair $(X_i, X_j)$, the total adjusted agreements and the count of relevant nodes.
Compute empirical mean adjusted agreements for all feature pairs on both $X$ and $Z$.
Form the MFI matrix by differencing the original and null agreements elementwise.
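
The bookkeeping in the steps above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the reference implementation: forest training with surrogate splits is left out, and the per-node record format (a primary feature index paired with a dictionary of surrogate feature index to adjusted agreement) is a hypothetical stand-in for whatever a surrogate-aware forest would emit. The mean is assumed to be taken over all nodes split on the primary feature, with surrogates that were not recorded at a node contributing zero.

```python
import random


def permute_columns(rows, rng):
    """Return a copy of the data with every column shuffled independently."""
    columns = [list(col) for col in zip(*rows)]
    for col in columns:
        rng.shuffle(col)
    return [list(row) for row in zip(*columns)]


def mean_adjusted_agreement(node_records, p):
    """Mean adjusted agreement for every ordered feature pair.

    node_records: iterable of (primary_index, {surrogate_index: adj}),
    one entry per tree node (hypothetical format).
    """
    totals = [[0.0] * p for _ in range(p)]
    nodes = [0] * p  # number of nodes split on each primary feature
    for primary, surrogates in node_records:
        nodes[primary] += 1
        for j, adj in surrogates.items():
            totals[primary][j] += adj
    return [[totals[i][j] / nodes[i] if nodes[i] else 0.0 for j in range(p)]
            for i in range(p)]


def mfi_matrix(records_original, records_null, p):
    """Difference the original and null mean agreements elementwise."""
    m = mean_adjusted_agreement(records_original, p)
    m_null = mean_adjusted_agreement(records_null, p)
    return [[m[i][j] - m_null[i][j] for j in range(p)] for i in range(p)]


# Toy usage with made-up node records for two features.
rng = random.Random(0)
data = [[1, 10], [2, 20], [3, 30]]
null_data = permute_columns(data, rng)  # would be fed to the null forest

records_original = [(0, {1: 0.8}), (0, {1: 0.6}), (1, {0: 0.5})]
records_null = [(0, {1: 0.1}), (0, {}), (1, {0: 0.1})]
mfi = mfi_matrix(records_original, records_null, p=2)
```

Keeping sums and node counts separate means records from different trees can be accumulated independently and merged before the final division.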

Statistical significance is assessed via two complementary approaches:

Mirror approach: Reflect all negative MFI values to generate a null distribution and compute one-sided $p$-values for positive empirical MFIs.
Permutation-based null: For MIR, simulate the null distribution by randomly pairing surrogate agreements and AIR values from the permuted data.

The null hypothesis of no association (a statistic centered at zero) is tested for both MFI and MIR, yielding a $p$-value for the observed statistic.
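
The mirror approach can be sketched as follows. This is a minimal illustration, not the published procedure: the function name is hypothetical, the null distribution is assumed to consist of the negative MFI values together with their reflections about zero, and the $p$-value is taken as the plain fraction of null values at least as large as the observed one. Details such as the handling of exact zeros or multiple-testing correction are omitted.

```python
def mirror_p_values(mfi_values):
    """One-sided p-values for positive MFI values from a mirrored null.

    Returns a list aligned with the input; entries for non-positive
    values are None because no test is carried out for them.
    """
    negatives = [v for v in mfi_values if v < 0]
    # Assumption: negative values and their mirror images form the null.
    null = negatives + [-v for v in negatives]
    if not null:
        raise ValueError("no negative MFI values to build a null distribution")
    p_values = []
    for v in mfi_values:
        if v > 0:
            exceed = sum(1 for z in null if z >= v)
            p_values.append(exceed / len(null))
        else:
            p_values.append(None)
    return p_values


# Toy usage with made-up MFI values.
values = [-0.02, -0.01, -0.03, 0.015, 0.2]
p_values = mirror_p_values(values)
```

With only three negative values the null distribution is very coarse; the approach relies on a large number of feature pairs to give usable resolution.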

3. Theoretical Motivation and Advantages over Classical Feature Importances

MFI is motivated by the recognition that tree-based variable importance measures (e.g. Mean Decrease in Impurity, permutation importance) exhibit systematic biases—favoring high-cardinality features, features with many possible splits, or high minor allele frequency variants—regardless of their true predictive merit.

Key properties of the MFI/MIR framework:

Correction for split-point bias: By differencing with null-permuted data, MFI removes artifacts associated with variable cardinality or prevalence.
Explicit modeling of mutual predictive power: Surrogate splits yield mutual importances that are not captured by marginal or correlation-based statistics.
Tree-adaptive and robust to feature–feature correlations: MFI distinguishes features that are mutually informative from those that are merely redundant, a scenario in which permutation importance may underestimate true joint relevance.

These properties make MFI and MIR especially robust in high-dimensional or collinear settings, as demonstrated by empirical results on simulated and gene-expression datasets (Voges et al., 2023).

4. Empirical Performance and Comparison to Standard Importances

The empirical evaluation in Voges et al. (2023) reports the following:

Bias Correction: In null scenarios, unadjusted mean surrogate agreement is strongly biased, whereas MFI (and MIR) are centered at zero for all feature pairs, irrespective of category count or MAF.
Detection Power in Correlated Data: In datasets with causal and highly correlated variant features, MIR consistently achieves over 80% power to detect the true causal features; AIR alone and surrogate minimal depth perform less robustly.
Gene Expression Applications: On datasets with complex correlation such as simulated gene expression data, MIR outperforms or matches AIR and surrogate minimal depth in both stability and resistance to false-positives, particularly for small-effect features.

Compared with Gini/MDI and permutation importance, MFI/MIR have the following properties:

Property	Gini/MDI	Perm. Imp.	MFI/MIR
Cardinality bias	High	None	None
Sensitive to feature correlation	High	High	None/Low
Mutual/adaptive interactions	No	No	Yes
Grouping of impact profiles	No	No	Yes

This comparison indicates that MFI/MIR can robustly uncover both individual and combinatorial (synergistic or redundant) feature relationships.

5. Relation to Broader MFI Frameworks and Variants

The field encompasses a variety of MFI and related feature importance constructs, each with distinct formal properties and statistical motivations:

Model-agnostic permutation and ablation FIs (Merrick, 2019): Define importance as the risk difference after randomly permuting or masking a feature, directly connecting to Breiman's permutation importance.
Unbiased split-improvement measures (Zhou et al., 2019, Li et al., 2019): Correct the inherent split-selection bias in trees by leveraging out-of-bag or test-set impurity reductions.
Coefficient of Variation-based MFIs (Fang et al., 2020): Quantify per-feature importance as the relative dispersion of per-sample contributions to predictions.
Kernel and joint Shapley-type MFIs (Harris et al., 2021): Generalize classical Shapley allocation to coalitions, enabling attribution of joint and interactive feature effects.
Causal feature importance (PN-FI, PS-FI, PNS-FI) (Du et al., 2023): Quantify necessity/sufficiency-based importance intervals using counterfactual probabilities.

MFI and MIR, as developed by Voges et al. (2023), are distinct in their adaptation to the surrogate splitting structure of random forests and in their explicit null-corrected, pairwise design for identifying interaction-driven predictive information.
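
For contrast with the pairwise MFI design, the model-agnostic permutation importance listed above (risk difference after permuting one feature) can be sketched in a few lines. This is a generic illustration assuming squared-error loss and a model given as a plain Python callable; the function names and the `n_repeats` and `seed` parameters are hypothetical and do not correspond to any particular library.

```python
import random


def mse(model, rows, targets):
    """Mean squared error of a callable model on a list of feature rows."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, feature, n_repeats=20, seed=0):
    """Average increase in risk after shuffling one feature column."""
    rng = random.Random(seed)
    baseline = mse(model, rows, targets)
    increases = []
    for _ in range(n_repeats):
        column = [r[feature] for r in rows]
        rng.shuffle(column)
        # Rebuild each row with the shuffled value in place of the original.
        permuted = [r[:feature] + [v] + r[feature + 1:]
                    for r, v in zip(rows, column)]
        increases.append(mse(model, permuted, targets) - baseline)
    return sum(increases) / n_repeats


# Toy usage: the model uses only feature 0, so feature 1 has zero importance.
rows = [[float(i), float(i % 3)] for i in range(30)]
targets = [2.0 * r[0] for r in rows]


def model(r):
    return 2.0 * r[0]


importance_used = permutation_importance(model, rows, targets, feature=0)
importance_unused = permutation_importance(model, rows, targets, feature=1)
```

Because each feature is scored on its own, a measure of this kind yields one scalar per feature and no pairwise structure, which is the gap the MFI matrix is meant to fill.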

6. Limitations, Considerations, and Implementation Remarks

Surrogate Parameter Sensitivity: The precision of MFI depends on the number of surrogate splits (parameter $s$); surrogate minimal depth (SMD) measures, by contrast, exhibit sensitivity to $s$ and thus less stability in practice.
Computational Requirements: The joint surrogate analysis and pseudo-data subtraction are computationally intensive, particularly in large-$p$ settings, though parallelization is possible.
Interpretability and Grouping: MFI/MIR matrices naturally lend themselves to the identification of feature clusters with similar predictive roles, facilitating interpretable model audit beyond scalar importance scores.

In applications where understanding complex multivariate dependencies is critical (e.g., genomics, high-dimensional imaging, correlated omics data), the mutual forest impact paradigm offers a statistically principled path to both detection and characterization of relevant variables and their relations (Voges et al., 2023).