Systemic Variable Importance

Updated 17 December 2025

Systemic variable importance is a framework that aggregates importance scores across all near-optimal models, ensuring more stable and robust variable rankings.
Methodological implementations, such as Rashomon Importance Distributions and Variable Importance Clouds, use sampling and optimization techniques to assess variable contributions.
Applications in genomics, fairness audits, and scientific hypothesis testing reveal inter-feature trade-offs and support robust scientific inference.

Systemic variable importance refers to frameworks and methodologies that quantify the importance of input variables not for a single fitted predictive model, but across all models within a prescribed class that achieve near-optimal predictive performance. This concept addresses a fundamental limitation of classical variable importance methods: the potential instability and lack of generalizability of importance scores when relying exclusively on a single model. In data regimes where many models perform equally well—the Rashomon Effect—the systemic approach acknowledges the existence of multiple valid predictive explanations and aims to characterize the importance of each variable through the lens of the model class and the entire Rashomon set. This perspective leads to more robust, interpretable, and scientifically valid measures of variable contribution, particularly crucial in high-stakes or scientific contexts.

1. Definition and Theoretical Foundations

The systemic variable importance paradigm is formalized using the Rashomon set, denoted as $\mathcal{R}(\epsilon) = \{f \in \mathcal{F} : \ell(f, D) \leq \ell(f^*, D) + \epsilon\}$, where $f^*$ is the empirical risk minimizer, $\mathcal{F}$ is a model class, $\ell(f, D)$ is the empirical loss of $f$ on the data $D$, and $\epsilon \geq 0$ quantifies an allowable tolerance from the best achievable loss (Donnelly et al., 2023). Systemic variable importance quantifies the importance of a variable $j$ aggregated across all $f \in \mathcal{R}(\epsilon)$.
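The membership test in this definition can be sketched in a few lines. The example below is a minimal illustration, assuming a linear model class with squared-error loss and synthetic data in which the second feature is a noisy copy of the first; the helper names `mse` and `rashomon_mask` are hypothetical and do not come from the cited papers.

```python
import numpy as np

def mse(beta, X, y):
    """Empirical squared-error loss of the linear model x -> x @ beta."""
    return float(np.mean((y - X @ beta) ** 2))

def rashomon_mask(candidates, X, y, eps):
    """Flag the candidates whose loss is within eps of the least-squares optimum."""
    beta_star, *_ = np.linalg.lstsq(X, y, rcond=None)  # empirical risk minimizer f*
    best = mse(beta_star, X, y)
    losses = np.array([mse(b, X, y) for b in candidates])
    return losses <= best + eps, losses, best

# Synthetic data (an assumption for illustration): x2 is a noisy copy of x1.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)
X = np.column_stack([x1, x2])
y = x1 + 0.5 * rng.normal(size=n)

# Four hand-picked coefficient vectors: x1 only, x2 only, an even split, and the null model.
candidates = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.0, 0.0]])
mask, losses, best = rashomon_mask(candidates, X, y, eps=0.1)
for beta, loss, inside in zip(candidates, losses, mask):
    print(beta, round(loss, 3), "in R(eps)" if inside else "outside")
```

On data of this kind, the first three candidates would be expected to fall inside the set even though they place their weight on different features, which is the model multiplicity that the Rashomon set is meant to capture.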

This stands in contrast to traditional single-model variable importance (e.g., permutation importance or coefficient magnitude), which evaluates importance only with respect to a chosen $f^*$. In many practical scenarios, equally good models may assign drastically different importances to the same variable; systemic methods reconcile these discrepancies by considering global properties over the entire Rashomon set (Dong et al., 2019, Fisher et al., 2018).

Formally, systemic variable importance for variable $j$ is defined as a functional of the distribution of importance scores (using a chosen metric, e.g., permutation or marginal contribution) as $f$ ranges over $\mathcal{R}(\epsilon)$. This yields, for each variable, a full “Rashomon Importance Distribution” (RID) or “Variable Importance Cloud,” whose summary statistics (e.g., mean, median, support, quantiles) can be used for scientific inference (Donnelly et al., 2023, Dong et al., 2019).
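One way to make "a functional of the distribution" concrete is sketched below. It assumes a finite (or finitely sampled) Rashomon set with uniform weights and writes $\phi_j(f, D)$ for the chosen importance metric; the uniform weighting and this notation are assumptions made here for illustration, and the cited papers may define these objects differently.

$$
\mathrm{RID}_j(k) = \frac{\left|\{f \in \mathcal{R}(\epsilon) : \phi_j(f, D) \leq k\}\right|}{\left|\mathcal{R}(\epsilon)\right|},
\qquad
\mathrm{median}_j = \inf\{k : \mathrm{RID}_j(k) \geq 1/2\},
\qquad
\mathrm{range}_j = \left[\min_{f \in \mathcal{R}(\epsilon)} \phi_j(f, D),\ \max_{f \in \mathcal{R}(\epsilon)} \phi_j(f, D)\right].
$$

Under this reading, $\mathrm{RID}_j$ is the cumulative distribution function of the importance of variable $j$ over near-optimal models, and the median and range are two of the summary statistics mentioned above.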

2. Methodological Implementations

Multiple approaches operationalize systemic variable importance in statistics and machine learning:

RID (Rashomon Importance Distribution): For a model class $\mathcal{F}$ and a variable-importance functional $\phi_j(f, D)$ (e.g., permutation importance), RID constructs the distribution $\{\phi_j(f, D): f \in \mathcal{R}(\epsilon)\}$. This is estimated by (a) sampling or enumerating models in $\mathcal{R}(\epsilon)$, (b) evaluating $\phi_j$ for each, and (c) summarizing the resulting distribution (Donnelly et al., 2023).
Variable Importance Clouds (VIC): The VIC is the set $\{\mathrm{MR}(f): f \in \mathcal{R}(\epsilon)\} \subset \mathbb{R}^p$ where $\mathrm{MR}(f)$ collects per-variable “model reliance” values (e.g., permutation or switch-based loss increases). The geometry of VICs reveals possible trade-offs in reliance across features: the boundary structure encodes whether two features can substitute for each other without degrading prediction (Dong et al., 2019).
Model Class Reliance (MCR): MCR characterizes the extremal values (minimum and maximum) of a given variable-importance measure as $f$ ranges over $\mathcal{R}(\epsilon)$, providing interval-valued importance $[\mathrm{MCR}_-(\epsilon), \mathrm{MCR}_+(\epsilon)]$ (Fisher et al., 2018). MCR is computed by minimizing and maximizing the importance measure over the Rashomon set, with closed forms for certain loss and model classes.
UNIVERSE Bounds (with unobserved confounding): Extends the Rashomon set approach to settings with unobserved variables, giving robust bounds on systemic importance even under model misspecification and missing features (Donnelly et al., 14 Oct 2025).
Statistical Rigor: Systemic variable importance estimators admit probabilistic bounds and asymptotic consistency, often using U-statistics, empirical process theory, or finite-sample covering arguments (Donnelly et al., 2023, Fisher et al., 2018, Donnelly et al., 14 Oct 2025).
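The enumerate, evaluate, and summarize pattern shared by these approaches can be sketched as follows. This is a simplified illustration and not the published RID, VIC, or MCR algorithm: it assumes a two-feature linear model class searched on a coarse coefficient grid, squared-error loss, and permutation importance as the metric, and the function names are hypothetical.

```python
import numpy as np

def mse(beta, X, y):
    """Empirical squared-error loss of the linear model x -> x @ beta."""
    return float(np.mean((y - X @ beta) ** 2))

def permutation_importance(beta, X, y, j, rng, n_repeats=10):
    """Mean increase in loss when column j is shuffled (a model-reliance proxy)."""
    base = mse(beta, X, y)
    increases = []
    for _ in range(n_repeats):
        Xp = X.copy()
        Xp[:, j] = rng.permutation(Xp[:, j])
        increases.append(mse(beta, Xp, y) - base)
    return float(np.mean(increases))

def importance_cloud(X, y, eps, grid, rng):
    """Importance vectors for every grid model that lies inside R(eps)."""
    beta_star, *_ = np.linalg.lstsq(X, y, rcond=None)
    best = mse(beta_star, X, y)
    cloud = []
    for b1 in grid:
        for b2 in grid:
            beta = np.array([b1, b2])
            if mse(beta, X, y) <= best + eps:  # step (a): keep near-optimal models
                cloud.append([permutation_importance(beta, X, y, j, rng)
                              for j in range(2)])  # step (b): evaluate importance
    return np.array(cloud)

# Synthetic data (an assumption for illustration): x2 is a noisy copy of x1.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)
X = np.column_stack([x1, x2])
y = x1 + 0.5 * rng.normal(size=n)

cloud = importance_cloud(X, y, eps=0.05, grid=np.linspace(-0.5, 1.5, 21), rng=rng)

# Step (c): summarize. Each row of `cloud` is one point of an empirical importance cloud.
for j in range(2):
    q05, q50, q95 = np.quantile(cloud[:, j], [0.05, 0.5, 0.95])
    print(f"x{j + 1}: 5%={q05:.2f} median={q50:.2f} 95%={q95:.2f} "
          f"range=[{cloud[:, j].min():.2f}, {cloud[:, j].max():.2f}]")
```

Because the grid covers only part of the Rashomon set, the printed range is an inner approximation of an MCR-style interval: the true extremes over the full set can only be as wide or wider.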

3. Comparison with Single-Model Importance

Stability and Interpretability: Classical importance metrics (e.g., permutation importance, SHAP values, or coefficient sizes for a single $f^*$) are inherently conditional on the choice of $f^*$. Systemic (Rashomon-based) approaches reveal the extent to which a variable’s role is robust to model selection and regularization (Donnelly et al., 2023, Dong et al., 2019).

If a variable’s importance varies widely across $\mathcal{R}(\epsilon)$, any claim of its necessity or “root cause” status is scientifically weak (“non-identifiability”).
If a variable’s importance is consistently high (narrow, high RID) across all near-optimal models, one gains strong evidential support for its centrality.

These systemic methods can help resolve scientific disputes by exposing the conditionality and ambiguity of single-model explanations, and by providing a basis for sensitivity analysis under both observed and unobserved uncertainties (Donnelly et al., 14 Oct 2025).

Inter-feature Trade-offs: Systemic methods can reveal complementary or substitutable relationships among predictors. VICs and RIDs highlight when reduced reliance on one variable must be compensated by increased reliance on another, which identifies groups of features that can stand in for one another without loss of predictive accuracy (Dong et al., 2019).
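A small worked example illustrates such a trade-off. The setup is an assumption chosen for tractability and is not taken from the cited papers: two identical features $x_1 = x_2 = x$ with variance $V$, a response $y = x + e$ with independent noise $e$, the linear model $f_\beta(x) = \beta_1 x_1 + \beta_2 x_2$ with squared loss, and model reliance $\mathrm{MR}_j$ measured as the expected increase in loss when $x_j$ is replaced by an independent copy $x'$. Writing $r = y - f_\beta(x) = (1 - \beta_1 - \beta_2)\,x + e$,

$$
\mathrm{MR}_1(\beta) = \mathbb{E}\big[(r + \beta_1 (x - x'))^2\big] - \mathbb{E}\big[r^2\big]
= 2V\beta_1^2 + 2V\beta_1(1 - \beta_1 - \beta_2)
= 2V\beta_1(1 - \beta_2),
\qquad
\mathrm{MR}_2(\beta) = 2V\beta_2(1 - \beta_1).
$$

Every model on the line $\beta_1 + \beta_2 = 1$ attains the minimal loss and therefore belongs to $\mathcal{R}(\epsilon)$ for any $\epsilon \geq 0$. On that line,

$$
\mathrm{MR}_1 = 2V\beta_1^2, \qquad \mathrm{MR}_2 = 2V(1 - \beta_1)^2, \qquad
\sqrt{\mathrm{MR}_1} + \sqrt{\mathrm{MR}_2} = \sqrt{2V} \quad \text{for } 0 \leq \beta_1 \leq 1,
$$

so reliance on $x_1$ can be lowered from $2V$ to $0$ only by raising reliance on $x_2$ from $0$ to $2V$. In this toy case the boundary of the importance cloud is exactly this substitution curve.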

4. Algorithms and Computational Aspects

Common algorithmic elements include:

Enumeration or Sampling: Traversing $\mathcal{R}(\epsilon)$ is tractable for convex model classes (e.g., linear, ridge, logistic) using ellipsoidal approximations or hit-and-run sampling; for decision trees, greedy search is used instead (Donnelly et al., 2023, Dong et al., 2019).
Variable-importance Functional Evaluation: Any functional (permutation, switch, marginal, conditional) can be used, provided it is meaningful throughout $\mathcal{F}$.
Optimization over Rashomon sets: For MCR, Dinkelbach-style or one-dimensional searches over $\gamma$ are used to minimize/maximize MR over all $f$ obeying risk constraints (Fisher et al., 2018).
Uncertainty Quantification: Recent work provides finite-sample/outer bounds on RIDs, RID summaries, and MCRs, with explicit quantification of estimation error and, in the presence of unobserved confounding, “robustified” intervals for systemic importance (Donnelly et al., 14 Oct 2025).
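As an illustration of the sampling element, the sketch below runs hit-and-run inside the Rashomon set of ordinary least squares. It relies on the fact that, for squared loss and the least-squares solution $\hat\beta$, the loss at $\hat\beta + d$ equals the optimal loss plus $d^\top H d$ with $H = X^\top X / n$, so the Rashomon set is the ellipsoid $d^\top H d \leq \epsilon$. The function name, burn-in length, and synthetic data are assumptions made for illustration.

```python
import numpy as np

def hit_and_run_ellipsoid(H, eps, n_samples, rng, burn_in=100):
    """Sample offsets d satisfying d^T H d <= eps with hit-and-run."""
    p = H.shape[0]
    d = np.zeros(p)                      # start at the center, i.e. at the ERM
    samples = []
    for step in range(burn_in + n_samples):
        u = rng.normal(size=p)
        u /= np.linalg.norm(u)           # uniformly random direction
        # Chord endpoints solve a*t^2 + 2*b*t + c = 0 along the line d + t*u.
        a = u @ H @ u
        b = u @ H @ d
        c = d @ H @ d - eps              # non-positive while d is inside the set
        disc = np.sqrt(max(b * b - a * c, 0.0))
        t_lo, t_hi = (-b - disc) / a, (-b + disc) / a
        d = d + rng.uniform(t_lo, t_hi) * u   # uniform point on the chord
        if step >= burn_in:
            samples.append(d.copy())
    return np.array(samples)

# Synthetic regression data (an assumption for illustration).
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))
y = X @ np.array([1.0, -0.5]) + 0.5 * rng.normal(size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
H = X.T @ X / n
eps = 0.05
betas = beta_hat + hit_and_run_ellipsoid(H, eps, n_samples=500, rng=rng)

# Every sampled model should respect the Rashomon loss constraint.
best = np.mean((y - X @ beta_hat) ** 2)
losses = np.mean((y[:, None] - X @ betas.T) ** 2, axis=0)
print(losses.max() - best, "<=", eps)
```

The sampled coefficient vectors in `betas` can then be passed to any importance functional. For non-quadratic losses the set is no longer an exact ellipsoid, and the chord endpoints would have to be found numerically.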

Practically, complexity hinges on the Rashomon set’s size and the model class; for small/interpretable models computation is feasible, but for high-dimensional nonconvex classes (deep nets, large trees) specialized approximate schemes are required (Donnelly et al., 2023, Donnelly et al., 14 Oct 2025).

5. Practical Applications and Empirical Results

Systemic variable importance has been deployed in high-dimensional genomics, public policy, fairness audits, and scientific explorations:

Genomic studies: In HIV gene-expression analysis, RID identified important genes for predicting viral load, including a previously unexplored gene, and provided stable estimates that single-model approaches missed (Donnelly et al., 2023).
Fairness/Discrimination Audits: By calculating the range of importance possible for protected attributes (e.g., race, gender) across $\mathcal{R}(\epsilon)$, practitioners can rigorously interrogate claims about algorithmic discrimination, even for proprietary or black-box models (Fisher et al., 2018, Donnelly et al., 14 Oct 2025).
Scientific Hypothesis Testing: VICs and RIDs reveal features whose necessity is not identifiable from the available data and model class, highlighting unresolvable scientific ambiguity unless additional design interventions (e.g., new experimental features) are performed (Dong et al., 2019).
Model Robustness: In adversarial or misspecified settings, systemic methods quantify the sensitivity of importance scores to missing features, regularization, and modeling choices (Donnelly et al., 14 Oct 2025).

Empirical studies demonstrate that systemic approaches recover true importance structure in simulations with feature equivalence classes, interactions, and omitted variables—situations where single-model importances fail (Donnelly et al., 2023, Donnelly et al., 14 Oct 2025, Dong et al., 2019).

6. Limitations, Interpretative Guidelines, and Outlook

Systemic variable importance requires the specification of a model class $\mathcal{F}$ and a risk or accuracy tolerance $\epsilon$. The width of the RID or MCR intervals is sensitive to both the inductive bias encoded in $\mathcal{F}$ and the data’s information content. If $\epsilon$ is too large or $\mathcal{F}$ too broad, Rashomon sets can be so wide that no stable conclusions about variable necessity are possible (“non-identifiability”). Conversely, restricting $\mathcal{F}$ or underestimating uncertainty can yield unwarranted overconfidence in attributed importance.
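The dependence on $\epsilon$ can be seen in closed form in a simple case. The sketch below assumes ordinary least squares with squared loss and uses the raw coefficient as a stand-in importance measure; under those assumptions the Rashomon set is the ellipsoid $(\beta - \hat\beta)^\top H (\beta - \hat\beta) \leq \epsilon$ with $H = X^\top X / n$, and coefficient $j$ ranges over $\hat\beta_j \pm \sqrt{\epsilon\,(H^{-1})_{jj}}$. The function name and data are hypothetical.

```python
import numpy as np

def coefficient_range(X, y, eps):
    """Range of each coefficient over the least-squares Rashomon set R(eps)."""
    n = X.shape[0]
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    H_inv = np.linalg.inv(X.T @ X / n)
    radius = np.sqrt(eps * np.diag(H_inv))   # half-width of each interval
    return beta_hat - radius, beta_hat + radius

# Synthetic data (an assumption for illustration): x2 is a noisy copy of x1.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)
X = np.column_stack([x1, x2])
y = x1 + 0.5 * rng.normal(size=n)

widths = []
for eps in [0.01, 0.04, 0.16]:
    lo, hi = coefficient_range(X, y, eps)
    widths.append(hi - lo)
    print(f"eps={eps}: beta_1 in [{lo[0]:.2f}, {hi[0]:.2f}], "
          f"beta_2 in [{lo[1]:.2f}, {hi[1]:.2f}]")
```

The interval width grows like $\sqrt{\epsilon}$, so quadrupling the tolerance doubles the width. With strongly correlated features the intervals are wide even for small $\epsilon$, and once an interval contains zero the data cannot establish that the feature is needed.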

Statistically valid bounds on systemic importance (e.g., distributional intervals, coverage under missing features or measurement error) depend on nontrivial assumptions, and exact computation can be infeasible for large, nonconvex model classes. In such cases, the reported systemic importance should be interpreted as conditional on the user's choice of model class, tolerance, and approximation scheme (Donnelly et al., 14 Oct 2025).

Systemic approaches do not easily decompose the contributions of collinear variables: when multiple highly correlated features can replace each other in the Rashomon set, their individual systemic importances may become indeterminate, though group- or pathway-level importances remain interpretable (Donnelly et al., 2023, Dong et al., 2019).
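The contrast between individual and group-level importance can be illustrated with a joint permutation, in which all columns of a group are shuffled with the same row permutation. The sketch below is a toy example under assumed synthetic data with two nearly duplicated features; the function names and the five hand-picked near-optimal models are hypothetical.

```python
import numpy as np

def mse(beta, X, y):
    """Empirical squared-error loss of the linear model x -> x @ beta."""
    return float(np.mean((y - X @ beta) ** 2))

def permutation_importance(beta, X, y, cols, rng, n_repeats=30):
    """Mean loss increase when the columns in `cols` are shuffled together."""
    base = mse(beta, X, y)
    total = 0.0
    for _ in range(n_repeats):
        perm = rng.permutation(X.shape[0])
        Xp = X.copy()
        Xp[:, cols] = X[perm][:, cols]   # same row permutation for the whole group
        total += mse(beta, Xp, y) - base
    return total / n_repeats

# Synthetic data (an assumption for illustration): x2 is a noisy copy of x1.
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)
X = np.column_stack([x1, x2])
y = x1 + 0.5 * rng.normal(size=n)

# Models that shift weight from x2 to x1 while keeping the total weight at 1.
betas = [np.array([b, 1.0 - b]) for b in (0.0, 0.25, 0.5, 0.75, 1.0)]
losses = [mse(b, X, y) for b in betas]
single = [permutation_importance(b, X, y, [0], rng) for b in betas]
group = [permutation_importance(b, X, y, [0, 1], rng) for b in betas]

print("loss spread:         ", max(losses) - min(losses))
print("x1 importance spread:", max(single) - min(single))
print("group spread:        ", max(group) - min(group))
```

In this setting the five models have nearly the same loss, the importance of `x1` alone swings from zero to a large value, and the importance of the pair stays roughly constant, which is the sense in which group-level importance remains interpretable.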

Overall, systemic variable importance supplies a rigorous, model-class-wide framework for stable and scientifically robust variable ranking and hypothesis generation, especially in settings characterized by model multiplicity, correlation, and potential hidden confounding (Donnelly et al., 2023, Dong et al., 2019, Fisher et al., 2018, Donnelly et al., 14 Oct 2025).

References:

"The Rashomon Importance Distribution: Getting RID of Unstable, Single Model-based Variable Importance" (Donnelly et al., 2023)
"Variable Importance Clouds: A Way to Explore Variable Importance for the Set of Good Models" (Dong et al., 2019)
"All Models are Wrong, but Many are Useful: Learning a Variable's Importance by Studying an Entire Class of Prediction Models Simultaneously" (Fisher et al., 2018)
"Doctor Rashomon and the UNIVERSE of Madness: Variable Importance with Unobserved Confounding and the Rashomon Effect" (Donnelly et al., 14 Oct 2025)