Causal-Graph-Based Stable Feature Selection

Updated 22 April 2026

Causal-graph-based feature selection is a method that uses structural causal models to identify feature subsets whose predictive power remains invariant across different conditions.
It employs techniques such as invariant causal prediction, causal GNNs, and subsampling to differentiate between true causation and spurious correlations.
Applications in biomarker discovery, time-series forecasting, and domain adaptation demonstrate its capacity for enhancing reproducibility and generalization.

Causal-graph-based selection of stable features refers to a family of methodologies that leverage structural causal models and graph-theoretic frameworks to identify feature subsets whose predictive relationship with a target variable remains invariant across a range of environments, interventions, or data splits. Unlike classical statistical or purely correlational feature selection, which may conflate causal and spurious dependencies and suffer from instability under distribution shift, these approaches resolve spurious correlations by explicitly modeling underlying causal mechanisms and enforcing invariance or stability criteria. This paradigm is now central in domains such as biomarker discovery, time-series forecasting, process monitoring, graph learning, and robust machine learning under covariate, mechanism, or domain shifts.

1. Structural Causal Graphs and the Notion of Stability

A structural causal graph encodes the dependencies among features (variables) via directed edges and, in some models, undirected edges. Each node represents a variable $X_j$, and a directed edge $X_i \to X_j$ encodes a direct causal influence of $X_i$ on $X_j$. The structural equations define each $X_j$ as a function of its parents and noise variables.

Stability, in this context, is defined formally via the conditional invariance of the predictive relationship between a subset of features $S$ and a target $Y$ across a family of data-generating regimes or environments (e.g., different interventions, time periods, or domain shifts):

$\forall e_1, e_2 \in \mathcal{E}: \quad P^{e_1}(Y \mid X_S) = P^{e_2}(Y \mid X_S)$

A feature subset qualifies as stable if, conditioned on that subset, the conditional distribution of $Y$ remains invariant to the underlying shifts in the data—i.e., the conditional is not affected by unobserved interventions or context changes (Pfister et al., 2019, Gamella et al., 2020).
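As a minimal illustration of this definition, the sketch below simulates a hypothetical linear SCM $X_1 \to Y \to X_2$ in two environments and compares the fitted slope of $Y$ on each single feature. The model, its coefficients, and the function names are assumptions made for the example. A matching regression slope is only a proxy for equality of the full conditional distribution, which is reasonable here because the toy model is linear and Gaussian.

```python
import random
from statistics import mean


def simulate_environment(n, x1_scale, x2_noise, seed):
    """Draw n samples from the toy SCM X1 -> Y -> X2 (all values hypothetical)."""
    rng = random.Random(seed)
    x1 = [rng.gauss(0.0, x1_scale) for _ in range(n)]
    y = [2.0 * a + rng.gauss(0.0, 1.0) for a in x1]   # mechanism of Y never changes
    x2 = [b + rng.gauss(0.0, x2_noise) for b in y]    # the environment perturbs X2
    return x1, x2, y


def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)


environments = {
    "e1": simulate_environment(5000, x1_scale=1.0, x2_noise=0.5, seed=1),
    "e2": simulate_environment(5000, x1_scale=2.0, x2_noise=4.0, seed=2),
}
parent_slopes = {e: ols_slope(x1, y) for e, (x1, x2, y) in environments.items()}
child_slopes = {e: ols_slope(x2, y) for e, (x1, x2, y) in environments.items()}

print("Y ~ X1 slopes:", parent_slopes)  # close to 2 in both environments
print("Y ~ X2 slopes:", child_slopes)   # differs between environments
```

In this toy setting the parent $X_1$ behaves as a stable feature, while the child $X_2$ is predictive within each environment but its relationship with $Y$ shifts when the environment changes.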

Graph-based approaches typically connect this stability notion to causal sufficiency. For example, the stable blanket or causal Markov boundary is defined as the smallest subset of variables that (i) $d$-separates $Y$ from all sources of environmental variation/intervention in the causal DAG, (ii) is minimal with respect to identifiability, and (iii) is predictive-optimal with respect to the post-intervention or out-of-distribution setting (Pfister et al., 2019, Triantafillou et al., 2021).
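The $d$-separation condition in (i) can be checked mechanically on a known DAG. The following sketch uses the standard moralized-ancestral-graph test; the example graph, the environment node `E`, and the function names are hypothetical and chosen only to show how conditioning on a parent blocks environmental variation while conditioning on a child reopens it.

```python
from itertools import combinations


def ancestors_of(parents, nodes):
    """Return the given nodes together with all of their ancestors."""
    seen, stack = set(nodes), list(nodes)
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def d_separated(parents, x, y, given):
    """Check whether x and y are d-separated by `given` in the DAG `parents`."""
    given = set(given)
    keep = ancestors_of(parents, {x, y} | given)
    # Moralize the ancestral subgraph: link each node to its parents and
    # "marry" every pair of parents sharing a child, dropping directions.
    adj = {v: set() for v in keep}
    for child in keep:
        pars = list(parents.get(child, ()))
        for p in pars:
            adj[child].add(p)
            adj[p].add(child)
        for a, b in combinations(pars, 2):
            adj[a].add(b)
            adj[b].add(a)
    # x and y are d-separated iff no path connects them once `given` is removed.
    seen, stack = {x}, [x]
    while stack:
        v = stack.pop()
        if v == y:
            return False
        for w in adj[v] - given - seen:
            seen.add(w)
            stack.append(w)
    return True


# Hypothetical DAG (child -> list of parents): E acts on X1 and X2, X1 -> Y -> X2.
dag = {"X1": ["E"], "Y": ["X1"], "X2": ["Y", "E"]}
for subset in [(), ("X1",), ("X1", "X2")]:
    print(subset, d_separated(dag, "Y", "E", subset))
```

In this graph only `{"X1"}` separates `Y` from `E`: the empty set leaves the path through `X1` open, and adding the child `X2` opens a collider path.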

2. Methodologies for Causal-Graph-Based Stable Feature Selection

The diverse methodologies under this paradigm share core components: construction or utilization of a causal/interaction graph, estimation of causal effects or invariance, and a selection procedure that operationalizes stability. Key methodological frameworks include:

A. Structural Causal Model (SCM)-Driven Search:

Direct parent selection is achieved via orthogonal score-based regressions: for each $j$, the "causal covariance" between $X_j$ and $Y$, orthogonalized with respect to the remaining features $X_{-j}$, is estimated, and all $j$ for which it is nonzero are selected (Soleymani et al., 2020). This approach is consistent under nonlinearity and cycles and robust to high-dimensional confounding.
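A rough sketch of the partialling-out idea follows: each candidate feature and the target are residualized on the remaining features, and the mean product of the two residuals is taken as the estimate. This is a simplification, not the cited authors' implementation. It assumes only two features, uses linear nuisance regressions where the method allows flexible learners, and uses two-fold cross-fitting; all names and data-generating values are hypothetical.

```python
import math
import random
from statistics import mean, stdev


def fit_line(u, v):
    """Least-squares intercept and slope of v on a single regressor u."""
    mu, mv = mean(u), mean(v)
    slope = sum((a - mu) * (b - mv) for a, b in zip(u, v)) / sum((a - mu) ** 2 for a in u)
    return mv - slope * mu, slope


def cross_fitted_residuals(rest, target):
    """Residualize target on rest, fitting on one half and predicting the other."""
    n, half = len(target), len(target) // 2
    folds = [(range(half), range(half, n)), (range(half, n), range(half))]
    residuals = [0.0] * n
    for train, test in folds:
        intercept, slope = fit_line([rest[i] for i in train], [target[i] for i in train])
        for i in test:
            residuals[i] = target[i] - (intercept + slope * rest[i])
    return residuals


def causal_covariance(xj, rest, y):
    """Mean product of residualized xj and y, plus a z-score against zero."""
    products = [a * b for a, b in zip(cross_fitted_residuals(rest, xj),
                                      cross_fitted_residuals(rest, y))]
    theta = mean(products)
    return theta, theta / (stdev(products) / math.sqrt(len(products)))


# Hypothetical data: X1 causes Y; X2 is correlated with X1 but does not cause Y.
rng = random.Random(0)
x1 = [rng.gauss(0.0, 1.0) for _ in range(4000)]
x2 = [0.8 * a + rng.gauss(0.0, 0.6) for a in x1]
y = [1.5 * a + rng.gauss(0.0, 1.0) for a in x1]

theta1, z1 = causal_covariance(x1, x2, y)  # X1 adjusted for X2
theta2, z2 = causal_covariance(x2, x1, y)  # X2 adjusted for X1
print(f"X1: theta={theta1:.3f}, z={z1:.1f}")
print(f"X2: theta={theta2:.3f}, z={z2:.1f}")
```

The estimate for the true parent is clearly away from zero, while the estimate for the merely correlated feature is close to zero; in practice a formal test on the z-score would decide which features count as nonzero.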

B. Invariant Causal Prediction (ICP) and Extensions:

ICP tests conditional invariance of candidate feature subsets across environments, rejecting a subset $S$ when the residuals of regressing $Y$ on $X_S$ change in distribution across environments, and outputs the intersection of all subsets that pass the statistical invariance test (Gamella et al., 2020). Active ICP sequentially selects interventions to accelerate identification of the direct causes.
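The subset-enumeration logic of ICP can be sketched in a few dozen lines. The version below is an assumption-laden toy: it handles exactly two environments, fits pooled linear regressions, and replaces the usual residual tests with simple normal-approximation z-tests on residual means and variances (Bonferroni-combined). The simulated SCM and all function names are hypothetical.

```python
import math
import random
from itertools import combinations
from statistics import NormalDist, mean, variance


def ols_residuals(rows, y):
    """Residuals of a least-squares fit of y on the given columns plus an intercept."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    # Normal equations A beta = b with A = X'X and b = X'y.
    a = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    b = [sum(d[i] * t for d, t in zip(design, y)) for i in range(p)]
    for col in range(p):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(p):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [u - f * v for u, v in zip(a[r], a[col])]
                b[r] -= f * b[col]
    beta = [b[i] / a[i][i] for i in range(p)]
    return [t - sum(c * v for c, v in zip(beta, d)) for d, t in zip(design, y)]


def invariance_pvalue(residuals, env):
    """Approximate two-environment test of equal residual mean and variance."""
    groups = [[r for r, e in zip(residuals, env) if e == k] for k in (0, 1)]

    def z_test(u, v):
        se = math.sqrt(variance(u) / len(u) + variance(v) / len(v))
        return 2.0 * (1.0 - NormalDist().cdf(abs(mean(u) - mean(v)) / se))

    p_mean = z_test(*groups)
    p_var = z_test(*[[r * r for r in g] for g in groups])
    return min(1.0, 2.0 * min(p_mean, p_var))  # Bonferroni over the two tests


def icp(X, y, env, alpha=0.05):
    """Return the intersection of all accepted subsets, and the accepted subsets."""
    d = len(X[0])
    accepted = []
    for size in range(d + 1):
        for subset in combinations(range(d), size):
            rows = [[row[j] for j in subset] for row in X]
            if invariance_pvalue(ols_residuals(rows, y), env) > alpha:
                accepted.append(set(subset))
    estimate = set.intersection(*accepted) if accepted else set()
    return estimate, accepted


def simulate(n, x1_scale, x3_noise, label, rng):
    """Toy SCM X1 -> Y -> X3 with an irrelevant X2 (all values hypothetical)."""
    X, y = [], []
    for _ in range(n):
        x1, x2 = rng.gauss(0.0, x1_scale), rng.gauss(0.0, 1.0)
        target = 2.0 * x1 + rng.gauss(0.0, 1.0)
        X.append([x1, x2, target + rng.gauss(0.0, x3_noise)])
        y.append(target)
    return X, y, [label] * n


rng = random.Random(0)
X0, y0, e0 = simulate(4000, 1.0, 0.5, 0, rng)
X1, y1, e1 = simulate(4000, 2.0, 4.0, 1, rng)
estimate, accepted = icp(X0 + X1, y0 + y1, e0 + e1)
print("accepted subsets:", accepted)
print("estimated direct causes:", estimate)
```

Taking the intersection is deliberately conservative: a feature is reported only if every invariant subset contains it. The exhaustive enumeration is exponential in the number of features, so this form is only practical for small candidate sets.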

C. Markov Boundary and Causal Markov Boundary Search:

A feature subset is selected via graphical criteria ensuring identifiability (back-door or $d$-separation), minimality, and maximal informativeness for the post-intervention distribution (Triantafillou et al., 2021). Bayesian or constraint-based algorithms search over Markov boundaries and combine observational and limited interventional data for robust identification.

D. Causal Graph Neural Networks and Graph Partitioning:

In complex data, such as transcriptomics or graphs, a causal interaction network is used (often curated or estimated) as a basis for GNN message passing. Gene (feature) effects are then estimated via propensity-score–adjusted regression, with stability measured by multi-fold overlap of selected features (Lan et al., 17 Nov 2025). In unsupervised or domain adaptation contexts, representations are explicitly disentangled into causal and spurious components, using information bottleneck and mutual information constraints to isolate stable features (Luo et al., 10 Jul 2025, Sui et al., 2021).

E. Stability Selection via Subsampling, Bagging, or Random Environments:

Multi-objective causal structure search is performed in repeated subsamples or environments. Features are identified as stable if they appear frequently across the Pareto-optimal models or across random perturbations of the data or confounding set (Rahmadi et al., 2016, Chen et al., 2022). The core criterion is that the coefficient or causal effect of a variable remains invariant to the choice of subsample, background adjustment, or perturbation.
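The frequency-based criterion can be illustrated with a generic stability-selection loop. In this sketch the base selector is a plain correlation threshold standing in for the multi-objective causal structure search described above, so it shows only the resampling and counting logic. The thresholds (0.2 for the selector, 0.8 for the selection frequency), the data, and the function names are assumptions.

```python
import math
import random
from statistics import mean


def abs_correlation(x, y):
    """Absolute Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy) / math.sqrt(sxx * syy)


def base_selector(X, y, threshold=0.2):
    """Placeholder selector: keep features whose |correlation| with y exceeds threshold."""
    d = len(X[0])
    return {j for j in range(d) if abs_correlation([row[j] for row in X], y) > threshold}


def selection_frequencies(X, y, n_subsamples=100, fraction=0.5, seed=0):
    """Fraction of random subsamples in which each feature is selected."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    counts = [0] * d
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), int(fraction * n))  # subsample without replacement
        for j in base_selector([X[i] for i in idx], [y[i] for i in idx]):
            counts[j] += 1
    return [c / n_subsamples for c in counts]


# Hypothetical data: only feature 0 drives y; features 1-4 are pure noise.
rng = random.Random(42)
X = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(400)]
y = [row[0] + rng.gauss(0.0, 1.0) for row in X]

freq = selection_frequencies(X, y)
stable = [j for j, f in enumerate(freq) if f >= 0.8]
print("selection frequencies:", freq)
print("stable features:", stable)
```

Swapping `base_selector` for a causal discovery or effect-estimation step leaves the outer loop unchanged, which is why the same scheme applies across the methods in this family.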

F. Causal and Stability-Aware Diffusion and Bayesian Methods:

Feature selection is cast as approximate posterior inference over feature subsets, with a learned diffusion prior over masks and a likelihood rewarding low error and low cross-environment variance. Sampling (e.g., via Langevin dynamics) concentrates on masks yielding both empirical accuracy and selection robustness (Malarkkan et al., 21 Mar 2026).

3. Algorithms and Mathematical Criteria

Below is an indicative table connecting methodologies to their core selection criteria and algorithmic strategies:

Approach	Stability/Selection Criterion	Core Algorithmic Step
SCM/Orthogonal Search (Soleymani et al., 2020)	Nonzero causal covariance, Neyman-orthogonal score	Cross-fitted, debiased regression
ICP / Active ICP (Gamella et al., 2020)	Conditional invariance of $P^{e}(Y \mid X_S)$ across environments	Intersection of non-rejected subsets
Markov Boundary (Triantafillou et al., 2021)	Minimal valid back-door/CMB in causal or mutilated graph	Subset search over $d$-separation sets
Subsampling/Bagging (Rahmadi et al., 2016)	High frequency/selection rate across random subsamples	Pareto optimization + stability threshold
Random Backgrounds (Chen et al., 2022)	Low coefficient variability across random confounder subsets	Monte-Carlo coefficient stability
Causal GNN (Lan et al., 17 Nov 2025)	Low average causal effect (ACE), high overlap under resampling	GCN-based score + multirun consensus
Causal Diffusion (Malarkkan et al., 21 Mar 2026)	Low OOD variance + accuracy in sampled posterior masks	Diffusion prior + guided sampling

Despite their differences, all approaches share the objective of enforcing invariance or stability in the selected feature set, whether via graph-theoretic, statistical, or Bayesian principles.

4. Empirical Evaluation and Application Domains

Causal-graph-based stable feature selection has been reported to outperform traditional (correlational) methods in a range of empirical settings:

Biomedical and Multi-omics: Causal-GNN reduces the number of biomarkers in transcriptomic prediction (e.g., NSCLC: 37 genes vs 46 for causal inference baseline), improves F1 (from 0.856 to 0.915), and yields feature sets with high fold-to-fold overlap, ensuring reproducibility and biological interpretability (Lan et al., 17 Nov 2025, Pfister et al., 2019).
Environmental Science: In cyclone forecasting, multidata PC/PCMCI methods yield sparse, interpretable feature sets (e.g., only 17–31 features) that generalize out-of-sample (S. et al., 2023).
Unsupervised and Graph Learning: In clustering/image datasets, causally-regularized feature selection leads to improvements in accuracy and NMI compared to all baselines, and feature visualizations show interpretable, structure-aware selections (Shen et al., 2024).
Time-series and Process Monitoring: Time-delayed cross-mapping generates a lagged, directed causal graph whose edges are pruned at a validation-optimal threshold, resulting in robust and stable soft sensor models with lower RMSE than the best baseline methods (Chen et al., 20 Jan 2026).
Longitudinal and SEM-based Discovery: Stability selection under multi-objective (fit, complexity) search reveals robust, low-complexity substructures in longitudinal biomedical data, validated against prior knowledge and yielding novel causal hypotheses (Rahmadi et al., 2016).

5. Theoretical Guarantees and Limitations

Causal-graph-based selection enjoys theoretical backing in terms of identifiability, statistical error control, and sample complexity.

Error Control: Familywise error is controlled in invariance-based frameworks (e.g., at level $\alpha$ in ICP) (Gamella et al., 2020).
Consistency: Orthogonal score algorithms yield root-N consistent identification of direct parents under approximate sparsity; Markov boundary approaches are consistent and achieve low bias under combined observational/experimental data (Soleymani et al., 2020, Triantafillou et al., 2021).
Generalization: The stable blanket, Markov boundary, or ACE-ranked subset is proven optimal for generalization to new interventions or environments (Pfister et al., 2019).
Failure Modes: All methods hinge on the sufficiency and correct specification of the causal graph or the stability-ensuring assumptions across environments. Violations—unmodeled confounding, unblocked back-door paths, or erroneous graph structure—can lead to bias or instability.

A practical implication is that the reliability of the selected features depends on the fidelity of the causal skeleton, the adequacy of environment/intervention diversity, and statistical power.

6. Practical Workflow and Stability Assessment in Causal-Graph-Based Selection

Most practical algorithms follow a workflow of:

Graph Construction or Integration: Use curated networks, causal discovery, or experiments to specify the skeleton.
Causal/Invariant Effect Quantification: Estimate direct effects, ACE, or perform invariance testing on candidate sets.
Feature Ranking and Thresholding: Sort according to effect size or stability; select a top-$k$ subset, or threshold via validation set error.
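Stability or Overlap Measurement: Validate stability via resampling splits, overlap counts, or variability statistics such as coefficient variance (Lan et al., 17 Nov 2025, Chen et al., 2022). For instance, the pairwise top-$k$ feature overlap between folds $a$ and $b$, where $S_a^{(k)}$ denotes the top-$k$ features selected on fold $a$, can be measured as the size of the intersection:

$O_k(a, b) = \left| S_a^{(k)} \cap S_b^{(k)} \right|$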

Aggregating over subsets of folds quantifies stability.

Interpretation and Biological/Predictive Validation: Use enrichment analyses or cross-environment testing to validate the causal and predictive status of selected features.

This structure is domain-agnostic, though specific implementations are tailored to graphs, time series, or tabular data.
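The stability measurement step of the workflow is straightforward to compute once per-fold feature scores are available. The sketch below averages the pairwise overlap count of top-$k$ sets across folds and also reports a Jaccard index, which is a common normalization added here for illustration rather than taken from the cited works. Feature names, scores, and function names are hypothetical.

```python
from itertools import combinations


def top_k(scores, k):
    """Names of the k highest-scoring features in one fold."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])


def pairwise_overlap(fold_scores, k):
    """Mean pairwise overlap count and mean Jaccard index of top-k sets across folds."""
    sets = [top_k(s, k) for s in fold_scores]
    pairs = list(combinations(sets, 2))
    overlap = sum(len(a & b) for a, b in pairs) / len(pairs)
    jaccard = sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)
    return overlap, jaccard


# Hypothetical per-fold feature scores (e.g. effect sizes); the values are made up.
fold_scores = [
    {"g1": 0.9, "g2": 0.7, "g3": 0.2, "g4": 0.1},
    {"g1": 0.8, "g2": 0.1, "g3": 0.6, "g4": 0.2},
    {"g1": 0.7, "g2": 0.6, "g3": 0.3, "g4": 0.1},
]
overlap, jaccard = pairwise_overlap(fold_scores, k=2)
print(f"mean overlap = {overlap:.3f}, mean Jaccard = {jaccard:.3f}")
```

The raw overlap count depends on $k$, so a normalized measure such as the Jaccard index is easier to compare across different subset sizes.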

7. Extensions, Open Challenges, and Research Directions

Recent developments have extended these frameworks in several critical directions:

Unsupervised and Self-supervised Causal Selection: Methods that identify stable features without labels, via spectral regression with causal regularizers or disentanglement of causal and non-causal parts in the latent graph (Shen et al., 2024, Luo et al., 10 Jul 2025).
Domain Generalization and Adaptation: Causal-graph identification and selection integrated into neural architectures (SLOGAN, CAL, CARNAS) enforce invariance even under domain shift and unsupervised adaptation (Luo et al., 10 Jul 2025, Sui et al., 2021, Li et al., 2024).
Probabilistic and Bayesian Feature Selection: Diffusion-based or posterior sampling approaches capture selection uncertainty and structural dependencies, yielding robust feature sets under explicit cross-environment stability constraints (Malarkkan et al., 21 Mar 2026).
Quantitative Causal Validation: Empirical overlap, motif enrichment, and cross-environment error analysis provide operational measures of stability and causal relevance.

Open problems include relaxations to unknown or partially observed graphs, high-dimensional environments, arbitrary interventions, and integration of richer environmental/contextual metadata. Robustness to violations of faithfulness, hidden confounding, or model mis-specification remains an active topic.

In summary, causal-graph-based selection of stable features unifies causal modeling principles with statistical invariance and stability constraints to yield robust, interpretable, and generalizable feature sets. In the studies cited here, this approach is reported to improve on traditional methods in reproducibility, prediction under shift, and mechanistic interpretability, with broad applicability across machine learning and scientific inference (Lan et al., 17 Nov 2025, Gamella et al., 2020, Pfister et al., 2019, Triantafillou et al., 2021, Shen et al., 2024, Soleymani et al., 2020).