Causal Machine Learning Methods
- Causal Machine Learning Methods are data-driven techniques that estimate treatment effects and support causal discovery using machine learning models.
- They integrate frameworks such as potential outcomes and structural causal models with advanced methods such as meta-learners, causal forests, and doubly robust estimators.
- These methods are widely used in fields like biomedical research, providing robust diagnostics, bias correction, and validation for observational studies.
Causal machine learning (CML) encompasses a collection of data-driven methodologies for estimating the effects of interventions, treatments, or exposures on outcomes, especially under observational or non-randomized conditions. Central to its framework are tools and algorithms that generalize beyond traditional statistical inference, providing individualized treatment effect estimation and supporting personalized prediction. CML integrates advanced machine learning models, such as neural networks, random forests, and ensembles, into the causal inference pipeline for both effect estimation and causal-discovery tasks. This article surveys foundational models, computational strategies, validation paradigms, and practical pipelines, emphasizing both theoretical and empirical aspects in biomedical and other domains (Feuerriegel et al., 2024).
1. Foundational Models and Identification Strategies
CML methods formalize the data-generating process via two main frameworks:
- Potential Outcomes (Rubin–Neyman Framework): For each unit $i$, two potential outcomes $Y_i(1)$ and $Y_i(0)$ are envisioned, corresponding to the outcomes under treatment and control, respectively. The factual outcome is $Y_i = A_i Y_i(1) + (1 - A_i) Y_i(0)$, where $A_i \in \{0, 1\}$ is the treatment indicator.
- Structural Causal Models (SCMs): Outcomes and covariates are modeled as nodes in a Directed Acyclic Graph (DAG), with edges encoding direct functional (causal) dependence and conditional independence structure. The post-intervention law for an outcome is denoted $P(Y \mid \mathrm{do}(A = a))$.
Key estimands include:
- Average Treatment Effect (ATE): $\tau = E[Y(1) - Y(0)]$.
- Conditional Average Treatment Effect (CATE): $\tau(x) = E[Y(1) - Y(0) \mid X = x]$.
Identification of these quantities from observed data requires strong assumptions (a simulation sketch follows the list below):
- Unconfoundedness (ignorability): $(Y(0), Y(1)) \perp\!\!\!\perp A \mid X$.
- Positivity (overlap): $0 < P(A=1 \mid X=x) < 1$ for all $x$ in the support of $X$.
- SUTVA (no interference/consistency): Each unit's outcome $Y_i$ depends only on unit $i$'s own treatment (no interference), and each treatment has a single well-defined version (consistency).
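The following minimal simulation makes the role of these assumptions concrete (the data-generating process, bin count, and variable names are illustrative assumptions, not from the source): treatment assignment is confounded by $X$, so the naive difference in means is biased, but under unconfoundedness and positivity, averaging within-stratum treated-vs-control contrasts over the distribution of $X$ recovers the true ATE.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Confounder X drives both treatment assignment and the outcome.
X = rng.normal(size=n)
e = 1 / (1 + np.exp(-X))           # true propensity P(A=1 | X=x); positivity holds
A = rng.binomial(1, e)

# Potential outcomes; SUTVA: each unit's outcome depends only on its own A.
Y0 = X + rng.normal(size=n)
Y1 = X + 2.0 + rng.normal(size=n)  # true ATE = 2.0
Y = np.where(A == 1, Y1, Y0)       # factual outcome Y = A*Y(1) + (1-A)*Y(0)

# Naive contrast is biased upward because X confounds A and Y.
print("naive:   ", Y[A == 1].mean() - Y[A == 0].mean())   # ~2.8

# Adjusting for X: average within-stratum contrasts over the distribution of X.
bins = np.digitize(X, np.quantile(X, np.linspace(0, 1, 21)[1:-1]))
adjusted = sum(
    (Y[(bins == b) & (A == 1)].mean() - Y[(bins == b) & (A == 0)].mean())
    * np.mean(bins == b)
    for b in range(20)
)
print("adjusted:", adjusted)                               # ~2.0 (true ATE)
```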
2. Estimation Algorithms: Meta-learners and Model-Specific Methods
CML operationalizes effect estimation through both general-purpose wrappers and adapted machine learning algorithms:
- Plug-in Estimators: Learn two outcome regression functions $\hat{\mu}_1(x) = \hat{E}[Y \mid A=1, X=x]$ and $\hat{\mu}_0(x) = \hat{E}[Y \mid A=0, X=x]$; estimate the CATE as $\hat{\tau}(x) = \hat{\mu}_1(x) - \hat{\mu}_0(x)$.
- Inverse-Probability Weighting (IPW): With estimated propensity score $\hat{e}(x) = \hat{P}(A=1 \mid X=x)$, the ATE is estimated as $\hat{\tau}_{\mathrm{IPW}} = \frac{1}{n} \sum_{i=1}^{n} \left[ \frac{A_i Y_i}{\hat{e}(X_i)} - \frac{(1 - A_i) Y_i}{1 - \hat{e}(X_i)} \right]$.
- Doubly Robust (DR) Estimator: The augmented IPW (AIPW) form combines both nuisance models, $\hat{\tau}_{\mathrm{DR}} = \frac{1}{n} \sum_{i=1}^{n} \left[ \hat{\mu}_1(X_i) - \hat{\mu}_0(X_i) + \frac{A_i (Y_i - \hat{\mu}_1(X_i))}{\hat{e}(X_i)} - \frac{(1 - A_i)(Y_i - \hat{\mu}_0(X_i))}{1 - \hat{e}(X_i)} \right]$. This estimator is consistent if either the outcome model or the propensity score model is correctly specified (a cross-fitted implementation is sketched after this list).
- Meta-Learners (Künzel et al., 2019):
- S-learner: fit a single model $\hat{\mu}(x, a)$ with the treatment as a feature; estimate $\hat{\tau}(x) = \hat{\mu}(x, 1) - \hat{\mu}(x, 0)$.
- T-learner: fit $\hat{\mu}_1$ and $\hat{\mu}_0$ separately on treated and control units, then $\hat{\tau}(x) = \hat{\mu}_1(x) - \hat{\mu}_0(x)$ (sketched after this list).
- X-learner: impute individual treatment effects for each group using the other group's outcome model, fit second-stage models on these imputed effects, and combine them with propensity-score weights; this improves robustness to treated/control group-size imbalance.
- DR-learner: regress the doubly robust pseudo-outcome $\hat{\psi}_i = \hat{\mu}_1(X_i) - \hat{\mu}_0(X_i) + \frac{A_i - \hat{e}(X_i)}{\hat{e}(X_i)(1 - \hat{e}(X_i))} \left( Y_i - \hat{\mu}_{A_i}(X_i) \right)$ on $X_i$ to estimate the CATE.
- Model-Specific Methods:
- Causal forests and trees (Athey & Imbens, 2016; Wager & Athey, 2018): partition the covariate space to maximize treatment-effect heterogeneity, then aggregate tree-level estimates for valid inference.
- Representation-learning networks (TARNet, CFR, Dragonnet): balance treatment/control representations for robust effect estimation.
- Dose-response nets: extend causal ML to continuous treatments using dose discretization or adversarial balancing.
- Orthogonalization and Double Machine Learning (DML) (Chernozhukov et al., 2018): Construct a moment function with Neyman-orthogonality to nuisance parameters (regression and propensity functions). Use cross-fitting to sample-split and aggregate fold-specific solutions, yielding $\sqrt{n}$-consistent estimates with high-dimensional ML back-ends.
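As referenced in the meta-learner list above, here is a minimal T-learner sketch (base learner, synthetic data, and names are illustrative assumptions, not from the source): two outcome regressions are fit on the treated and control subsets, and the CATE is their predicted difference.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def t_learner(X, A, Y, base=GradientBoostingRegressor):
    """T-learner: fit mu_1 on treated units and mu_0 on controls,
    then estimate CATE(x) = mu_1(x) - mu_0(x)."""
    mu1 = base().fit(X[A == 1], Y[A == 1])
    mu0 = base().fit(X[A == 0], Y[A == 0])
    return lambda X_new: mu1.predict(X_new) - mu0.predict(X_new)

# Synthetic data: the effect is heterogeneous in the first covariate.
rng = np.random.default_rng(1)
n = 5_000
X = rng.normal(size=(n, 3))
A = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))  # confounded assignment
tau = 1.0 + X[:, 0]                              # true CATE
Y = X.sum(axis=1) + A * tau + rng.normal(size=n)

cate = t_learner(X, A, Y)
print(np.corrcoef(cate(X), tau)[0, 1])           # high correlation with true CATE
```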
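And a hedged sketch of the doubly robust (AIPW) ATE estimator with DML-style cross-fitting, combining the DR moment above with sample splitting (the nuisance model choices, fold count, and clipping threshold are assumptions for illustration, not prescribed by the source):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import KFold

def dr_ate(X, A, Y, n_folds=5):
    """Doubly robust (AIPW) ATE with cross-fitting: each unit's nuisance
    predictions (mu_0, mu_1, e) come from models fit on the other folds."""
    n = len(Y)
    mu0, mu1, ehat = np.zeros(n), np.zeros(n), np.zeros(n)
    for train, test in KFold(n_folds, shuffle=True, random_state=0).split(X):
        Xtr, Atr, Ytr = X[train], A[train], Y[train]
        m1 = RandomForestRegressor().fit(Xtr[Atr == 1], Ytr[Atr == 1])
        m0 = RandomForestRegressor().fit(Xtr[Atr == 0], Ytr[Atr == 0])
        ps = RandomForestClassifier().fit(Xtr, Atr)
        mu1[test] = m1.predict(X[test])
        mu0[test] = m0.predict(X[test])
        # Clip propensities away from 0/1 to guard against positivity violations.
        ehat[test] = ps.predict_proba(X[test])[:, 1].clip(0.01, 0.99)
    # Orthogonal (Neyman) moment: outcome-model contrast plus IPW residual corrections.
    psi = mu1 - mu0 + A * (Y - mu1) / ehat - (1 - A) * (Y - mu0) / (1 - ehat)
    return psi.mean(), psi.std(ddof=1) / np.sqrt(n)  # ATE estimate and std. error

# Usage with the synthetic data above: ate, se = dr_ate(X, A, Y)
```

Because each unit's nuisance predictions come from models fit on other folds, overfitting bias in the ML nuisance estimates does not contaminate the moment condition; this is what permits flexible ML back-ends while retaining $\sqrt{n}$-consistency.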
3. Causal Discovery and Graph Structure Learning
In addition to effect estimation, CML applies machine learning to causal discovery, i.e., learning graph structure from observational data.
- Constraint-Based Algorithms (PC/FCI): Sequentially test for conditional independence, orient edges using v-structures and orientation rules, accounting for latent confounding in FCI.
- Score-Based Algorithms (GES, NOTEARS): Search for the DAG maximizing a complexity-penalized likelihood, with acyclicity constraints enforced via continuous optimization (the NOTEARS constraint is sketched after this list).
- Supervised Learning Approaches (SLdisco; Petersen et al., 2022): Learn a direct mapping from observed correlation matrices to CPDAG adjacency structure using convolutional neural networks, yielding conservative estimates that are robust to small sample sizes. SLdisco demonstrates higher negative predictive value (NPV) and orientation G1 scores than PC/GES under dense-graph and small-sample regimes.
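To make the score-based formulation concrete, the following small sketch (the example matrices are illustrative assumptions) evaluates the NOTEARS acyclicity function $h(W) = \operatorname{tr}(e^{W \circ W}) - d$, which equals zero exactly when the weighted adjacency matrix $W$ encodes a DAG; score-based learners minimize a penalized loss subject to $h(W) = 0$.

```python
import numpy as np
from scipy.linalg import expm

def notears_h(W):
    """NOTEARS acyclicity measure h(W) = tr(exp(W * W)) - d;
    zero iff the weighted adjacency matrix W contains no directed cycles."""
    d = W.shape[0]
    return np.trace(expm(W * W)) - d   # W * W is the elementwise (Hadamard) square

# A DAG (strictly upper-triangular weights) versus a two-node cycle.
W_dag = np.array([[0.0, 1.5,  0.0],
                  [0.0, 0.0, -2.0],
                  [0.0, 0.0,  0.0]])
W_cyc = np.array([[0.0, 1.0],
                  [1.0, 0.0]])
print(notears_h(W_dag))  # 0.0: no cycles
print(notears_h(W_cyc))  # > 0: the cycle is penalized
```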
Algorithmic advances address key limitations of traditional methods: excessive sparsity, high false-negative rates (missed causal links), and error propagation under limited sample sizes.
4. Bias Correction, Robustness, and Validity Assessment
Robust effect estimation in CML requires systematic diagnostics and sensitivity checks for violations of the identification assumptions:
- Diagnostics:
- Placebo variable inclusion: check whether the estimated treatment effect on a randomly generated (placebo) covariate is approximately zero.
- Coin-flip tests: permutation of the treatment assignment (random relabeling); the re-estimated effect should collapse toward zero (both checks are sketched below).
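A minimal sketch of both checks (it reuses the hypothetical `dr_ate` helper and data from the estimation sketches above; all names are illustrative assumptions): a randomly generated placebo outcome and a permuted treatment vector should both yield estimates near zero.

```python
import numpy as np

# Reuses the hypothetical dr_ate(X, A, Y) helper and data from the sketches above.
rng = np.random.default_rng(2)

# Placebo variable: estimate the "effect" of A on a random covariate that
# cannot causally depend on treatment; the estimate should be ~0.
Y_placebo = rng.normal(size=len(Y))
print("placebo outcome:   ", dr_ate(X, A, Y_placebo)[0])

# Coin-flip test: permute the treatment labels, severing any causal link;
# the re-estimated effect should also collapse toward 0.
A_perm = rng.permutation(A)
print("permuted treatment:", dr_ate(X, A_perm, Y)[0])
```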