Causal Machine Learning Methods
- Causal Machine Learning Methods are data-driven techniques that estimate treatment effects and support causal discovery using machine learning models.
- They integrate frameworks such as potential outcomes and structural causal models with advanced methods such as meta-learners, causal forests, and doubly robust estimators.
- These methods are widely used in fields like biomedical research, providing robust diagnostics, bias correction, and validation for observational studies.
Causal machine learning (CML) encompasses a collection of data-driven methodologies for estimating the effects of interventions, treatments, or exposures on outcomes, especially under observational or non-randomized conditions. Central to its framework are tools and algorithms that generalize beyond traditional statistical inference, providing individualized treatment effect estimation and supporting personalized prediction. CML integrates advanced machine learning models, such as neural networks, random forests, and ensembles, into the causal inference pipeline for both effect estimation and causal-discovery tasks. This article surveys foundational models, computational strategies, validation paradigms, and practical pipelines, emphasizing both theoretical and empirical aspects in biomedical and other domains (Feuerriegel et al., 2024).
1. Foundational Models and Identification Strategies
CML methods formalize the data-generating process via two main frameworks:
- Potential Outcomes (Rubin–Neyman Framework): For each unit $i$, two potential outcomes $Y_i(1)$ and $Y_i(0)$ are envisioned, corresponding to the outcomes under treatment and control, respectively. The factual outcome is $Y_i = A_i Y_i(1) + (1 - A_i) Y_i(0)$, where $A_i \in \{0, 1\}$ is the treatment indicator.
- Structural Causal Models (SCMs): Outcomes and covariates are modeled as nodes in a Directed Acyclic Graph (DAG), with edges encoding direct functional (causal) dependence and conditional independence structure. The post-intervention law for an outcome is denoted $P(Y \mid \mathrm{do}(A = a))$.
Key estimands include:
- Average Treatment Effect (ATE): $\tau = E[Y(1) - Y(0)]$.
- Conditional Average Treatment Effect (CATE): $\tau(x) = E[Y(1) - Y(0) \mid X = x]$.
Identification of these quantities from observed data requires strong assumptions (a simulation sketch follows the list below):
- Unconfoundedness (ignorability): $(Y(0), Y(1)) \perp\!\!\!\perp A \mid X$.
- Positivity (overlap): $0 < P(A=1 \mid X=x) < 1$ for all $x$ in the support of $X$.
- SUTVA (no interference/consistency): Each unit's outcome $Y_i$ depends only on unit $i$'s own treatment (no interference), and each treatment has a single well-defined version (consistency).
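The following minimal simulation makes the role of these assumptions concrete (the data-generating process, bin count, and variable names are illustrative assumptions, not from the source): treatment assignment is confounded by $X$, so the naive difference in means is biased, but under unconfoundedness and positivity, averaging within-stratum treated-vs-control contrasts over the distribution of $X$ recovers the true ATE.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Confounder X drives both treatment assignment and the outcome.
X = rng.normal(size=n)
e = 1 / (1 + np.exp(-X))           # true propensity P(A=1 | X=x); positivity holds
A = rng.binomial(1, e)

# Potential outcomes; SUTVA: each unit's outcome depends only on its own A.
Y0 = X + rng.normal(size=n)
Y1 = X + 2.0 + rng.normal(size=n)  # true ATE = 2.0
Y = np.where(A == 1, Y1, Y0)       # factual outcome Y = A*Y(1) + (1-A)*Y(0)

# Naive contrast is biased upward because X confounds A and Y.
print("naive:   ", Y[A == 1].mean() - Y[A == 0].mean())   # ~2.8

# Adjusting for X: average within-stratum contrasts over the distribution of X.
bins = np.digitize(X, np.quantile(X, np.linspace(0, 1, 21)[1:-1]))
adjusted = sum(
    (Y[(bins == b) & (A == 1)].mean() - Y[(bins == b) & (A == 0)].mean())
    * np.mean(bins == b)
    for b in range(20)
)
print("adjusted:", adjusted)                               # ~2.0 (true ATE)
```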
2. Estimation Algorithms: Meta-learners and Model-Specific Methods
CML operationalizes effect estimation through both general-purpose wrappers and adapted machine learning algorithms:
- Plug-in Estimators: Learn two outcome regression functions $\hat{\mu}_1(x) = \hat{E}[Y \mid A=1, X=x]$ and $\hat{\mu}_0(x) = \hat{E}[Y \mid A=0, X=x]$; estimate the CATE as $\hat{\tau}(x) = \hat{\mu}_1(x) - \hat{\mu}_0(x)$.
- Inverse-Probability Weighting (IPW): With estimated propensity score $\hat{e}(x) = \hat{P}(A=1 \mid X=x)$, the ATE is estimated as $\hat{\tau}_{\mathrm{IPW}} = \frac{1}{n} \sum_{i=1}^{n} \left[ \frac{A_i Y_i}{\hat{e}(X_i)} - \frac{(1 - A_i) Y_i}{1 - \hat{e}(X_i)} \right]$.
- Doubly Robust (DR) Estimator: The augmented IPW (AIPW) form combines both nuisance models, $\hat{\tau}_{\mathrm{DR}} = \frac{1}{n} \sum_{i=1}^{n} \left[ \hat{\mu}_1(X_i) - \hat{\mu}_0(X_i) + \frac{A_i (Y_i - \hat{\mu}_1(X_i))}{\hat{e}(X_i)} - \frac{(1 - A_i)(Y_i - \hat{\mu}_0(X_i))}{1 - \hat{e}(X_i)} \right]$. This estimator is consistent if either the outcome model or the propensity score model is correctly specified (a cross-fitted implementation is sketched after this list).
- Meta-Learners (Künzel et al., 2019):
- S-learner: fit a single model $\hat{\mu}(x, a)$ with the treatment as a feature; estimate $\hat{\tau}(x) = \hat{\mu}(x, 1) - \hat{\mu}(x, 0)$.
- T-learner: fit $\hat{\mu}_1$ and $\hat{\mu}_0$ separately on treated and control units, then $\hat{\tau}(x) = \hat{\mu}_1(x) - \hat{\mu}_0(x)$ (sketched after this list).
- X-learner: impute individual treatment effects for each group using the other group's outcome model, fit second-stage models on these imputed effects, and combine them with propensity-score weights; this improves robustness to treated/control group-size imbalance.
- DR-learner: regress the doubly robust pseudo-outcome $\hat{\psi}_i = \hat{\mu}_1(X_i) - \hat{\mu}_0(X_i) + \frac{A_i - \hat{e}(X_i)}{\hat{e}(X_i)(1 - \hat{e}(X_i))} \left( Y_i - \hat{\mu}_{A_i}(X_i) \right)$ on $X_i$ to estimate the CATE.
- Model-Specific Methods:
- Causal forests and trees (Athey & Imbens, 2016; Wager & Athey, 2018): partition the covariate space to maximize treatment-effect heterogeneity, then aggregate tree-level estimates for valid inference.
- Representation-learning networks (TARNet, CFR, Dragonnet): balance treatment/control representations for robust effect estimation.
- Dose-response nets: extend causal ML to continuous treatments using dose discretization or adversarial balancing.
- Orthogonalization and Double Machine Learning (DML) (Chernozhukov et al., 2018): Construct a moment function with Neyman-orthogonality to nuisance parameters (regression and propensity functions). Use cross-fitting to sample-split and aggregate fold-specific solutions, yielding $\sqrt{n}$-consistent estimates with high-dimensional ML back-ends.
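As referenced in the meta-learner list above, here is a minimal T-learner sketch (base learner, synthetic data, and names are illustrative assumptions, not from the source): two outcome regressions are fit on the treated and control subsets, and the CATE is their predicted difference.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def t_learner(X, A, Y, base=GradientBoostingRegressor):
    """T-learner: fit mu_1 on treated units and mu_0 on controls,
    then estimate CATE(x) = mu_1(x) - mu_0(x)."""
    mu1 = base().fit(X[A == 1], Y[A == 1])
    mu0 = base().fit(X[A == 0], Y[A == 0])
    return lambda X_new: mu1.predict(X_new) - mu0.predict(X_new)

# Synthetic data: the effect is heterogeneous in the first covariate.
rng = np.random.default_rng(1)
n = 5_000
X = rng.normal(size=(n, 3))
A = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))  # confounded assignment
tau = 1.0 + X[:, 0]                              # true CATE
Y = X.sum(axis=1) + A * tau + rng.normal(size=n)

cate = t_learner(X, A, Y)
print(np.corrcoef(cate(X), tau)[0, 1])           # high correlation with true CATE
```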
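And a hedged sketch of the doubly robust (AIPW) ATE estimator with DML-style cross-fitting, combining the DR moment above with sample splitting (the nuisance model choices, fold count, and clipping threshold are assumptions for illustration, not prescribed by the source):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import KFold

def dr_ate(X, A, Y, n_folds=5):
    """Doubly robust (AIPW) ATE with cross-fitting: each unit's nuisance
    predictions (mu_0, mu_1, e) come from models fit on the other folds."""
    n = len(Y)
    mu0, mu1, ehat = np.zeros(n), np.zeros(n), np.zeros(n)
    for train, test in KFold(n_folds, shuffle=True, random_state=0).split(X):
        Xtr, Atr, Ytr = X[train], A[train], Y[train]
        m1 = RandomForestRegressor().fit(Xtr[Atr == 1], Ytr[Atr == 1])
        m0 = RandomForestRegressor().fit(Xtr[Atr == 0], Ytr[Atr == 0])
        ps = RandomForestClassifier().fit(Xtr, Atr)
        mu1[test] = m1.predict(X[test])
        mu0[test] = m0.predict(X[test])
        # Clip propensities away from 0/1 to guard against positivity violations.
        ehat[test] = ps.predict_proba(X[test])[:, 1].clip(0.01, 0.99)
    # Orthogonal (Neyman) moment: outcome-model contrast plus IPW residual corrections.
    psi = mu1 - mu0 + A * (Y - mu1) / ehat - (1 - A) * (Y - mu0) / (1 - ehat)
    return psi.mean(), psi.std(ddof=1) / np.sqrt(n)  # ATE estimate and std. error

# Usage with the synthetic data above: ate, se = dr_ate(X, A, Y)
```

Because each unit's nuisance predictions come from models fit on other folds, overfitting bias in the ML nuisance estimates does not contaminate the moment condition; this is what permits flexible ML back-ends while retaining $\sqrt{n}$-consistency.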
3. Causal Discovery and Graph Structure Learning
In addition to effect estimation, CML applies machine learning to causal discovery, i.e., learning graph structure from observational data.
- Constraint-Based Algorithms (PC/FCI): Sequentially test for conditional independence, orient edges using v-structures and orientation rules, accounting for latent confounding in FCI.
- Score-Based Algorithms (GES, NOTEARS): Search for the DAG maximizing a complexity-penalized likelihood, with acyclicity constraints enforced via continuous optimization (the NOTEARS constraint is sketched after this list).
- Supervised Learning Approaches (SLdisco; Petersen et al., 2022): Learn a direct mapping from observed correlation matrices to CPDAG adjacency structure using convolutional neural networks, yielding conservative estimates that are robust to small sample sizes. SLdisco demonstrates higher negative predictive value (NPV) and orientation G1 scores than PC/GES under dense-graph and small-sample regimes.
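To make the score-based formulation concrete, the following small sketch (the example matrices are illustrative assumptions) evaluates the NOTEARS acyclicity function $h(W) = \operatorname{tr}(e^{W \circ W}) - d$, which equals zero exactly when the weighted adjacency matrix $W$ encodes a DAG; score-based learners minimize a penalized loss subject to $h(W) = 0$.

```python
import numpy as np
from scipy.linalg import expm

def notears_h(W):
    """NOTEARS acyclicity measure h(W) = tr(exp(W * W)) - d;
    zero iff the weighted adjacency matrix W contains no directed cycles."""
    d = W.shape[0]
    return np.trace(expm(W * W)) - d   # W * W is the elementwise (Hadamard) square

# A DAG (strictly upper-triangular weights) versus a two-node cycle.
W_dag = np.array([[0.0, 1.5,  0.0],
                  [0.0, 0.0, -2.0],
                  [0.0, 0.0,  0.0]])
W_cyc = np.array([[0.0, 1.0],
                  [1.0, 0.0]])
print(notears_h(W_dag))  # 0.0: no cycles
print(notears_h(W_cyc))  # > 0: the cycle is penalized
```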
Algorithmic advances address key limitations of traditional methods: excessive sparsity, high false-negative rates (missed causal links), and error propagation under limited sample sizes.
4. Bias Correction, Robustness, and Validity Assessment
Robust effect estimation in CML requires systematic diagnostics and sensitivity checks for violations of the identification assumptions:
- Diagnostics:
- Placebo variable inclusion: check whether the estimated treatment effect on a randomly generated (placebo) covariate is approximately zero.
- Coin-flip tests: permutation of the treatment assignment (random relabeling); the re-estimated effect should collapse toward zero (both checks are sketched below).
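A minimal sketch of both checks (it reuses the hypothetical `dr_ate` helper and data from the estimation sketches above; all names are illustrative assumptions): a randomly generated placebo outcome and a permuted treatment vector should both yield estimates near zero.

```python
import numpy as np

# Reuses the hypothetical dr_ate(X, A, Y) helper and data from the sketches above.
rng = np.random.default_rng(2)

# Placebo variable: estimate the "effect" of A on a random covariate that
# cannot causally depend on treatment; the estimate should be ~0.
Y_placebo = rng.normal(size=len(Y))
print("placebo outcome:   ", dr_ate(X, A, Y_placebo)[0])

# Coin-flip test: permute the treatment labels, severing any causal link;
# the re-estimated effect should also collapse toward 0.
A_perm = rng.permutation(A)
print("permuted treatment:", dr_ate(X, A_perm, Y)[0])
```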