Causal Mediation Analysis

Updated 2 July 2025

Causal Mediation Analysis is a method that decomposes total effects into direct and indirect pathways, clarifying how treatments influence outcomes.
It employs nonparametric moment balancing to avoid model misspecification and achieve global semiparametric efficiency.
The approach extends to multiple mediators while offering consistent variance estimation, ensuring robust and reliable causal insights.

Causal mediation analysis (CMA) is a central tool in scientific research for understanding how an intervention or treatment exerts its effects by decomposing the total effect into direct and indirect (mediated) pathways. Classical parametric approaches require researchers to specify and fit a sequence of regression models for outcome, mediator, treatment, and confounders, making inference sensitive to model misspecification and potentially leading to bias or inefficiency. “Efficient nonparametric estimation of causal mediation effects” introduces a fully nonparametric estimator that leverages empirical moment balancing, avoiding parametric modeling assumptions and attaining the global semiparametric efficiency bound.

1. Foundations and Decomposition of Mediation Effects

The potential outcomes framework underlies modern CMA. For a binary treatment $T$, mediator $M$, and outcome $Y$, the total causal effect is split as follows:

$\mathbb{E}[Y(1,M(1)) - Y(0, M(0))] = \underbrace{\mathbb{E}[Y(1,M(1)) - Y(1, M(0))]}_{\text{Natural Indirect Effect (NIE)}} + \underbrace{\mathbb{E}[Y(1,M(0)) - Y(0, M(0))]}_{\text{Natural Direct Effect (NDE)}}$

Existing methods generally estimate these effects by fitting models for the mean or distribution of the outcome given treatment, mediator, confounders ($f_{Y|M,T,X}$); the mediator given treatment and confounders ($f_{M|T,X}$); and the treatment assignment mechanism ($f_{T|X}$). The validity of these approaches depends on correct model specification across all components.
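The decomposition can be illustrated with a small simulation. This is a sketch under an assumed linear structural model: the coefficients and the helper names `potential_mediator` and `potential_outcome` are hypothetical and not taken from the source. Because both potential mediators are generated for every unit, the three effects can be averaged directly, which is never possible with real data.

```python
import random

def potential_mediator(t, x, e_m):
    """Hypothetical mediator model M(t)."""
    return 0.5 * x + 1.0 * t + e_m

def potential_outcome(t, m, x, e_y):
    """Hypothetical outcome model Y(t, m) with a treatment-mediator interaction."""
    return 1.0 + 2.0 * t + 1.5 * m + 0.5 * t * m + x + e_y

def simulate_effects(n=10_000, seed=0):
    """Average the unit-level total, indirect, and direct effects."""
    rng = random.Random(seed)
    te = nie = nde = 0.0
    for _ in range(n):
        x, e_m, e_y = (rng.gauss(0.0, 1.0) for _ in range(3))
        m0 = potential_mediator(0, x, e_m)
        m1 = potential_mediator(1, x, e_m)
        y11 = potential_outcome(1, m1, x, e_y)   # Y(1, M(1))
        y10 = potential_outcome(1, m0, x, e_y)   # Y(1, M(0)), the cross-world term
        y00 = potential_outcome(0, m0, x, e_y)   # Y(0, M(0))
        te += y11 - y00
        nie += y11 - y10
        nde += y10 - y00
    return te / n, nie / n, nde / n

te, nie, nde = simulate_effects()
print(f"TE={te:.3f}  NIE={nie:.3f}  NDE={nde:.3f}  NIE+NDE={nie + nde:.3f}")
```

In this model the unit-level indirect effect is constant, since $M(1) - M(0) = 1$ for every unit, while the direct effect varies with $M(0)$ through the interaction term.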

The methodology reformulates the mediation effect estimands as population moments, suitable for empirical calibration and weighting, freeing the approach from reliance on direct model specification.

2. Nonparametric Moment Balancing and Empirical Calibration

The chief methodological advance is the construction of nonparametric, observation-specific weights so that appropriate empirical moments are balanced between treatment and control groups. For functions $u(X)$ and $v(M,X)$, this is formalized as follows:

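The population moment requirements are $\mathbb{E}[u(X)] = \mathbb{E}\left[ N T p_0(X) u(X) \right] = \mathbb{E}\left[ N (1-T) q_0(X) u(X) \right]$, where the factor $N$ offsets the $1/N$ normalization built into the weights, and where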

$p_0(X) = \frac{1}{N f_{T|X}(1|X)}, \quad q_0(X) = \frac{1}{N f_{T|X}(0|X)}$

and, for mediation functionals,

$\mathbb{E}[T r_0(M,X) v(M,X)] = \mathbb{E}[(1-T) q_0(X) v(M,X)]$

with

$r_0(M, X) = \frac{f_{M|T,X}(M|0, X)}{N f_{T|X}(1|X) f_{M|T,X}(M|1, X)}$
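As a sanity check on the balancing identity for $r_0$, the two sides can be compared exactly on a small discrete example. This is only a sketch: the probability tables and the test function `v` below are made up for illustration, and the code works with $N r_0$ and $N q_0$ so that the $1/N$ normalization drops out.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical distribution with binary X, T, and M.
p_x = {0: F(2, 5), 1: F(3, 5)}                   # P(X = x)
pi = {0: F(1, 4), 1: F(2, 3)}                    # f_{T|X}(1 | x)
p_m1 = {(0, 0): F(1, 5), (0, 1): F(1, 2),        # P(M = 1 | T = t, X = x), keyed by (t, x)
        (1, 0): F(3, 5), (1, 1): F(9, 10)}

def f_t(t, x):
    return pi[x] if t == 1 else 1 - pi[x]

def f_m(m, t, x):
    return p_m1[(t, x)] if m == 1 else 1 - p_m1[(t, x)]

def v(m, x):
    """Arbitrary test function of (M, X)."""
    return 1 + 2 * m + 3 * x + 5 * m * x

def n_r0(m, x):
    """N * r_0(M, X): mediator density ratio over the treatment propensity."""
    return f_m(m, 0, x) / (f_t(1, x) * f_m(m, 1, x))

def n_q0(x):
    """N * q_0(X): inverse probability of being a control."""
    return 1 / f_t(0, x)

def expect(g):
    """Exact expectation of g(T, M, X) under the joint distribution."""
    return sum(p_x[x] * f_t(t, x) * f_m(m, t, x) * g(t, m, x)
               for x, t, m in product((0, 1), repeat=3))

lhs = expect(lambda t, m, x: t * n_r0(m, x) * v(m, x))
rhs = expect(lambda t, m, x: (1 - t) * n_q0(x) * v(m, x))
# Both sides should equal E[ E[v(M, X) | T = 0, X] ].
target = sum(p_x[x] * f_m(m, 0, x) * v(m, x) for x, m in product((0, 1), repeat=2))
print(lhs, rhs, target)
```

Both sides reduce to the expectation of $v(M,X)$ with $M$ drawn from its control-arm distribution given $X$, which is why $r_0$ reweights treated units to carry the mediator distribution of the controls.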

Rather than estimate the reference densities directly, one constructs weights $r_i$ by convex optimization, minimizing a divergence (e.g. Kullback-Leibler, quadratic) between weights and uniformity, subject to satisfying the empirical analogues of the balancing equations:

$\min_{r_i} \sum_{i=1}^N T_i D(N r_i, 1)$

subject to

$\sum_{i=1}^N T_i r_i v_K(X_i, M_i) = \sum_{i=1}^N (1-T_i) \hat{q}_K(X_i) v_K(X_i, M_i)$

with $v_K(\cdot)$ a (possibly growing) basis of functions.

The solution exploits a dual formulation, yielding the optimal weights as

$\hat{r}_K(X_i, M_i) = \frac{1}{N} \rho'\left( \hat{\beta}_K^\top v_K(X_i, M_i) \right)$

where $\rho$ is a concave function determined by the chosen divergence $D$, $\rho'$ is its derivative, and $\hat{\beta}_K$ maximizes the corresponding concave dual objective.
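The two calibration steps can be sketched in plain Python. Everything specific here is an assumption made for illustration rather than the authors' implementation: the choice $\rho(z) = -e^{-z}$ (exponential tilting), the damped Newton solver, the simulated data, and the small bases $u(X) = (1, X)$ and $v(X, M) = (1, X, M)$. The helper names `solve` and `calibrate` are hypothetical.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))) / aug[i][i]
    return x

def calibrate(rows, target, n_total, iters=50, tol=1e-10):
    """Weights w_i = rho'(beta . v_i) / N with sum_i w_i v_i = target, rho(z) = -exp(-z)."""
    k = len(target)
    beta = [0.0] * k

    def weights(b):
        return [math.exp(-dot(b, v)) / n_total for v in rows]

    def dual(b):  # concave dual: (1/N) sum_i rho(b . v_i) - b . target
        return -sum(weights(b)) - dot(b, target)

    for _ in range(iters):
        w = weights(beta)
        grad = [sum(wi * v[j] for wi, v in zip(w, rows)) - target[j] for j in range(k)]
        if max(abs(g) for g in grad) < tol:
            break
        neg_hess = [[sum(wi * v[a] * v[b] for wi, v in zip(w, rows)) for b in range(k)]
                    for a in range(k)]
        step = solve(neg_hess, grad)             # Newton ascent direction
        t, base = 1.0, dual(beta)
        while t > 1e-8 and dual([bj + t * sj for bj, sj in zip(beta, step)]) < base - 1e-12:
            t /= 2                               # backtrack if the dual decreased
        beta = [bj + t * sj for bj, sj in zip(beta, step)]
    return weights(beta)

# Simulated data (hypothetical design): X confounder, T treatment, M mediator.
rng = random.Random(1)
N = 400
data = []
for _ in range(N):
    x = rng.gauss(0, 1)
    t = 1 if rng.random() < 1 / (1 + math.exp(-0.5 * x)) else 0
    data.append((t, x, 0.5 * x + t + rng.gauss(0, 1)))

controls = [(x, m) for t, x, m in data if t == 0]
treated = [(x, m) for t, x, m in data if t == 1]

# Step 1: control weights q_hat balance u(X) = (1, X) against the full sample.
u_bar = [1.0, sum(x for _, x, _ in data) / N]
q_hat = calibrate([(1.0, x) for x, _ in controls], u_bar, N)

# Step 2: treated weights r_hat match the q_hat-weighted control moments of v(X, M).
v_controls = [(1.0, x, m) for x, m in controls]
v_target = [sum(q * col_j for q, col_j in zip(q_hat, col)) for col in zip(*v_controls)]
v_treated = [(1.0, x, m) for x, m in treated]
r_hat = calibrate(v_treated, v_target, N)

imbalance = max(abs(sum(r * v[j] for r, v in zip(r_hat, v_treated)) - v_target[j])
                for j in range(3))
print(f"max balance violation: {imbalance:.2e}, sum of r weights: {sum(r_hat):.6f}")
```

Because the constant function is in both bases, the fitted weights in each step sum to one, and the treated units end up reweighted so that their $(X, M)$ moments match those of the reweighted controls.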

3. Efficiency and Theoretical Guarantees

The estimator achieves global semiparametric efficiency for mediation functionals, meaning it attains the minimum possible asymptotic variance among all regular estimators in any data-generating process allowed under the identification assumptions, regardless of whether any working model is correct.

This contrasts with classical plug-in or doubly robust estimators, which only achieve the semiparametric efficiency bound if all or some of the parametric or semi-parametric models are correctly specified—a property called local efficiency.

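The main result is that, under regularity conditions, the weighted estimator of the counterfactual mean $\theta_0 = \mathbb{E}[Y(1, M(0))]$ is consistent and asymptotically normal:

$\hat\theta_{0K} = \sum_{i=1}^N T_i \hat{r}_K(X_i, M_i) Y_i \xrightarrow{p} \theta_0, \quad \sqrt{N} (\hat\theta_{0K} - \theta_0) \xrightarrow{d} \mathcal{N}(0, V_{\theta_0}),$

where $V_{\theta_0}$ equals the semiparametric efficiency bound for $\theta_0$, the quantity from which the mediation effects are built.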

4. Extension to Multiple Mediators

The empirical calibration approach extends naturally to cases involving multiple, possibly causally dependent mediators. For mediators $W$ and $M$, the calibration/balancing step is performed using basis functions over the joint vector $(X, W, M)$. Identification in these situations requires stronger versions of sequential ignorability, but the estimator is otherwise constructed identically:

$\mathbb{E}[T r_0(X, W, M) v(X, W, M)] = \mathbb{E}[(1-T) q_0(X) v(X, W, M)]$

Indirect and path-specific mediation effects are handled with appropriately modified balancing functions and calibration weights.
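One simple way to build basis functions over the joint vector is a polynomial basis. This is a sketch of one possible choice, since the source does not prescribe a particular basis; the function name `polynomial_basis` and the example point are hypothetical.

```python
from itertools import combinations_with_replacement
from math import prod

def polynomial_basis(point, degree=2):
    """All monomials of total degree <= `degree` in the coordinates of `point`."""
    terms = []
    for d in range(degree + 1):
        # Each index tuple picks the coordinates multiplied together in one monomial;
        # the empty tuple (d = 0) yields the constant term 1.
        for idx in combinations_with_replacement(range(len(point)), d):
            terms.append(prod(point[i] for i in idx))
    return terms

# Hypothetical observation (x, w, m) with one confounder and two mediators.
row = polynomial_basis((2.0, 3.0, 5.0))
print(len(row), row)
```

The number of basis terms grows quickly with the dimension and the degree, which is one reason the regularity conditions restrict how fast the basis may grow with the sample size.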

5. Nonparametric Consistent Variance Estimation

A practical contribution is the direct, consistent estimation of the estimator's asymptotic variance without resorting to any additional nonparametric function estimation or resampling. The variance estimator relies on the fact that the estimator solves an empirical estimating equation,

$\frac{1}{N} \sum_{i=1}^N g_K(T_i, X_i, M_i, Y_i; \hat{\tau}_K) = 0,$

leading to a sandwich estimator: $\widehat{V}_K = \widehat{L}_K \widehat{P}_K \widehat{L}_K^\top$ where $\widehat{L}_K$ and $\widehat{P}_K$ are empirical derivatives and empirical variance of the stacked moment functions.
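The sandwich form follows from the standard M-estimation argument, sketched here in generic notation. The symbols $A$, $B$, $\tau^*$, and $Z_i = (T_i, X_i, M_i, Y_i)$ are assumptions introduced for this sketch and do not appear in the source. A first-order expansion of the estimating equation around the limit $\tau^*$ gives

$0 = \frac{1}{N} \sum_{i=1}^N g_K(Z_i; \hat{\tau}_K) \approx \frac{1}{N} \sum_{i=1}^N g_K(Z_i; \tau^*) + A \, (\hat{\tau}_K - \tau^*), \quad A = \mathbb{E}\left[ \frac{\partial g_K(Z; \tau^*)}{\partial \tau^\top} \right]$

so that

$\sqrt{N} (\hat{\tau}_K - \tau^*) \approx -A^{-1} \frac{1}{\sqrt{N}} \sum_{i=1}^N g_K(Z_i; \tau^*), \quad \text{with asymptotic variance } A^{-1} B A^{-\top}, \quad B = \mathbb{E}\left[ g_K(Z; \tau^*) g_K(Z; \tau^*)^\top \right]$

Under this reading, $\widehat{P}_K$ plays the role of the empirical version of $B$, and $\widehat{L}_K$ collects the rows of the empirical $A^{-1}$ that correspond to the target parameter.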

6. Estimation, Implementation, and Summary of Estimators

The methodology yields estimators for the relevant counterfactual means and causal effects via sample-weighted averages:

Average potential outcome for treatment $t$:

$\hat{\delta}_{tK} = \sum_{i=1}^N \mathbb{I}\{T_i = t\} \hat{w}_K(X_i) Y_i$

where $\hat{w}_K(X_i)$ are the calibration weights for arm $t$, that is, the empirical counterparts of $p_0$ for $t = 1$ and of $q_0$ for $t = 0$.

Estimator of the cross-world counterfactual mean $\mathbb{E}[Y(1, M(0))]$:

$\hat{\theta}_{0K} = \sum_{i=1}^N T_i \hat{r}_K(X_i, M_i) Y_i$

Indirect/direct effects:

$\widehat{\mathrm{NIE}} = \hat{\delta}_{1K} - \hat{\theta}_{0K}, \quad \widehat{\mathrm{NDE}} = \hat{\theta}_{0K} - \hat{\delta}_{0K}$
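Once the weights are available, the estimators above are plain weighted sums. The sketch below uses made-up toy numbers, with weights chosen by hand rather than fitted, only to show how the pieces combine; the function name `mediation_effects` is hypothetical.

```python
def mediation_effects(T, Y, w, r):
    """Combine weighted sums into counterfactual means and natural effects.

    w: calibration weights for each unit's own arm (summing to 1 within each arm)
    r: mediation weights, used only for treated units
    """
    delta1 = sum(wi * yi for ti, wi, yi in zip(T, w, Y) if ti == 1)   # E[Y(1, M(1))]
    delta0 = sum(wi * yi for ti, wi, yi in zip(T, w, Y) if ti == 0)   # E[Y(0, M(0))]
    theta0 = sum(ti * ri * yi for ti, ri, yi in zip(T, r, Y))         # E[Y(1, M(0))]
    return {"delta1": delta1, "delta0": delta0, "theta0": theta0,
            "NIE": delta1 - theta0, "NDE": theta0 - delta0}

# Toy inputs (hypothetical, not fitted weights).
T = [1, 1, 1, 0, 0, 0]
Y = [3.0, 5.0, 4.0, 1.0, 2.0, 0.0]
w = [0.3, 0.3, 0.4, 0.5, 0.25, 0.25]
r = [0.5, 0.2, 0.3, 0.0, 0.0, 0.0]

effects = mediation_effects(T, Y, w, r)
print(effects)
```

By construction the two estimated effects add up to the estimated total effect $\hat{\delta}_{1K} - \hat{\delta}_{0K}$, mirroring the population decomposition.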

The weights are obtained by maximizing a concave dual objective, a convex optimization problem that is computationally stable. The method accommodates several choices of the divergence $D$, which governs how far the weights may deviate from uniform, including exponential tilting, quadratic, and empirical likelihood.

7. Assumptions, Applicability, and Practical Considerations

Identification relies on consistency, (extended) sequential ignorability (as appropriate for the number and structure of mediators), and positivity.
No parametric or semi-parametric modeling of conditional means or densities of outcome, mediator, or treatment is required.
Regularity conditions on the complexity and growth rate of the basis functions and on the behavior of calibration weights ensure consistency and asymptotic normality.
The estimator extends to arbitrary basis sets, and the structure permits easy use of flexible, high-dimensional feature sets.

8. Influence and Impact

This approach enables robust, easily interpretable causal mediation analysis in settings where model misspecification would otherwise hamper inferential validity. The estimator's construction via empirical calibration bridges advances from the causal inference, survey calibration, and semi-parametric efficiency literatures. Its global efficiency property guarantees minimal variance in large samples, and the variance estimation procedure ensures that practitioners can obtain credible uncertainty statements using only fitted weights and moment functions. The method’s extensibility to multiple and possibly dependent mediators makes it particularly appealing for applied research in complex biomedical and social science domains.
