Potential Outcomes Framework
- The potential outcomes framework is a rigorous approach to causal inference that defines causal effects as contrasts between counterfactual outcomes under different treatment conditions.
- It formalizes effect estimation through explicit modeling of the assignment mechanism and employs methods spanning Neymanian, Fisherian, and Bayesian inference for robust analysis.
- The framework underpins applications in diverse fields such as statistics, economics, and biomedical research, providing transparent assumptions for both randomized and observational studies.
The potential outcomes framework, often attributed to Neyman and Rubin, provides a rigorous foundation for formulating, interpreting, and analyzing causal effects in experimental and observational studies. The central construct is the set of potential outcomes—one for each treatment condition—assigned to each experimental unit. Causal effects are defined as contrasts between these potential outcomes, with inference relying on the combination of the assignment mechanism and structural assumptions about units, treatments, and the data-generating process. The framework is central to modern causal inference and underpins methodologies in fields ranging from statistics and economics to computer science and biomedical research.
1. Formal Foundation and Notation
The potential outcomes framework formalizes causality by associating each unit with a vector of outcomes, each corresponding to a possible treatment:
- For binary treatment: $Y_i(1)$ and $Y_i(0)$, denoting the outcomes under treatment ($Z_i = 1$) and control ($Z_i = 0$), respectively.
- For multilevel or factorial treatments: $Y_i(\mathbf{z})$ for every possible treatment combination $\mathbf{z}$.
- For continuous treatments (dose-response): $Y_i(d)$ for each dose level $d$.
Only one of these potential outcomes can be observed for each unit, introducing a fundamental missing data problem. The observed outcome is typically written as:
$$Y_i^{\text{obs}} = Z_i \, Y_i(1) + (1 - Z_i) \, Y_i(0).$$
Causal estimands are defined by contrasts of potential outcomes; for example, the average treatment effect (ATE) in the binary case:
$$\tau = \frac{1}{N} \sum_{i=1}^{N} \big( Y_i(1) - Y_i(0) \big).$$
In multifactorial designs—such as $2^K$ factorial experiments—unit-level factorial effects are contrasts of the potential outcomes determined by pre-defined contrast vectors $\mathbf{g} \in \{-1, +1\}^{2^K}$:
$$\tau_i(\mathbf{g}) = \frac{1}{2^{K-1}} \, \mathbf{g}^\top \mathbf{Y}_i,$$
and averaged over the population:
$$\bar{\tau}(\mathbf{g}) = \frac{1}{N} \sum_{i=1}^{N} \tau_i(\mathbf{g}).$$
For ordinal, count, or continuous outcomes, the framework formulates appropriate estimands (probabilities of benefit, average dose-response, etc.), often emphasizing finite-population, superpopulation, or principal stratification perspectives (Lu et al., 2015, Bodik, 16 Mar 2024).
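As an illustration of this notation, the "science table" of potential outcomes, the observation rule $Y_i^{\text{obs}} = Z_i Y_i(1) + (1-Z_i) Y_i(0)$, and the finite-population ATE can be sketched in a few lines. The data below are hypothetical and simulated, not drawn from any cited study.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1_000

# Hypothetical science table: both potential outcomes for every unit.
# In practice only one of the two columns is ever observed.
y0 = rng.normal(loc=0.0, scale=1.0, size=N)    # Y_i(0)
y1 = y0 + 2.0 + rng.normal(scale=0.5, size=N)  # Y_i(1); unit effects ~ 2 on average

# Finite-population ATE: average of the unit-level contrasts.
true_ate = np.mean(y1 - y0)

# Observation: a treatment assignment reveals exactly one outcome per unit,
# which is the fundamental missing data problem.
z = rng.integers(0, 2, size=N)
y_obs = z * y1 + (1 - z) * y0                  # Y_i^obs
```

The estimand `true_ate` is computable only because the simulation grants access to both columns of the science table; real analyses must infer it from `y_obs` and `z` alone.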
2. Assignment Mechanisms and Identification
Causal inference is enabled by explicit modeling of the assignment mechanism, the probability distribution over the treatment indicators indicating which treatment is assigned to each unit:
$$P\big(\mathbf{Z} \mid \mathbf{Y}(0), \mathbf{Y}(1), \mathbf{X}\big), \qquad \mathbf{Z} = (Z_1, \ldots, Z_N).$$
In randomized experiments, the assignment mechanism is determined by the experimental design, and expectations over treatment assignments facilitate unbiased estimation of population parameters. For example, complete randomization of the assignment vector $\mathbf{Z}$ ensures $E[Z_i] = n_1/N$, and the observed sample means
$$\bar{Y}^{\text{obs}}(z) = \frac{1}{n_z} \sum_{i:\, Z_i = z} Y_i^{\text{obs}}$$
provide unbiased estimators for the average potential outcome under treatment $z$ (Dasgupta et al., 2012).
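This unbiasedness can be checked with a minimal simulation (hypothetical numbers): hold a finite-population science table fixed and average the difference-in-means estimator over repeated complete randomizations.

```python
import numpy as np

rng = np.random.default_rng(1)
N, n1 = 20, 10

# Fixed (finite-population) science table with a constant unit-level effect.
y0 = rng.normal(size=N)
y1 = y0 + 1.5
true_ate = np.mean(y1 - y0)  # exactly 1.5 here

# Average the difference-in-means estimator over many complete randomizations,
# i.e. over the randomization distribution induced by the design.
estimates = []
for _ in range(20_000):
    z = np.zeros(N, dtype=int)
    z[rng.choice(N, size=n1, replace=False)] = 1
    y_obs = np.where(z == 1, y1, y0)
    estimates.append(y_obs[z == 1].mean() - y_obs[z == 0].mean())

bias = np.mean(estimates) - true_ate  # ≈ 0 by design-based unbiasedness
```

Note that no outcome model is involved: the only source of randomness is the assignment mechanism itself.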
In observational studies, the assignment mechanism must be assumed to satisfy ignorability or unconfoundedness, often conditional on observed covariates (e.g., $(Y(0), Y(1)) \perp\!\!\!\perp Z \mid X$), to identify potential outcome contrasts from observed data.
Positivity, or the condition that all treatment combinations occur with nonzero probability, is necessary for identification in both randomized and observational settings (Lara, 2023).
For complex assignments (matched sampling, stratified randomization, longitudinal allocation), the assignment mechanism is an explicit probabilistic model used to derive unbiased or consistent estimators (Branson et al., 2018, Hess et al., 4 Oct 2024).
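Under unconfoundedness and positivity, one standard identification strategy is inverse-probability weighting. The sketch below uses a hypothetical data-generating process with a known propensity score (all coefficients invented for illustration) and contrasts the IPW estimate with the confounded naive comparison.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 50_000

# Confounder x affects both treatment probability and outcome.
x = rng.normal(size=N)
p = 1.0 / (1.0 + np.exp(-x))   # true propensity score; bounded in (0,1): positivity
z = rng.binomial(1, p)
y = 2.0 * z + 3.0 * x + rng.normal(size=N)   # true ATE = 2

# Naive comparison is confounded: treated units tend to have larger x.
naive = y[z == 1].mean() - y[z == 0].mean()

# Horvitz-Thompson / IPW estimator, using the (here known) propensity score.
ipw = np.mean(z * y / p - (1 - z) * y / (1 - p))
```

In practice the propensity score must itself be estimated; the point of the sketch is only that weighting by the assignment mechanism removes the confounding that the naive contrast absorbs.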
3. Inference Methods and Randomization Theory
The potential outcomes framework is tightly integrated with the randomization-based (Neymanian) and randomization test (Fisherian) inference traditions:
- Neymanian inference: Relies on calculating sampling distributions of estimators under the randomization distribution. For example, in $2^K$ factorial experiments, the variance of an estimated factorial effect $\widehat{\tau}(\mathbf{g})$ takes the form
$$\mathrm{Var}\big(\widehat{\tau}(\mathbf{g})\big) = \frac{1}{2^{2(K-1)}} \sum_{\mathbf{z}} \frac{S^2(\mathbf{z})}{n_{\mathbf{z}}} - \frac{S^2_\tau}{N},$$
where $S^2(\mathbf{z})$ is the finite-population variance of the potential outcomes under treatment combination $\mathbf{z}$, $n_{\mathbf{z}}$ the number of units assigned to it, and $S^2_\tau$ the variance of the unit-level effects; the unidentifiable $S^2_\tau$ term vanishes under strict additivity (Dasgupta et al., 2012).
- Fisherian inference: Employs sharp null hypotheses (all unit-level effects fixed, e.g., $Y_i(1) - Y_i(0) = 0$ for every unit $i$) so that missing potential outcomes are imputed, and the full randomization distribution of test statistics can be obtained for p-value calculation and confidence interval construction.
- Design-based inference: Modern developments allow potential outcomes to be random (not fixed), modeling each potential outcome as a random variable $Y_i(z, \omega)$, where $\omega$ indexes latent randomness. Treatment effects are defined as linear functionals in a Hilbert space, with estimators based on their Riesz representers. This extension accommodates randomness in outcomes from measurement noise or interference and justifies inference under local (intra-block or network) dependence (Yang, 2 May 2025).
- Bayesian causal inference: Missing potential outcomes are treated as unknowns to be imputed using posterior predictive distributions. For count data, outcomes are modeled using Poisson or overdispersed models (e.g., lognormal-Poisson), with special approximations for imputation (e.g., approximating log-Gamma by normal in large-sample limits) (Lee et al., 2020).
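The Fisherian logic can be sketched directly (hypothetical data): under the sharp null of no effect for any unit, the observed outcomes fill in every missing potential outcome, so the full randomization distribution of a test statistic is computable by re-randomizing.

```python
import numpy as np

rng = np.random.default_rng(3)
N, n1 = 30, 15

# Observed data from one realized complete randomization.
z_obs = np.zeros(N, dtype=int)
z_obs[rng.choice(N, size=n1, replace=False)] = 1
y_obs = rng.normal(size=N) + 1.0 * z_obs  # data carry a real effect of ~1

def diff_in_means(z, y):
    return y[z == 1].mean() - y[z == 0].mean()

t_obs = diff_in_means(z_obs, y_obs)

# Under the sharp null Y_i(1) = Y_i(0) for all i, every missing potential
# outcome equals the observed one, so y_obs is reusable for any assignment.
draws = []
for _ in range(5_000):
    z = np.zeros(N, dtype=int)
    z[rng.choice(N, size=n1, replace=False)] = 1
    draws.append(diff_in_means(z, y_obs))

p_value = np.mean(np.abs(draws) >= abs(t_obs))
```

The resulting p-value is exact up to Monte Carlo error, with no distributional assumptions on the outcomes.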
Sampling-based randomization stages (randomize-first–sample-second vs. sample-first–randomize-second) yield identical inferential properties for classic estimators under the fixed potential outcomes model (Branson et al., 2018).
4. Extensions to Complex Study Designs
The potential outcomes framework generalizes naturally to multifactorial, strip-plot, and longitudinal settings:
- Factorial and strip-plot designs: Each unit is assigned a vector of potential outcomes for all treatment combinations. Effects are defined as linear contrasts (weighted sums), and randomization forms the basis for unbiased estimation and variance derivation. Conservative variance estimators may be required due to the impossibility of unbiased variance estimation, and minimaxity properties can be formally established (Alquallaf et al., 2018, Dasgupta et al., 2012).
- Ordinal outcomes: Classic estimands such as the average treatment effect are difficult to interpret for ordinal data. When only marginal distributions are identified, sharp bounds for causal parameters are derived (e.g., probabilities of treatment being beneficial), using only the observable margins and standard LP arguments (Lu et al., 2015).
- Principal stratification: Joint potential outcomes for post-treatment variables (e.g., noncompliance) can be formulated to define principal causal effects, with identification and estimation built on extensions of the framework (e.g., using joint distributions and invariance arguments) (Wu et al., 29 Apr 2025).
- High-dimensional/flexible outcome spaces: Flow-based generative models (such as continuous normalizing flows) have been developed to nonparametrically model the joint density of potential outcomes and enable individual-level or counterfactual inference without restrictive distributional assumptions (Wu et al., 21 May 2025).
- Continuous time: Recent neural approaches have advanced the estimation of conditional average potential outcomes in continuous time, handling irregular measurement and treatment schedules and adjusting for dynamic confounding using stabilized continuous-time inverse propensity weighting (e.g., SCIP-Net) (Hess et al., 4 Oct 2024).
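As a concrete instance of the margin-based partial identification described for ordinal outcomes above, the binary special case admits closed-form sharp (Fréchet) bounds on the probability of benefit; a minimal sketch (the function name and example numbers are invented for illustration):

```python
def benefit_bounds(p1: float, p0: float) -> tuple[float, float]:
    """Sharp bounds on the probability of benefit P(Y(1)=1, Y(0)=0)
    when only the margins p1 = P(Y(1)=1) and p0 = P(Y(0)=1) are identified."""
    lower = max(0.0, p1 - p0)       # benefit cannot be less than the ATE gain
    upper = min(p1, 1.0 - p0)       # bounded by each margin separately
    return lower, upper

# Example: 70% respond under treatment, 40% under control.
lo, hi = benefit_bounds(0.7, 0.4)   # ≈ (0.3, 0.6)
```

The joint distribution of $(Y(1), Y(0))$ is never observed, so only this interval, not a point, is identified from the margins; the general ordinal case replaces the closed form with linear programming over the joint distribution.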
5. Connections to Structural and Graphical Causal Models
The potential outcomes framework is frequently compared to, and reconciled with, Pearl’s structural causal models (SCMs) and directed acyclic graphs (DAGs):
- The PO framework defines effects through explicit counterfactuals (potential outcomes for each treatment regime), with identification rooted in experimental manipulation or ignorability/invariance assumptions.
- SCMs use “do-operators” and model the data-generating process through structural equations and graphs, encoding conditional independence, exclusion restrictions, and modularity.
- The two frameworks coincide in their single-outcome implications (i.e., marginal distributions of $Y(1)$ and $Y(0)$) only if covariates used for adjustment are not themselves affected by treatment (“no posttreatment covariates”). If this is violated, the estimated direct effect (via PO, holding covariates fixed) and the total effect (via SCM, propagating downstream) may differ, emphasizing the importance of model specification for correct interpretation (Lara, 2023).
- Recent developments formalize the algebraic and logical equivalence of identification results (e.g., LATE via IV) under both frameworks and characterize the axiomatic requirements for representability and completeness (Ibeling et al., 2023).
A summary comparison is provided below:
| Feature | Potential Outcomes (PO) | Structural Causal Model (SCM) |
|---|---|---|
| Definition of causal effect | Contrast of potential outcomes | Output of do-interventions |
| Counterfactuals | Explicit, unit-level | Generated by modifying structural equations |
| Covariates affected by treatment | Excluded from adjustment sets | Modeled in the graph (paths/edges) |
| Formal equivalence | Holds under no posttreatment covariates | May differ when T affects X or Y |
| Expressiveness | Flexible with shape restrictions, stratification | Well suited for visualizing independencies, exogeneity |
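The posttreatment-covariate discrepancy can be illustrated with a small hypothetical simulation (all coefficients invented): when treatment affects a covariate that also affects the outcome, adjusting for that covariate recovers only the direct path, while the unadjusted randomized comparison recovers the total effect.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 100_000

z = rng.integers(0, 2, size=N)                 # randomized treatment
m = 1.0 * z + rng.normal(size=N)               # posttreatment variable affected by z
y = 2.0 * z + 1.5 * m + rng.normal(size=N)     # total effect of z on y: 2 + 1.5*1 = 3.5

# Unadjusted randomized comparison estimates the TOTAL effect (~3.5).
total = y[z == 1].mean() - y[z == 0].mean()

# "Adjusting" for the posttreatment m via OLS recovers only the DIRECT path (~2).
X = np.column_stack([np.ones(N), z, m])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
direct = beta[1]
```

Neither number is wrong; they are answers to different causal questions, which is exactly the point of specifying whether a covariate is pre- or posttreatment.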
6. Strengths, Limitations, and Practical Considerations
Strengths:
- Direct and precise definition of causal estimands—average, quantile, principal effects, or other summaries—tied to scientific questions (Dasgupta et al., 2012, Lu et al., 2015).
- Explicit modeling of the assignment mechanism enables unbiased estimation, transparency of assumptions, and robustness to non-additivity or outcome heterogeneity.
- Flexibility regarding estimation strategy (including alternatives to OLS), finite- versus superpopulation estimands, and the development of robust and conservative inference procedures.
- Immediate extension to complex designs (multifactorial, longitudinal, high-dimensional) via contrast or generative modeling approaches.
Limitations and Challenges:
- Unverifiability of unit-level potential outcomes; only one is observed per unit, making individual-level counterfactual claims dependent on modeling or imputation assumptions (Höltgen et al., 24 Jul 2024).
- Sensitivity of causal effect estimation to covariate selection; inclusion of non-confounders (instruments, mediators, colliders) can compromise bias adjustment, requiring careful causal structure consideration (Zhao et al., 2023).
- Alignment with SCMs and DAGs demands explicit clarification of which “effect” is targeted—direct, indirect, or total—and the status (pre- or posttreatment) of covariates (Lara, 2023).
- The traditional framework assumes potential outcomes are well-defined and not affected by interference or complex network dependencies; extensions are active research areas incorporating random potential outcomes, local dependencies, and interference (Yang, 2 May 2025).
7. Contemporary Developments and Future Directions
- Integration with Machine Learning: Flow-based generative approaches, neural ODEs, and neural controlled differential equations are increasingly used to model flexible, high-dimensional outcome spaces and predict counterfactuals over time, often with improved uncertainty quantification and predictive accuracy (Wu et al., 21 May 2025, Hess et al., 4 Oct 2024).
- Personalized and dynamic causal inference: Frameworks leveraging wearable sensor data and individual augmentation are being developed to generate personalized “what-if” trajectories and infer individualized response variability (Subramanian et al., 20 Aug 2025).
- Extrapolation to extreme/interpolated treatments: Potential outcomes models now incorporate tail modeling via extreme value theory to enable causal analysis for treatment levels beyond observed data (Bodik, 16 Mar 2024).
- Multi-experiment identification: Advances allow estimation of causal estimands that rely on the joint distribution of potential outcomes by exploiting multiple randomized experiments and transportability assumptions, enabling finer-grained treatment evaluation and principal stratification (Wu et al., 29 Apr 2025).
- Finite-population and non-counterfactual frameworks: Critiques of the metaphysical aspects of counterfactuals have led to alternative finite-population, prediction-based models where all assumptions are empirically testable and entirely grounded in finite-sample inference, reinforcing the distinction between statistical and scientific inference and highlighting model dependence of causal claims (Höltgen et al., 24 Jul 2024).
The potential outcomes framework thus constitutes both the unifying logic and the technical apparatus for causal analysis across experimental designs, data types, and scientific disciplines. Its continued development will likely involve increased integration with computational methods, more explicit handling of complex structural features (such as heterogeneous assignment mechanisms, network interference, and time-varying dynamics), and a renewed emphasis on transparent, empirically testable assumptions.