Model Error in Counterfactual Worlds
- The paper details a decomposition of model error into miscalibration and scenario deviation, providing a clear mathematical framework.
- It presents regression and surrogate modeling approaches to estimate counterfactual errors under varying data regimes and assumptions.
- The work discusses robust counterfactual validation, measurement error impacts, and practical guidelines for scenario design in policy evaluation.
Model error in counterfactual worlds refers to the challenge of quantifying, attributing, and mitigating errors that arise when projecting the consequences of hypothetical scenarios using statistical, machine learning, or mechanistic models. Unlike evaluation of realized forecasts, counterfactual evaluation interrogates a model's calibration on outcomes that did not, and often cannot, occur, making direct empirical validation impossible. Errors in these settings stem from both model miscalibration (the intrinsic discrepancy between model and truth under the specified scenario) and scenario deviation (the difference between the realized environment and the hypothesized counterfactual). This topic is central in decision support, policy evaluation, algorithmic fairness, and causal inference, and is the focus of a growing literature on principled estimation, identification, and robustness of models used to answer "what if?" questions in science and policy.
1. Formal Decomposition of Model Error in Counterfactual Worlds
Let $s$ denote a scenario axis (e.g., vaccination coverage), with counterfactual values $s^*$. For a model $M$, $\hat{y}(s)$ denotes the model's point projection at scenario $s$; the true (possibly unknown) mapping is $y(s)$.
- Model Miscalibration: $\hat{y}(s^*) - y(s^*)$, the error between model and the unobservable truth under scenario $s^*$.
- Scenario Deviation: $y(s^*) - y(s_{\mathrm{obs}})$, where $s_{\mathrm{obs}}$ is the realized scenario.
- Observed Deviation: $\hat{y}(s^*) - y(s_{\mathrm{obs}}) = \underbrace{\big[\hat{y}(s^*) - y(s^*)\big]}_{\text{miscalibration}} + \underbrace{\big[y(s^*) - y(s_{\mathrm{obs}})\big]}_{\text{scenario deviation}}$
This decomposition emphasizes that even when the realized scenario $s_{\mathrm{obs}}$ is close to $s^*$, the observed deviation conflates model calibration error and scenario deviation, complicating attribution in empirical validation (Howerton et al., 30 Nov 2025).
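To make the bookkeeping concrete, the following minimal sketch computes both components in a setting where the truth under the counterfactual scenario is available by construction (e.g., a simulation study); all numbers are hypothetical:

```python
# Minimal numerical sketch of the error decomposition (hypothetical values).
# y_hat_star: model projection at counterfactual scenario s*
# y_star:     (usually unobservable) truth under s*, known here by construction
# y_obs:      realized outcome under the observed scenario s_obs

y_hat_star = 112.0   # model projection at s*
y_star = 105.0       # ground truth under s* (available only in simulation)
y_obs = 98.0         # realized outcome under s_obs

miscalibration = y_hat_star - y_star        # model vs. truth at s*
scenario_deviation = y_star - y_obs         # truth at s* vs. realized world
observed_deviation = y_hat_star - y_obs     # what empirical validation sees

# The identity: observed deviation = miscalibration + scenario deviation
assert abs(observed_deviation - (miscalibration + scenario_deviation)) < 1e-12
print(miscalibration, scenario_deviation, observed_deviation)  # 7.0 7.0 14.0
```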
2. Methodological Strategies for Estimating Counterfactual Model Error
Approach 1: Evaluation on "Plausible" Scenarios
Estimate $\hat{y}(s^*) - y(s^*)$ for $s^*$ close to $s_{\mathrm{obs}}$ by assuming negligible scenario deviation, $y(s^*) \approx y(s_{\mathrm{obs}})$:

$$\hat{y}(s^*) - y(s^*) \approx \hat{y}(s^*) - y(s_{\mathrm{obs}}).$$

This approach strictly applies only when the counterfactual scenario is nearly realized; otherwise it introduces bias due to unaccounted scenario deviation.
Approach 2: Error Distribution Regression Across Units
With multiple units $i$ experiencing various realized scenarios $s_i$ and reprojected predictions $\hat{y}_i(s_i)$, fit a regression of realized errors on scenario values and unit covariates $x_i$:

$$\hat{y}_i(s_i) - y_i(s_i) = g(s_i, x_i) + \eta_i.$$

Predict the counterfactual error for unit $i$ by evaluating $g(s^*, x_i)$, yielding scalable estimates under the assumption that model error structure generalizes from observed to counterfactual regimes (Howerton et al., 30 Nov 2025).
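A minimal sketch of this approach, assuming synthetic unit-level data and a random-forest form for $g$ (the learner choice and all variable names are illustrative, not prescribed by the paper):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical unit-level data: realized scenario s_i, a unit covariate x_i,
# reprojected model prediction yhat_i(s_i), and realized outcome y_i(s_i).
rng = np.random.default_rng(0)
n = 500
s = rng.uniform(0.2, 0.9, n)          # realized scenario values (e.g., coverage)
x = rng.normal(size=n)                # unit-level covariate
y = 100 - 40 * s + 5 * x + rng.normal(scale=2, size=n)   # realized outcomes
y_hat = 100 - 35 * s + 5 * x          # reprojected predictions (miscalibrated in s)

err = y_hat - y                       # realized model errors

# Fit g(s, x) to the errors; a flexible learner captures non-linear structure.
g = RandomForestRegressor(n_estimators=200, random_state=0)
g.fit(np.column_stack([s, x]), err)

# Predict the counterfactual error at scenario s* for each unit.
# Note: predicting outside the observed s-range leans entirely on the
# assumption that the error structure generalizes to the counterfactual regime.
s_star = 0.95
err_star = g.predict(np.column_stack([np.full(n, s_star), x]))
print("estimated mean counterfactual error at s*:", err_star.mean())
```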
Approach 3: Surrogate Modeling of the Data-Generating Process
Construct a statistical or semi-parametric surrogate $\tilde{y}(s)$ fit to observed pairs $(s_i, y_i)$ and use it as a stand-in for $y(s)$. Model error at scenario $s^*$ is then

$$\hat{y}(s^*) - \tilde{y}(s^*).$$
This approach requires strong surrogacy and no omitted confounding, but leverages well-established regression/causal inference machinery (Howerton et al., 30 Nov 2025).
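A minimal surrogate sketch, using a spline-basis regression as the stand-in for the data-generating process (the basis choice, the toy mechanistic projection, and all numbers are illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import SplineTransformer
from sklearn.pipeline import make_pipeline

# Hypothetical observed pairs (s_i, y_i) across scenarios.
rng = np.random.default_rng(1)
s = rng.uniform(0.2, 0.9, 400).reshape(-1, 1)
y = 100 - 40 * s[:, 0] + 8 * np.sin(6 * s[:, 0]) + rng.normal(scale=2, size=400)

# Surrogate y~(s): a flexible, fully observable stand-in for the true mapping.
surrogate = make_pipeline(SplineTransformer(degree=3, n_knots=8), LinearRegression())
surrogate.fit(s, y)

# Mechanistic model's projection at the counterfactual scenario (toy form).
def y_hat(s_star):
    return 100 - 35 * s_star  # hypothetical miscalibrated projection

# Estimate model error at s* by differencing against the surrogate.
s_star = 0.75  # must lie where the surrogate is well supported by data
error_estimate = y_hat(s_star) - surrogate.predict([[s_star]])[0]
print("estimated model error at s*:", error_estimate)
```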
Summary Table
| Approach | Key Assumption | Primary Benefit / Limitation |
|---|---|---|
| Plausible scenarios (1) | $s^*$ close to $s_{\mathrm{obs}}$ (negligible scenario deviation) | No extrapolation, but biased if $s^* \neq s_{\mathrm{obs}}$ |
| Error regression (2) | Error structure generalizes across $s$ | Captures non-linearities; requires reprojection |
| Surrogate model (3) | Surrogate fit accurate; no omitted confounding | Unified modeling; sensitive to misspecification |
Approaches 2 and 3 empirically yield accurate population-level error recovery when unit-level covariates are used, whereas Approach 1 is generally biased except in trivial cases (Howerton et al., 30 Nov 2025).
3. Model Uncertainty and Distributional Ambiguity in Counterfactual Evaluation
Counterfactual analysis in the presence of model parameter uncertainty motivates the use of distributional ambiguity sets. In the distributionally robust paradigm, given only the moments $\hat{\mu}$, $\hat{\Sigma}$ of the model parameters $\tilde{\theta}$, one computes bounds on counterfactual validity for a plan $x$:

$$\inf_{\mathbb{Q} \in \mathcal{B}(\hat{\mu}, \hat{\Sigma})} \mathbb{Q}\!\left[\tilde{\theta}^{\top} x \ge 0\right],$$

with $\hat{\mu}$ the estimated parameter mean, $\hat{\Sigma}$ the estimated parameter covariance, and $\mathcal{B}(\hat{\mu}, \hat{\Sigma})$ the ambiguity set of parameter distributions consistent with these moments (Bui et al., 2022). Robustification is achieved by maximizing this worst-case validity subject to feasibility constraints, ensuring counterfactual recommendations retain validity under plausible model parameter shifts. This approach provides tractable and interpretable worst-case performance certificates.
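For intuition only (this is not the Gelbrich-set construction of Bui et al.), the classical one-sided Chebyshev (Cantelli) bound gives the exact worst case of $\mathbb{Q}[\tilde{\theta}^{\top} x \ge 0]$ over all distributions matching a given mean and covariance; a minimal sketch:

```python
import numpy as np

def worst_case_validity(x, mu, Sigma):
    """Worst-case P[theta^T x >= 0] over all parameter distributions with
    mean mu and covariance Sigma (one-sided Chebyshev / Cantelli bound).
    Illustrative assumption: 'valid' means theta^T x >= 0 for plan x."""
    m = float(mu @ x)            # mean of the score theta^T x
    v = float(x @ Sigma @ x)     # variance of the score
    if m <= 0:
        return 0.0               # an adversarial distribution can invalidate x
    return m * m / (m * m + v)   # tight bound when the mean score is positive

# Hypothetical estimated moments and two candidate plans.
mu = np.array([1.0, -0.5])
Sigma = np.array([[0.2, 0.05], [0.05, 0.3]])
for x in (np.array([1.0, 0.2]), np.array([1.0, 1.5])):
    print(x, "worst-case validity:", round(worst_case_validity(x, mu, Sigma), 3))
```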
4. Identifiability, Nonidentifiability, and Worst-Case Error Bounds
The identifiability of model error in counterfactuals depends on both structural assumptions and observability:
- Monotonic, 1-D Exogenous SCMs: Under strictly monotonic structural equations and univariate exogenous noise, learned models fitting the observed conditionals must agree on all counterfactuals (Nasr-Esfahany et al., 2023).
- Multi-Dimensional Exogenous Noise: Non-identifiability becomes generic; there exist observationally indistinguishable models with divergent counterfactual predictions.
- Worst-case error estimation proceeds by training a second model that matches observational fit but maximally disagrees on counterfactual queries, yielding a rigorous upper bound on counterfactual error for a given learned model (Nasr-Esfahany et al., 2023).
This nonidentifiability directly impacts the reliability of applications such as counterfactual fairness and user-facing explanations.
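A minimal sketch of this adversarial bounding procedure in a deliberately underdetermined linear setting: the data are nearly collinear, so a second model can match the observational fit while diverging at an off-manifold counterfactual query (all model forms and tolerances are illustrative assumptions, not the construction of Nasr-Esfahany et al.):

```python
import numpy as np
from scipy.optimize import minimize

# Bound counterfactual error by searching for a second model that is
# observationally indistinguishable from the learned one but maximally
# disagrees at a counterfactual query point.
rng = np.random.default_rng(2)
t = rng.normal(size=300)
X = np.column_stack([t, t + 0.05 * rng.normal(size=300)])  # nearly collinear data
w1 = np.array([1.0, -2.0])                                 # learned model (given)
y = X @ w1 + 0.1 * rng.normal(size=300)

mse1 = np.mean((X @ w1 - y) ** 2)
x_cf = np.array([1.0, -1.0])   # query far off the observed data manifold
tol = 0.01                     # allowed relative slack in observational fit

# Maximize disagreement at x_cf subject to matching the observational fit.
res = minimize(
    lambda w2: -(x_cf @ w2 - x_cf @ w1),
    x0=w1,
    constraints=[{"type": "ineq",
                  "fun": lambda w2: (1 + tol) * mse1 - np.mean((X @ w2 - y) ** 2)}],
    method="SLSQP",
)
print("worst-case counterfactual disagreement:", abs(x_cf @ res.x - x_cf @ w1))
```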
5. Measurement Error, Scenario Misspecification, and Error Propagation
Observed data often differs from the true underlying scenario due to measurement error. In trade and spatial models, measurement error in the baseline propagates into substantial counterfactual uncertainty. Sanders (Sanders, 2023) demonstrates, using empirical Bayes deconvolution, that measurement error can dominate parametric uncertainty, and recommends sampling from joint posteriors over baseline data and parameters to accurately quantify uncertainty in counterfactual outcomes.
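A schematic Monte Carlo version of this recommendation, with a toy counterfactual mapping and simple distributions standing in for the joint posterior over baseline data and parameters (all numbers hypothetical):

```python
import numpy as np

# Propagate both baseline measurement error and parameter uncertainty
# into a counterfactual outcome (toy mapping; all numbers hypothetical).
rng = np.random.default_rng(3)

obs_baseline = 1.00               # measured baseline quantity (e.g., a trade share)
meas_sd = 0.08                    # measurement-error scale for the baseline
theta_mean, theta_sd = 1.5, 0.1   # posterior for a structural elasticity

def counterfactual_outcome(baseline, theta, shock=0.9):
    # Toy counterfactual mapping: outcome under a 10% cost reduction.
    return baseline * shock ** (-theta)

draws = 10_000
baseline = rng.normal(obs_baseline, meas_sd, draws)   # posterior over true baseline
theta = rng.normal(theta_mean, theta_sd, draws)       # posterior over parameters
cf = counterfactual_outcome(baseline, theta)

lo, hi = np.percentile(cf, [2.5, 97.5])
print(f"counterfactual outcome: {cf.mean():.3f} [{lo:.3f}, {hi:.3f}]")
# Fixing the baseline at its measured value understates this interval:
cf_fixed = counterfactual_outcome(obs_baseline, theta)
print(f"parameter-only interval: [{np.percentile(cf_fixed, 2.5):.3f}, "
      f"{np.percentile(cf_fixed, 97.5):.3f}]")
```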
Moreover, in dynamic SCMs with chaotic or near-chaotic dynamics, small model or parameter errors can be exponentially amplified in counterfactual sequence prediction, rendering long-horizon counterfactual analysis unreliable by principle. The Lyapunov exponents and the Jacobian spectrum quantify this horizon of predictability; practical counterfactual reasoning must respect these dynamical limits (Aalaila et al., 31 Mar 2025).
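A minimal illustration of this dynamical limit, estimating the largest Lyapunov exponent of the logistic map (a standard chaotic system, used here as an assumption rather than an example from the paper) and the horizon at which a small counterfactual perturbation grows to order one:

```python
import numpy as np

# Estimate the largest Lyapunov exponent of the logistic map x -> r x (1 - x)
# and derive the horizon beyond which counterfactual trajectories decorrelate.
r = 3.9                      # chaotic regime
x = 0.2
n_burn, n_iter = 1000, 100_000

for _ in range(n_burn):      # discard the transient
    x = r * x * (1 - x)

lyap = 0.0
for _ in range(n_iter):
    x = r * x * (1 - x)
    lyap += np.log(abs(r * (1 - 2 * x)))   # accumulate log |f'(x)|
lyap /= n_iter

delta0 = 1e-6                # size of an initial counterfactual perturbation
horizon = np.log(1.0 / delta0) / lyap      # steps until the gap is order one
print(f"Lyapunov exponent ~ {lyap:.3f}; predictability horizon ~ {horizon:.0f} steps")
```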
6. Scenario Design Principles and Practical Guidelines
Robust estimation of model error in counterfactual worlds necessitates scenario designs that enable error decomposition:
- Scenario axis specification: Use continuous or finely ordered scenario axes.
- Variation across units: Ensure heterogeneity in realized scenarios $s_i$ to enable regression-based estimation.
- Predefined evaluation protocols: Set data collection and fitting protocols in advance.
- Explicit documentation: Make model fixed-assumption axes explicit to distinguish projection from realization (Howerton et al., 30 Nov 2025).
Guidelines for practitioners:
- Record, for each projection, the counterfactual scenario $s^*$ and projection $\hat{y}(s^*)$, and, after realization, the realized scenario $s_{\mathrm{obs}}$ and outcome $y(s_{\mathrm{obs}})$ (see the sketch after this list).
- Choose the estimation approach best suited to data structure and modeling context (prefer Approach 3 when a good surrogate is available; use Approach 2 when mechanistic models outperform surrogates).
- Decompose and report observed deviation into its miscalibration and scenario-deviation components, and provide total error and uncertainties.
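A minimal record-keeping sketch implementing the first guideline (field names and structure are illustrative, not from the paper):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ProjectionRecord:
    """One model projection and, once available, its realized counterpart."""
    scenario_star: float                   # counterfactual scenario value s*
    projection: float                      # model projection yhat(s*)
    scenario_obs: Optional[float] = None   # realized scenario s_obs
    outcome_obs: Optional[float] = None    # realized outcome y(s_obs)

    def observed_deviation(self) -> Optional[float]:
        """yhat(s*) - y(s_obs); conflates miscalibration and scenario deviation."""
        if self.outcome_obs is None:
            return None
        return self.projection - self.outcome_obs

# Usage: record at projection time, complete after realization.
rec = ProjectionRecord(scenario_star=0.8, projection=112.0)
rec.scenario_obs, rec.outcome_obs = 0.72, 98.0
print(rec.observed_deviation())   # 14.0
```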
Regularizing toward minimal intervention sparsity, validating surrogates against out-of-sample data, and performing adversarial evaluation of learned models ensure error control and calibrate confidence in counterfactual projections (Howerton et al., 30 Nov 2025, Zhou et al., 2023, Duong et al., 2023).
References
- "Assessing model error in counterfactual worlds" (Howerton et al., 30 Nov 2025)
- "Counterfactual Plans under Distributional Ambiguity" (Bui et al., 2022)
- "Counterfactual (Non-)identifiability of Learned Structural Causal Models" (Nasr-Esfahany et al., 2023)
- "Measurement Error and Counterfactuals in Quantitative Trade and Spatial Models" (Sanders, 2023)
- "When Counterfactual Reasoning Fails: Chaos and Real-World Complexity" (Aalaila et al., 31 Mar 2025)
- "Towards Characterizing Domain Counterfactuals For Invertible Latent Causal Models" (Zhou et al., 2023)
- "Achieving Counterfactual Fairness with Imperfect Structural Causal Model" (Duong et al., 2023)
- "Imputation of Counterfactual Outcomes when the Errors are Predictable" (Goncalves et al., 12 Mar 2024)