Uncertainty Quantification & Confidence Intervals
- Uncertainty quantification is a framework for characterizing, quantifying, and propagating inherent uncertainties using both frequentist and Bayesian methods.
- Confidence interval construction provides rigorous, probability-based bounds through test inversion, optimization, and resampling methodologies.
- Recent advancements include robust and adaptive techniques that address model misspecification, high-dimensional constraints, and computational challenges.
Uncertainty quantification (UQ) is the mathematical and computational framework for characterizing, quantifying, and propagating the intrinsic uncertainties present in predictions, estimations, and inverse solutions across statistical, data-analytic, and applied science domains. Central to UQ is the construction of confidence intervals (CIs) and regions—procedures that supply rigorous, probability-based quantifications of uncertainty for functionals, estimands, or predictive quantities. These intervals control coverage rates with respect to frequentist or Bayesian long-run guarantees, under model- and/or data-generating assumptions, often in the presence of complex constraints and high-dimensionality. Recent research emphasizes robust, adaptive, and computationally efficient UQ methods capable of addressing model misspecification, input or procedural uncertainty, and application-specific side information.
1. Foundations: Coverage Guarantees and CI Construction
The construction of confidence intervals is founded on ensuring that, for any parameter or functional of interest $\varphi(x^*)$ (or for functional queries in inverse problems), a data-dependent interval $[L(y), U(y)]$ satisfies
$$\mathbb{P}\big[\varphi(x^*) \in [L(y),\, U(y)]\big] \ge 1 - \alpha$$
for a significance level $\alpha \in (0, 1)$. This design-based frequentist criterion controls the type-I error rate in nonasymptotic regimes (Batlle et al., 2023).
Multiple approaches co-exist for achieving such coverage:
- Simultaneous confidence sets, formed by constructing a multivariate region for the full parameter vector and projecting through the statistical functional of interest.
- Likelihood-based intervals, including those formed by inverting likelihood-ratio tests. For a functional $\varphi$, inverted acceptance regions based on likelihood-ratio statistics yield intervals with explicit type-I error control (Batlle et al., 2023).
- Optimization formulations, equivalently phrasing CI endpoints as solutions to constrained optimization problems integrating data-fit measures and parameter constraints.
- Resampling and empirical likelihood methods, including bootstrap, jackknife, and empirical likelihood intervals, often harnessed for complex simulators, black-box models, or nonparametric estimands (Lam et al., 2017, He et al., 12 Aug 2024).
A general structure emerges: for each candidate value of the estimand, build a test, calibrate the critical value to desired error, and invert the acceptance region to yield a confidence set. Coverage is then guaranteed by construction via duality arguments.
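This build-test-invert recipe can be made concrete with a toy example. The sketch below inverts an exact binomial test over a grid of candidate proportions to obtain a confidence set for a success probability; the grid resolution and all variable names are illustrative choices, not part of any cited method, and `scipy.stats.binomtest` requires SciPy ≥ 1.7.

```python
import numpy as np
from scipy import stats

def invert_test_ci(k, n, alpha=0.05):
    """CI for a binomial proportion p by test inversion: retain every
    candidate p whose exact two-sided test does not reject the data."""
    grid = np.linspace(1e-6, 1 - 1e-6, 2000)
    accepted = [p for p in grid if stats.binomtest(k, n, p).pvalue > alpha]
    # Coverage holds by duality: the true p is rejected with prob. <= alpha.
    return min(accepted), max(accepted)

lo, hi = invert_test_ci(k=7, n=20)
print(f"95% CI for p: [{lo:.3f}, {hi:.3f}]")
```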
2. Optimization-Based and Strict-Bounds Approaches
Optimization-based confidence intervals operate by posing the CI endpoints as solutions to optimization problems constrained by data-consistency and model-side information. The framework (commonly referred to as "strict bounds") is particularly compelling for inverse and ill-posed problems with nonsmooth parameter constraints (e.g., nonnegativity, monotonicity, convexity) (Batlle et al., 2023, Kuusela et al., 2015). For linear inverse models $y = Ax^* + \varepsilon$ and a linear functional $\varphi(x) = h^\top x$, the endpoints take the form
$$L(y) = \min_{x \in \mathcal{C},\; \|y - Ax\|^2 \le s^2} h^\top x, \qquad U(y) = \max_{x \in \mathcal{C},\; \|y - Ax\|^2 \le s^2} h^\top x,$$
where $\mathcal{C}$ encodes constraints and $s$ is chosen so that the constraint set contains the true parameter with the required coverage.
Two key flavors exist:
- Simultaneous strict bounds: Build a confidence set for $x^*$, intersect with the constraint set $\mathcal{C}$, and project through $\varphi$.
- Test-inversion/likelihood-ratio approach: For a fixed functional value $\mu$, invert the likelihood-ratio test of $H_0: \varphi(x) = \mu$ against its complement, forming intervals by solving scalar optimizations over $\mu$.
These methods can be implemented in both primal and dual form; the dual formulations are especially useful for high-dimensional or nonconvex constraint sets. Under convexity and constraint qualification, the derived intervals are nearly minimal while retaining finite-sample frequentist validity.
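A minimal sketch of the simultaneous strict-bounds construction follows, assuming a known Gaussian noise level and using a generic SciPy solver; the toy model, the functional `h`, and the chi-square calibration of the radius `s2` are illustrative assumptions, not the exact setup of the cited works.

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Toy linear inverse model y = A x* + noise with a nonnegative truth x*.
A = rng.normal(size=(30, 5))
x_true = np.array([1.0, 0.5, 0.0, 2.0, 0.3])
sigma = 0.1
y = A @ x_true + sigma * rng.normal(size=30)

h = np.array([1.0, 1.0, 0.0, 0.0, 0.0])      # functional phi(x) = h^T x
# Radius so that ||y - A x*||^2 <= s2 holds with prob. 0.95 (chi^2 calibration).
s2 = sigma**2 * stats.chi2.ppf(0.95, df=30)

def endpoint(sign):
    """Solve sign * min sign * h^T x  s.t.  ||y - A x||^2 <= s2,  x >= 0."""
    cons = [{"type": "ineq", "fun": lambda x: s2 - np.sum((y - A @ x) ** 2)}]
    x0 = np.clip(np.linalg.lstsq(A, y, rcond=None)[0], 0.0, None)  # feasible-ish start
    res = minimize(lambda x: sign * (h @ x), x0=x0,
                   bounds=[(0, None)] * 5, constraints=cons, method="SLSQP")
    return sign * res.fun

print("strict-bounds interval:", (endpoint(+1.0), endpoint(-1.0)))
```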
A salient result in this class is the formal refutation of the "Burrus conjecture" for linear Gaussian models with positivity constraints: applying a fixed threshold to the likelihood-ratio statistic can result in undercoverage, especially in higher dimensions. Valid coverage instead requires critical values calibrated to both the functional and the data (Batlle et al., 2023).
3. Uncertainty Quantification Under Input and Model Uncertainty
UQ under input or model uncertainty—both parametric and nonparametric—arises in stochastic simulation, machine learning, and inverse problems. Methods must account for not only output variability, but also the sampling error in estimated models or distributions:
- Parametric bootstrap and advanced estimators: In simulation, percentile bootstrap intervals can be improved via $k$-nearest-neighbor (kNN) or likelihood-ratio (kLR) pooling, which reduce bias, control variance, and improve coverage for high-dimensional ratio estimands (He et al., 7 Oct 2024). Bootstrap intervals using standard estimators can overcover, due to the bias induced by a finite number of inner replications, while kLR intervals achieve near-nominal coverage at substantially reduced cost. (A minimal bootstrap-under-input-uncertainty sketch follows this list.)
- Empirical likelihood and optimization: Construct CIs for simulation functionals by solving weight-optimization problems under empirical divergence constraints, improving finite-sample performance and computational efficiency compared to naive bootstrap (Lam et al., 2017). The worst-case value of the functional over an empirical likelihood ball is computed by convex programming, yielding asymptotically valid bounds.
- Nested Monte Carlo: For risk measures such as Value-at-Risk (VaR) and Conditional Value-at-Risk (CVaR) under input uncertainty, nested simulation with budget-aware allocation minimizes bias and ensures asymptotically proper coverage (Zhu et al., 2015). The bias-corrected CI widths reflect the balance of inner and outer simulation, with analytical formulas for variance and higher-order corrections.
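The following sketch illustrates the basic bootstrap-the-input-then-resimulate workflow shared by the methods above: resample the input data, refit the input model, and propagate each refit through the simulator. The exponential input model, the toy simulator, and the budget choices `B` and `r` are hypothetical; none of the kNN/kLR pooling, bias corrections, or budget-allocation rules from the cited papers are included.

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.exponential(scale=2.0, size=200)   # observed input data (e.g., service times)

def simulate_mean(scale, r, rng):
    """Inner simulation: r replications of a toy performance measure
    (here simply the mean of simulated exponential draws)."""
    return rng.exponential(scale=scale, size=r).mean()

B, r = 500, 100                # outer bootstrap samples, inner replications
outer = np.empty(B)
for b in range(B):
    boot = rng.choice(data, size=data.size, replace=True)  # resample input data
    scale_hat = boot.mean()                                # refit input model (exponential MLE)
    outer[b] = simulate_mean(scale_hat, r, rng)            # propagate through the simulator

lo, hi = np.percentile(outer, [2.5, 97.5])  # percentile bootstrap CI
print(f"95% CI under input uncertainty: [{lo:.3f}, {hi:.3f}]")
```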
4. Methods in High-Dimensional and Nonparametric Learning
Uncertainty quantification in modern machine learning and high-dimensional regimes requires careful treatment of bias, complex noise, and dependence structures:
- Debiased estimators and nonasymptotic intervals: For predictors such as LASSO or neural networks, finite-sample CIs require decomposing error into Gaussian and non-Gaussian bias terms. Data-driven correction of the non-vanishing bias via empirical mean and variance estimates yields nonasymptotic, per-coordinate CIs with controlled coverage (Hoppe et al., 18 Jul 2024). This approach is shown to generalize to modern deep learning architectures including U-Net, It-Net, and Vision Transformers.
- Interval neural networks (INNs): A fundamentally non-probabilistic approach, INNs propagate interval-valued parameters through network layers, yielding prediction intervals guaranteed to contain all pointwise realizations for parameters within the learned ranges. These methods require no distributional assumptions, and coverage is enforced empirically through a penalized loss (Ferah et al., 26 Apr 2025). (The core interval-arithmetic propagation step is sketched after this list.)
- Bootstrap for heteroscedastic deep regression: Procedures leveraging ReLU networks, residual-based variance estimation, and a robust residual bootstrap produce honest, nonasymptotic coverage for conditional mean estimation under heavy-tailed or heteroscedastic noise (Padilla et al., 29 Dec 2024). The workflow combines data splitting, bootstrap resampling, and function-class complexity control to deliver CIs adaptive to both network bias and variance.
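INNs train interval-valued weights end to end; the sketch below shows only the deterministic interval-propagation primitive (here applied to a box of uncertain inputs through fixed weights), which is the same arithmetic an INN applies layer by layer. All shapes and weights are illustrative.

```python
import numpy as np

def interval_linear(lo, hi, W, b):
    """Exact interval arithmetic for x -> W x + b over the box [lo, hi]:
    positive weights pull from the same bound, negative from the opposite one."""
    Wp, Wn = np.maximum(W, 0.0), np.minimum(W, 0.0)
    return Wp @ lo + Wn @ hi + b, Wp @ hi + Wn @ lo + b

def interval_relu(lo, hi):
    return np.maximum(lo, 0.0), np.maximum(hi, 0.0)  # ReLU is monotone

rng = np.random.default_rng(2)
W1, b1 = rng.normal(size=(8, 4)), rng.normal(size=8)
W2, b2 = rng.normal(size=(1, 8)), rng.normal(size=1)

x = rng.normal(size=4)
lo, hi = x - 0.1, x + 0.1                       # box of uncertain inputs
lo, hi = interval_relu(*interval_linear(lo, hi, W1, b1))
lo, hi = interval_linear(lo, hi, W2, b2)        # linear output layer
print("prediction interval:", (float(lo[0]), float(hi[0])))
```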
5. Robust and Adaptive Confidence Intervals
Constructing CIs that remain valid under adversarial model contamination or misspecification is a central topic in robust statistics:
- Huber contamination: When the proportion of contamination is unknown, the minimax length of adaptive CIs for location estimation can only decay at a slow rate, even for Gaussian location models. The optimal procedure entails simultaneous uncertainty quantification of empirical quantiles across levels, with CI endpoints given by intersecting data-driven one-sided quantile bounds (Luo et al., 30 Oct 2024).
- Generalization to other distributions: For symmetric, unimodal location families, CI length depends on the shape/curvature of the density; heavy tails prevent shrinkage. Construction proceeds by inverting robust tests, with matching lower and upper separation rates achieved under monotone likelihood-ratio conditions (see the sketch after this list).
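For intuition, the classical distribution-free CI for the median, obtained by inverting the sign test, is a simple ancestor of these quantile-based constructions; the sketch below is that textbook procedure, not the adaptive method of (Luo et al., 30 Oct 2024).

```python
import numpy as np
from scipy import stats

def median_ci(x, alpha=0.05):
    """Distribution-free CI for the median from order statistics (sign-test
    inversion): exact coverage for any continuous distribution."""
    x = np.sort(np.asarray(x))
    n = x.size
    # Largest k with P(Binomial(n, 1/2) <= k) <= alpha / 2.
    k = int(stats.binom.ppf(alpha / 2, n, 0.5))
    if stats.binom.cdf(k, n, 0.5) > alpha / 2:
        k -= 1  # ppf can overshoot; step back to stay conservative
    return x[k], x[n - 1 - k]

rng = np.random.default_rng(3)
sample = np.concatenate([rng.normal(size=95), rng.normal(10, 1, size=5)])  # 5% outliers
print("robust 95% CI for the median:", median_ci(sample))
```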
6. Confidence Regions, Sets, and Feature-wise UQ
Extension from univariate CIs to regions and sets is crucial in high-dimensional, decision-critical, and selective inference tasks:
- Excursion set confidence bands: Rather than pointwise intervals, one may be interested in identifying with high confidence the subset of feature space where an expected outcome (e.g., a conditional mean $\mathbb{E}[Y \mid X = x]$) exceeds a critical threshold. Model-agnostic procedures construct inner and outer random sets via empirical error estimation, tuning parameters to tightly sandwich the excursion set with family-wise error control (Ren et al., 28 Apr 2025). (The inner/outer sandwich logic is sketched after this list.)
- Simultaneous and shape-constrained intervals in inverse problems: For ill-posed problems, such as spectral unfolding in high-energy physics, nonparametric CIs for all linear functionals can be constructed by projecting a strict bounds set defined by physical shape constraints (e.g., monotonicity, convexity) and data-consistency, yielding conservative but sharp simultaneous coverage (Kuusela et al., 2015).
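The inner/outer sandwich logic can be stated in a few lines, assuming pointwise lower and upper confidence bounds for the regression function are already available. The band below is synthetic, and the empirical-error calibration that gives family-wise error control in (Ren et al., 28 Apr 2025) is not reproduced.

```python
import numpy as np

xs = np.linspace(0.0, 1.0, 200)

# Hypothetical pointwise confidence band for f(x) = E[Y | X = x];
# in practice these bounds would come from a fitted model plus a UQ method.
f_hat = np.sin(2 * np.pi * xs)
half_width = 0.3
lower_band, upper_band = f_hat - half_width, f_hat + half_width

t = 0.5  # critical threshold
S_inner = xs[lower_band > t]   # confidently inside the excursion set {x : f(x) > t}
S_outer = xs[upper_band > t]   # cannot be ruled out of the excursion set
print(f"inner set: {S_inner.size} grid points, outer set: {S_outer.size} grid points")
```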
7. Computation and Statistical Optimality in Black-Box and Simulation Models
When inference relies on expensive simulators, the cost of replicates fundamentally limits achievable CI width, prompting the study of statistical optimality under computational constraints:
- Batching, jackknife, and bootstrap equivalence: Statistically optimal CIs for an $n$-run budget can be constructed using standard non-overlapping batching, as well as overlapping batches, jackknife, and weighted bootstrap. Asymptotic uniform most accurate unbiasedness (UMAU) leads to a unified $t$-pivot formula parameterized by the covariance structure of the run means (He et al., 12 Aug 2024). All such methods attain asymptotically minimal expected length among homogeneous intervals; standard batching is globally optimal. Explicit formulas enable plug-in CIs that flexibly adapt to batching, overlaps, or resampling structures. (A minimal nonoverlapping batch-means sketch follows this list.)
- Practical guidelines: In most settings, equal-size nonoverlapping batching is preferred. Overlapping or bootstrapped CIs may marginally improve interval length at the price of more complex covariance estimation, but all methods offer comparable coverage in large-sample regimes.
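A minimal sketch of the baseline recommendation, equal-size nonoverlapping batch means with a $t$ pivot, follows. The i.i.d. lognormal outputs and the batch count are illustrative; the unified pivot covering overlapping, jackknife, and bootstrap variants is not shown.

```python
import numpy as np
from scipy import stats

def batch_means_ci(runs, n_batches=20, alpha=0.05):
    """Nonoverlapping batch-means CI: split n runs into equal-size batches
    and apply the t pivot to the batch means."""
    n = runs.size - runs.size % n_batches          # drop remainder runs
    means = runs[:n].reshape(n_batches, -1).mean(axis=1)
    center = means.mean()
    se = means.std(ddof=1) / np.sqrt(n_batches)
    t_crit = stats.t.ppf(1 - alpha / 2, df=n_batches - 1)
    return center - t_crit * se, center + t_crit * se

rng = np.random.default_rng(5)
runs = rng.lognormal(mean=0.0, sigma=1.0, size=1000)   # i.i.d. simulation outputs
print("95% batch-means CI for the mean:", batch_means_ci(runs))
```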
Uncertainty quantification and confidence interval construction thus operate across a diverse methodological spectrum—from constrained optimization, test inversion, and advanced resampling to robust and nonasymptotic techniques—all united by precise statistical coverage criteria, algorithmic tractability, and adaptability to modern, high-dimensional science and engineering challenges (Batlle et al., 2023, He et al., 7 Oct 2024, Hoppe et al., 18 Jul 2024, Padilla et al., 29 Dec 2024, Lam et al., 2017, He et al., 12 Aug 2024, Ren et al., 28 Apr 2025, Ferah et al., 26 Apr 2025, Luo et al., 30 Oct 2024, Kuusela et al., 2015).