Uncertainty Quantification (UQ) Methods
- Uncertainty Quantification (UQ) is a rigorous framework for identifying, characterizing, and propagating both epistemic and aleatoric uncertainties in models.
- UQ methods encompass Monte Carlo simulations, surrogate models like polynomial chaos and Gaussian processes, and advanced deep learning approaches.
- Best practices in UQ stress calibration, sensitivity analysis, and variance-cost tradeoffs to enhance model reliability and risk-aware decision making.
Uncertainty quantification (UQ) encompasses the rigorous identification, characterization, and propagation of uncertainties in mathematical models, computational simulations, and machine learning predictions. UQ enables the assessment of prediction reliability by dissecting sources of error, quantifying epistemic (model) versus aleatoric (data) uncertainty, and providing practical metrics or intervals for risk-aware decision making. UQ methods are integral across engineering, computational science, and modern data-driven fields, where they support robust design, statistical inference, and trustworthy deployment of high-dimensional or physics-based models.
1. Fundamental Concepts and Mathematical Frameworks
The objective of UQ is to represent the uncertainty in model outputs as induced by uncertainty in inputs, parameter choices, observations, or structural model inadequacy. The two canonical sources of predictive uncertainty are:
- Aleatoric uncertainty: Irreducible randomness due to inherent noise in the data or the measurement process.
- Epistemic uncertainty: Reducible uncertainty arising from incomplete information about the model, such as finite sampling or unknown parameters (Ajirak et al., 7 Sep 2025, He et al., 2023).
In probabilistic models, the predictive distribution for a new output $y^*$ given input $x^*$ and data $\mathcal{D}$ is
$$p(y^* \mid x^*, \mathcal{D}) = \int p(y^* \mid x^*, \theta)\, p(\theta \mid \mathcal{D})\, d\theta,$$
where $p(\theta \mid \mathcal{D})$ encodes model (epistemic) uncertainty, and the likelihood $p(y^* \mid x^*, \theta)$ captures observation noise (aleatoric uncertainty).
The law of total covariance quantifies this decomposition:
$$\operatorname{Cov}(y^* \mid x^*, \mathcal{D}) = \underbrace{\mathbb{E}_{\theta \mid \mathcal{D}}\!\left[\operatorname{Cov}(y^* \mid x^*, \theta)\right]}_{\text{aleatoric}} + \underbrace{\operatorname{Cov}_{\theta \mid \mathcal{D}}\!\left(\mathbb{E}[y^* \mid x^*, \theta]\right)}_{\text{epistemic}}.$$
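The law of total variance can be checked directly by sampling. The sketch below uses a hypothetical toy model (a linear map with a posterior over its slope; all names and numbers are illustrative, not from any cited work) to show the aleatoric and epistemic terms summing to the total predictive variance:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy probabilistic model (illustrative): y* = theta * x* + noise, with
# posterior uncertainty on theta (epistemic) and Gaussian noise (aleatoric).
x_star = 2.0
theta_samples = rng.normal(1.0, 0.3, size=100_000)  # draws from p(theta | D)
noise_var = 0.5**2                                  # observation-noise variance

# Law of total variance: Var(y*) = E[Var(y*|theta)] + Var(E[y*|theta])
aleatoric = noise_var                               # E_theta[Var(y*|theta)] is constant here
epistemic = np.var(theta_samples * x_star)          # Var_theta(E[y*|theta]) = x*^2 Var(theta)
total = aleatoric + epistemic

# Direct check: sample y* jointly and compare the empirical variance
y_samples = theta_samples * x_star + rng.normal(0.0, 0.5, size=theta_samples.size)
```

Here `total` is approximately $0.25 + 4 \cdot 0.3^2 = 0.61$, matching the empirical variance of `y_samples`.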
2. Core Methodologies: Monte Carlo, Surrogate, and Polynomial Approaches
Monte Carlo and Related Methods
The standard Monte Carlo (MC) estimator provides a baseline for UQ by averaging repeated evaluations of the output over independent draws of the random inputs. Its variance decays as $\mathcal{O}(N^{-1})$ in the number of samples $N$, but it incurs high computational cost for expensive models. Modern MC variants include:
- Multilevel Monte Carlo (MLMC): Leverages a hierarchy of low- and high-fidelity models, employing a telescoping sum of finer and coarser simulations to achieve variance reduction at lower cost (Zhang, 2020).
- Multifidelity MC (MFMC): Utilizes statistical correlation between multiple surrogate and full-fidelity models to reduce variance via control variates (Zhang, 2020).
- Lasso Monte Carlo (LMC): Constructs a sparse linear surrogate (lasso regression) and employs a fold-averaged, bias-corrected two-level estimator, maintaining unbiasedness and improving accuracy in high dimensions (Albà et al., 2022).
These methods exploit variance–cost tradeoffs and have established error analyses and implementation protocols for high-dimensional or computationally expensive UQ tasks.
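The telescoping idea behind MLMC can be sketched in a two-level form. The fine and coarse models below are stand-ins (a smooth function and its cheap Taylor-style surrogate, both hypothetical), chosen only to show that the correction term has small variance and therefore needs few expensive samples:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative fine/coarse models of the same quantity of interest:
# the coarse model is a cheap, biased approximation of the fine one.
def fine(xi):   return np.sin(xi) + 0.1 * xi**2
def coarse(xi): return xi - xi**3 / 6 + 0.1 * xi**2   # Taylor-style surrogate of sin

# Two-level estimator: many cheap coarse samples + few correction samples
n_coarse, n_corr = 100_000, 2_000
xi0 = rng.normal(size=n_coarse)
xi1 = rng.normal(size=n_corr)

# Telescoping sum: E[fine] = E[coarse] + E[fine - coarse]
level0 = coarse(xi0).mean()
level1 = (fine(xi1) - coarse(xi1)).mean()   # low variance => few samples suffice
mlmc_estimate = level0 + level1
```

For a standard normal input the true expectation is $0.1$ (the odd terms vanish), and the estimator recovers it while spending only 2% of its samples on the fine model.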
Surrogate Modeling: Polynomial Chaos and Gaussian Processes
Polynomial Chaos Expansions (PCE)
PCE methods express the model output as an expansion in orthogonal polynomials of the random inputs $\boldsymbol{\xi}$:
$$f(\boldsymbol{\xi}) \approx \sum_{\alpha} c_\alpha \Psi_\alpha(\boldsymbol{\xi}),$$
where the $\Psi_\alpha$ are multivariate orthogonal polynomials (Legendre for uniform, Hermite for normal inputs) (Kumar et al., 2022, Wang et al., 7 Jan 2025). The coefficients $c_\alpha$ can be determined by regression (non-intrusive) or projection (intrusive). PCE affords closed-form moments and sensitivity indices but suffers from combinatorial growth of the basis with problem dimension and polynomial order.
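A minimal non-intrusive PCE sketch for a single uniform input, assuming a hypothetical model `model(xi)` and using least-squares regression for the coefficients. Orthogonality of the Legendre basis on $[-1,1]$ gives the closed-form moments mentioned above:

```python
import numpy as np
from numpy.polynomial import legendre

rng = np.random.default_rng(2)

# Hypothetical model with a uniform input xi ~ U(-1, 1) (illustrative)
def model(xi):
    return np.exp(0.5 * xi)

# Non-intrusive PCE: fit Legendre coefficients by least-squares regression
xi_train = rng.uniform(-1.0, 1.0, size=200)
coeffs = legendre.legfit(xi_train, model(xi_train), deg=5)

# Orthogonality gives closed-form moments: the mean is the 0th coefficient;
# the variance is sum_k c_k^2 / (2k + 1) for Legendre polynomials on [-1, 1].
k = np.arange(1, coeffs.size)
pce_mean = coeffs[0]
pce_var = np.sum(coeffs[1:]**2 / (2 * k + 1))
```

For this model the exact moments are $\mathbb{E}[f] = 2\sinh(0.5) \approx 1.042$ and $\operatorname{Var}[f] = \sinh(1) - 4\sinh^2(0.5) \approx 0.089$, which the degree-5 expansion recovers.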
Gaussian Process (GP) Surrogates
GP regression models the output as a stochastic process $f(x) \sim \mathcal{GP}(0, k(x, x'))$. The predictive mean and variance are obtained analytically, providing uncertainty estimates at each test point responsive to both noise (aleatoric) and model uncertainty (epistemic, via limited data support) (Kumar et al., 2022, Wang et al., 7 Jan 2025). GP surrogates handle non-smooth or non-parametric responses, but scalability is an issue for large training samples due to the $\mathcal{O}(N^3)$ cost of factorizing the $N \times N$ kernel matrix.
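A self-contained GP regression sketch (zero-mean prior, RBF kernel; the data and length-scale are illustrative choices, not from the cited works). It makes the epistemic behavior concrete: the predictive variance collapses near training data and reverts to the prior far from it, and the Cholesky factorization is the $\mathcal{O}(N^3)$ bottleneck:

```python
import numpy as np

# Squared-exponential (RBF) kernel between two 1-D point sets
def rbf(a, b, ls=0.5):
    return np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ls**2)

rng = np.random.default_rng(3)
x_train = np.linspace(-2, 2, 15)
y_train = np.sin(x_train) + rng.normal(0, 0.05, x_train.size)

noise = 0.05**2
K = rbf(x_train, x_train) + noise * np.eye(x_train.size)
L = np.linalg.cholesky(K)                       # O(N^3): the scalability bottleneck
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))

x_test = np.array([0.0, 5.0])                   # in-support vs. far-from-data point
k_star = rbf(x_train, x_test)
mean = k_star.T @ alpha                          # predictive mean
v = np.linalg.solve(L, k_star)
var = np.diag(rbf(x_test, x_test)) - np.sum(v**2, axis=0) + noise  # predictive variance
```

At $x = 0$ (inside the data) the variance is near the noise floor; at $x = 5$ it has inflated back to the prior variance, the behavior that makes GPs useful for flagging extrapolation.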
Stochastic Collocation and Generalized PC
Non-intrusive stochastic collocation leverages quadrature or sparse grid ensemble runs at collocation points and projects model outputs onto polynomial bases (gPC), achieving spectral convergence for smooth quantities and integrating efficiently with high-performance solvers (Zhong et al., 19 Aug 2025).
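A minimal collocation sketch for one uniform input: the "solver" below is a hypothetical stand-in for an expensive simulation, evaluated only at Gauss-Legendre nodes, with moments recovered by quadrature:

```python
import numpy as np

# Stand-in for an expensive simulation evaluated at collocation points (illustrative)
def solver(xi):
    return np.cos(np.pi * xi / 2)

# Gauss-Legendre nodes/weights for xi ~ U(-1, 1); the density 1/2 folds into the weights
nodes, weights = np.polynomial.legendre.leggauss(12)
q = solver(nodes)                      # 12 solver runs in total

mean = 0.5 * np.sum(weights * q)       # E[Q]
second = 0.5 * np.sum(weights * q**2)  # E[Q^2]
var = second - mean**2
```

For this smooth quantity of interest the 12-point rule is accurate to near machine precision (the exact mean is $2/\pi$), illustrating the spectral convergence claimed above.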
Dimension Reduction and Gradient-Enhanced Methods
For moderate input dimensions, methods such as univariate dimension reduction (UDR) and gradient-enhanced UDR (GUDR) construct surrogate integrals using low-dimensional quadrature projections and include gradient information where available to improve rare-event estimation with minimal model runs (Wang et al., 7 Jan 2025).
3. Uncertainty Quantification in Machine Learning and Deep Learning
Probabilistic Machine Learning Models
In Bayesian or kernel-based machine learning, UQ is embedded in the predictive posterior. Efficient MC sampling and scalable random feature approximations (e.g., RFF-GPs) enable UQ in high-dimensional latent variable models, with rigorous decompositions of predictive variance into epistemic and aleatoric components (Ajirak et al., 7 Sep 2025).
Deep Neural Networks: Taxonomy and State-of-the-Art
Deep learning UQ methods are systematically organized by uncertainty source (He et al., 2023):
- Epistemic-focused methods:
- Bayesian neural networks via variational inference (mean-field VI, Laplace), MC-dropout, ensembles.
- Sample-density aware methods (deep Gaussian processes, kernel-based or spectral-normalized schemes).
- Aleatoric-focused methods:
- Heteroscedastic networks predicting per-sample noise parameters.
- Mixture density networks, quantile or conformalized quantile regression, deep generative models (cVAE, cGAN).
- Joint epistemic–aleatoric approaches:
- Combined MC-dropout/ensemble with heteroscedastic outputs.
- Evidential deep learning using higher-order posterior distributions (Dirichlet or Normal-Inverse-Gamma heads).
Lightning UQ Box provides a standardized framework for implementing and benchmarking these methods, covering contemporary approaches (deep ensembles, MC-dropout, SWAG, Bayesian single-layer, quantile-based, evidential methods, GP-head hybrids) and cross-validating their calibration, accuracy, and interval coverage in vision and time series regression tasks (Lehmann et al., 2024).
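The joint epistemic–aleatoric combination used by ensembles of heteroscedastic networks reduces to mixture moments. The numbers below are illustrative stand-ins for trained-network outputs (each member $m$ predicts a mean $\mu_m(x)$ and noise variance $\sigma_m^2(x)$ at one test point), not results from any cited method:

```python
import numpy as np

# Hypothetical per-member predictions at a single test input
mu = np.array([1.10, 0.95, 1.05, 1.20, 0.90])       # member means mu_m(x)
sigma2 = np.array([0.04, 0.05, 0.04, 0.06, 0.05])   # member noise variances sigma_m^2(x)

# Mixture moments (law of total variance across ensemble members):
mean = mu.mean()
aleatoric = sigma2.mean()          # average predicted noise
epistemic = mu.var()               # disagreement between members
total_var = aleatoric + epistemic
```

The same two-term split is what evidential heads produce in closed form from a single forward pass, without the ensemble.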
Algorithms for Uncertainty in Neural Science/Engineering
Recent methods exploit information bottleneck regularization to encourage latent representations to be confident inside the training distribution and abstain or inflate uncertainty for out-of-distribution samples. These train an encoder with a “confidence mask” and a Gaussian decoder, sometimes employing widened distributions via normalizing flows for OOD detection. Compared to MCMC Bayesian neural nets, these methods can achieve orders-of-magnitude speedups and more reliable OOD calibration (Guo et al., 2023).
Sample-based approaches such as DISCO Nets and extensions minimize the energy score to ensure predictive samples cover the true data distribution, supporting efficient probabilistic regression with SHAP-based local interpretability (Kanazawa et al., 2022).
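For a scalar target, the energy score minimized by such sample-based models has a simple empirical form, $\mathrm{ES} = \frac{1}{n}\sum_i |x_i - y| - \frac{1}{2n^2}\sum_{i,j} |x_i - x_j|$. A sketch with synthetic predictive samples (all distributions illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)

# Empirical energy score for scalar y and predictive samples x_i (lower is better)
def energy_score(samples, y):
    term1 = np.mean(np.abs(samples - y))
    term2 = 0.5 * np.mean(np.abs(samples[:, None] - samples[None, :]))
    return term1 - term2

y_true = 0.0
good = rng.normal(0.0, 0.1, 1000)   # sharp samples covering the true outcome
bad = rng.normal(2.0, 0.1, 1000)    # equally sharp but biased samples

score_good = energy_score(good, y_true)
score_bad = energy_score(bad, y_true)
```

The score is a proper scoring rule: it rewards predictive samples that actually cover the realized outcome, so `score_good` is far smaller than `score_bad`.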
4. Domain-Specific UQ: Engineering, Fluid Dynamics, and Physical Sciences
In engineering, UQ techniques account for uncertainties in both simulation parameters and experimental measurement for robust design:
- Non-intrusive PCE and GP meta-models are used extensively, often in combination (PC-Kriging) to leverage global polynomial trends and local non-parametric corrections (Kumar et al., 2022, Wang et al., 7 Jan 2025).
- Collocation methods integrate tightly with HPC fluid simulators, demonstrating spectral convergence for smooth flow quantities and scalable parallel implementations (Zhong et al., 19 Aug 2025).
- Inverse UQ methodologies are advanced for inferring input parameter statistics consistent with observed experimental data, using frequentist (maximum likelihood), Bayesian (full/posterior with model discrepancy), and empirical coverage-based strategies, with extensive applications in reactor physics and thermal-hydraulics. Method selection is determined by rigor, ease-of-use, requirement for code intrusiveness, and bias-handling capability (Wu et al., 2021).
For hybrid dynamical systems (e.g., systems with mode switches), specialized wavelet-based polynomial chaos (Haar expansions), boundary-layer regularization for reset conditions, and transport PDE frameworks are devised to preserve spectral convergence and capture state discontinuities, bypassing the smoothness requirements of classical PCE (Sahai et al., 2011).
Ensemble-based Bayesian UQ, notably ensemble Kalman filter variants (EnKF, EnRML, EnKF-MDA), are established as efficient surrogates to full MCMC in computationally intensive fluid simulations, matching posterior mean and maintaining credible intervals even with limited ensemble sizes (Zhang et al., 2020).
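A single scalar EnKF analysis step makes the ensemble-as-posterior idea concrete. The sketch below assumes a directly observed state (observation operator $H = I$) and uses the stochastic (perturbed-observation) variant; all numbers are illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)

# Forecast ensemble for a scalar state (stand-in for a simulation parameter)
n_ens = 200
prior = rng.normal(0.0, 1.0, n_ens)
obs, obs_var = 1.0, 0.25            # observation and its error variance

# Kalman gain estimated from ensemble statistics (H = I)
prior_var = np.var(prior, ddof=1)
gain = prior_var / (prior_var + obs_var)

# Stochastic EnKF: each member assimilates a perturbed copy of the observation
perturbed = obs + rng.normal(0.0, np.sqrt(obs_var), n_ens)
posterior = prior + gain * (perturbed - prior)

# Exact Gaussian posterior here: mean 0.8, variance 0.2; the ensemble
# mean and spread approximate both without any MCMC sampling.
```

Even with 200 members the posterior mean and credible spread are close to the exact conjugate-Gaussian answer, which is why EnKF variants serve as cheap MCMC surrogates in expensive fluid simulations.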
5. Validation, Calibration, and Practical Assessment Tools
UQ reliability depends critically on calibration, i.e., that predicted uncertainties match the errors actually observed:
- Consistency: calibration with respect to the predicted uncertainty scale (identifiable in reliability diagrams plotting observed RMSE against predicted root mean variance, RMV) (Pernot, 2023).
- Adaptivity: calibration with respect to input features or regions, ensuring model uncertainties adapt to local data density, OOD inputs, or rare events.
- Interval-based metrics (coverage, local coverage, interval scores) and variance-based metrics (local z-variance, calibration error) provide comprehensive diagnostic tools.
- Confidence curves and probabilistic reference simulations benchmark selective error reduction as high-uncertainty samples are excluded.
These methods guide benchmarking of UQ methods in both synthetic and application-specific settings, revealing strengths and coverage gaps.
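The consistency checks above reduce to simple statistics. A minimal sketch on synthetic, deliberately well-calibrated predictions (all distributions illustrative): empirical coverage of the nominal 95% interval, and the variance of the standardized errors (z-scores), which should be near 1 when predicted uncertainties are consistent:

```python
import numpy as np

rng = np.random.default_rng(6)

# Synthetic well-calibrated predictions: residuals drawn with the predicted std
n = 5000
sigma_pred = rng.uniform(0.5, 2.0, n)    # predicted std per sample
errors = rng.normal(0.0, sigma_pred)     # observed residuals

# Interval-based metric: empirical coverage of the central 95% interval
covered = np.abs(errors) <= 1.96 * sigma_pred
coverage = covered.mean()                 # should be near 0.95

# Variance-based metric: z-scores should have unit variance if consistent
z = errors / sigma_pred
z_var = z.var()                           # should be near 1.0
```

Overconfident models show coverage below nominal and `z_var` above 1; the local (per-bin) versions of both statistics diagnose the adaptivity property.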
6. Guidelines for Method Selection and Best Practices
Methodological selection in UQ must be tailored to problem characteristics:
- For low/moderate dimensions and smooth QoIs, PCE or stochastic collocation is appropriate for rapid convergence (Kumar et al., 2022, Wang et al., 7 Jan 2025, Zhong et al., 19 Aug 2025).
- For high-dimensional, sparse or mildly nonlinear functions, Lasso Monte Carlo, MFMC, or MLMC are preferred due to insensitivity to dimension and improved efficiency (Albà et al., 2022, Zhang, 2020).
- For highly nonlinear or data-driven tasks, GP or kriging is advantageous for nonparametric function approximation and robust uncertainty intervals, at the cost of $\mathcal{O}(N^3)$ scaling in the number of training samples (Wang et al., 7 Jan 2025).
- In deep learning and scientific ML, ensemble, Bayesian variational, evidential, and bottleneck-based methods are selected balancing accuracy, epistemic/aleatoric separation, computational cost, and calibration (Lehmann et al., 2024, Guo et al., 2023, Ajirak et al., 7 Sep 2025).
Empirical surrogate-free approaches (MC, DOE/UQ) may be required where black-box solvers are in use, while modular hybrid architectures and OOD-capable models are gaining traction for complex multi-physics and scientific ML applications (Mittal et al., 2014, Guo et al., 2023).
7. Current Trends, Challenges, and Research Directions
Recent developments focus on:
- Scalable, high-dimensional UQ (sparse surrogates, bias-corrected MC, scalable GP approximations).
- OOD-aware methods, information bottlenecks, and domain-structured UQ for scientific ML (Guo et al., 2023, He et al., 2023).
- Joint epistemic–aleatoric inference with single-pass or evidential frameworks, eliminating the need for expensive post-hoc sampling or ensembles (Lehmann et al., 2024, Guo et al., 2023).
- Robust UQ validation metrics integrating adaptivity, interval-based and variance-based calibration (Pernot, 2023).
- Inverse UQ frameworks for parameter calibration, embracing model bias, data assimilation, and surrogate acceleration for domain-specific physics models (Wu et al., 2021).
- Integration of UQ and explainability, especially in deep models, remains largely open (He et al., 2023).
The state-of-the-art in UQ now includes a mature ecosystem of methodologies tailored to the statistical, computational, and domain-contextual needs of modelers in engineering, computational science, and machine learning. Each method’s theoretical scope, computational constraints, and calibration properties must be assessed with respect to the problem setting, available computational budget, and UQ objectives.