Uncertainty Quantification Benchmark
- Uncertainty Quantification Benchmark is a standardized framework that defines and compares UQ techniques using realistic datasets and performance metrics.
- It evaluates methods such as aPC, adaptive sparse grids, kernel-based interpolation, and hybrid stochastic Galerkin to balance accuracy and computational efficiency.
- The benchmark employs practical test scenarios like CO₂ storage to rigorously assess uncertainty propagation and guide the selection of suitable UQ approaches.
Uncertainty quantification (UQ) benchmarks are standardized scenarios, datasets, or methodological frameworks used to rigorously compare, evaluate, and guide the selection of UQ techniques in computational science, engineering, and applied machine learning. These benchmarks provide reference solutions, well-defined performance metrics, and realistic problem formulations that reflect the typical sources, structures, and consequences of uncertainty in modeling and data-driven inference. In the context of subsurface flow, for example, UQ benchmarks are indispensable for assessing the predictive and computational characteristics of alternative non-intrusive and intrusive UQ methods, especially where the available observational data does not permit the construction of precise parametric probability distributions.
1. Key UQ Methods and Their Operational Principles
Uncertainty quantification in complex simulation scenarios relies on a range of surrogate modeling and stochastic discretization techniques, each with distinct theoretical foundations and computational trade-offs. The main types evaluated in benchmark studies of CO₂ storage (Köppel et al., 2018) include:
- Arbitrary Polynomial Chaos (aPC): Expands the model output as a sum of multivariate orthonormal polynomial basis functions constructed from statistical moments of the empirical input distribution. The representation takes the form
$$f(\mathbf{x}, t, \boldsymbol{\xi}) \approx \sum_{i=0}^{M} c_i(\mathbf{x}, t)\, \Phi_i(\boldsymbol{\xi}),$$
where the $\Phi_i$ are orthonormal with respect to the (moment-specified) input distribution and the $c_i$ are deterministic coefficients. Two non-intrusive computational strategies are common: probabilistic collocation (PCM) with a minimal number of samples for efficiency, and a full tensor grid combined with least-squares fitting to reduce oscillatory artifacts in higher dimensions. A minimal construction sketch is given after this list.
- Spatially Adaptive Sparse Grids: Constructs surrogates in high-dimensional parameter space using hierarchical local basis functions, with refinement targeted in regions of high local error (e.g., as indicated by weighted norms). Adaptive sparse grids counter exponential scaling in the number of samples (the "curse of dimensionality") and can place grid points both at boundaries and in the domain interior using linear extrapolation.
- Kernel-Based Greedy Interpolation: Builds sparse, data-driven surrogates by selecting a quasi-optimal set of “center” points (using, for instance, the power function as an informativeness criterion) and interpolating the output with compactly supported kernels such as the Wendland kernel. The approximation is $s_n(\boldsymbol{\xi}) = \sum_{j=1}^{n} \alpha_j\, K(\boldsymbol{\xi}, \boldsymbol{\xi}_j)$, with coefficients $\alpha_j$ fitted via interpolation and sample point selection driven by the Vectorial Kernel Orthogonal Greedy Algorithm (P-VKOGA).
- Hybrid Stochastic Galerkin (HSG): An intrusive strategy in which the governing PDEs (e.g., hyperbolic transport equations) are projected onto tailored polynomial chaos expansions within a partitioned stochastic domain (“multi-element” decomposition). The expansion is
$$u(\mathbf{x}, t, \boldsymbol{\xi}) \approx \sum_{l=1}^{N_e} \sum_{|\boldsymbol{\alpha}| \le p} u_{l,\boldsymbol{\alpha}}(\mathbf{x}, t)\, \Phi_{l,\boldsymbol{\alpha}}(\boldsymbol{\xi}),$$
where the $\Phi_{l,\boldsymbol{\alpha}}$ are locally supported polynomial bases on the stochastic elements and the solution coefficients $u_{l,\boldsymbol{\alpha}}$ are obtained by solving a coupled deterministic system over all elements.
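To make the moment-based aPC construction concrete, the following sketch orthonormalizes monomials against an empirical sample (classical Gram-Schmidt with the sample-mean inner product) and fits the expansion coefficients by least squares. It is only a one-dimensional illustration of the idea, not the benchmark implementation; the lognormal input, the training-set size, and the function name `my_model` are hypothetical.

```python
import numpy as np

def apc_basis(samples, degree):
    """Orthonormalize the monomials 1, x, ..., x^degree with respect to the
    empirical distribution of `samples`, using <f, g> = mean(f * g)."""
    V = np.vander(samples, degree + 1, increasing=True)   # (n_samples, degree+1)
    Q = np.empty_like(V, dtype=float)
    coeffs = np.zeros((degree + 1, degree + 1))           # monomial coefficients of each Phi_k
    for k in range(degree + 1):
        v = V[:, k].astype(float)
        c = np.zeros(degree + 1)
        c[k] = 1.0
        for j in range(k):
            proj = np.mean(V[:, k] * Q[:, j])             # <x^k, Phi_j>
            v = v - proj * Q[:, j]
            c = c - proj * coeffs[j]
        norm = np.sqrt(np.mean(v * v))
        Q[:, k] = v / norm
        coeffs[k] = c / norm
    return coeffs                                          # Phi_k(x) = sum_m coeffs[k, m] * x**m

def fit_apc(xi_train, y_train, coeffs):
    """Least-squares fit of c_i in  y(xi) ≈ sum_i c_i * Phi_i(xi)."""
    degree = coeffs.shape[0] - 1
    Phi = np.vander(xi_train, degree + 1, increasing=True) @ coeffs.T
    c, *_ = np.linalg.lstsq(Phi, y_train, rcond=None)
    return c

# Example (hypothetical model `my_model`):
# rng = np.random.default_rng(0)
# xi = rng.lognormal(size=2000)                    # non-Gaussian, moment-specified input
# coeffs = apc_basis(xi, degree=3)
# c = fit_apc(xi[:50], my_model(xi[:50]), coeffs)
# mean, var = c[0], np.sum(c[1:] ** 2)             # moments read off via orthonormality
```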
2. Structure of the Benchmark Scenario and Governing Formulation
A prototypical UQ benchmark for geoscience applications is constructed using a simplified physical model with problem parameters derived from credible site data. In the CO₂ storage benchmark (Köppel et al., 2018):
- Physical setting: The injection of CO₂ into a saline aquifer is governed by the nonlinear, capillarity-free fractional flow formulation for two incompressible phases, reduced to a radial, one-dimensional representation near the well.
- Pressure equation: The pressure profile, which admits a deterministic solution, satisfies a radial Darcy-type equation of the form
$$-\frac{1}{r}\,\frac{\partial}{\partial r}\!\left(r\, K\, \lambda_{\mathrm{tot}}(S)\, \frac{\partial p}{\partial r}\right) = 0,$$
where $K$ is the permeability, $\lambda_{\mathrm{tot}}$ is the total mobility, and the boundary flux depends on the uncertain boundary conditions (the injection rate).
- Transport equation: The saturation is propagated using a central-upwind finite volume method adapted to the radial coordinate (a simplified upwind sketch follows this list).
- Input data: Model parameters are physically plausible (e.g., drawn from site databases), with sufficiently fine spatial and temporal discretization (e.g., 250 cells) and a large reference ensemble (10,000 Monte Carlo samples) to yield converged moment estimates.
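The following sketch is a heavily simplified stand-in for the benchmark's forward model: an explicit first-order upwind finite-volume discretization of the radial fractional-flow transport equation. The benchmark itself uses a central-upwind scheme with calibrated site parameters, so all numerical values, the quadratic relative-permeability choice, and the function name `co2_saturation` here are illustrative assumptions only.

```python
import numpy as np

def co2_saturation(q, phi, mob_ratio,
                   r_w=0.2, R=10.0, b=10.0, n_cells=100, t_end=2.0e4):
    """Toy radial two-phase transport (placeholder values, not benchmark data):
       phi * dS/dt + (1/r) * d/dr( r * u(r) * f(S) ) = 0,  with  r * u(r) = q / (2*pi*b)."""
    r_edges = np.linspace(r_w, R, n_cells + 1)
    r = 0.5 * (r_edges[:-1] + r_edges[1:])            # cell centres
    dr = r_edges[1] - r_edges[0]
    S = np.zeros(n_cells)                             # initially brine-saturated
    flux_const = q / (2.0 * np.pi * b)                # r * u(r), constant in r

    def f(s):                                         # fractional flow of CO2
        return mob_ratio * s**2 / (mob_ratio * s**2 + (1.0 - s)**2)

    # crude CFL estimate from the maximum wave speed near the well
    s_grid = np.linspace(0.0, 1.0, 401)
    fp_max = np.max(np.gradient(f(s_grid), s_grid))
    dt = 0.4 * dr * phi * r_w / (flux_const * fp_max)
    n_steps = int(np.ceil(t_end / dt))
    dt = t_end / n_steps

    for _ in range(n_steps):
        F = np.empty(n_cells + 1)                     # upwinded edge fluxes r*u*f(S)
        F[0] = flux_const * f(1.0)                    # injected CO2: S = 1 at the well
        F[1:] = flux_const * f(S)                     # outward flow, so upwind from the left
        S = S - dt / (phi * r * dr) * (F[1:] - F[:-1])
        np.clip(S, 0.0, 1.0, out=S)
    return r, S

# r, S = co2_saturation(q=0.01, phi=0.2, mob_ratio=2.0)   # saturation profile at t_end
```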
3. Sources and Modeling of Uncertainty
The benchmark deliberately incorporates multiple realistic sources of parametric uncertainty, each encoded as an independent random variable:
Source | Mathematical Representation | Impact on Model
---|---|---
Boundary conditions | Injection rate treated as an independent random variable entering the boundary condition | Variable injection rate
Conceptual model | Random parameter in the relative-permeability nonlinearity | Uncertain fractional flow and front behavior
Material properties | Porosity treated as an independent random variable | Reservoir porosity variability
All uncertain parameters are propagated via their empirically estimated distributions to ensure realism, with the full reference solution constructed from the ensemble of Monte Carlo samples.
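As a sketch of how such an ensemble reference could be assembled, the snippet below draws independent samples for the three uncertainty sources and propagates them through the toy forward model sketched in Section 2. The uniform ranges and the small sample count are placeholders; the benchmark uses empirically estimated distributions and 10,000 samples.

```python
import numpy as np

rng = np.random.default_rng(42)
n_mc = 200                                   # illustrative; the reference uses 10,000

# Placeholder input distributions for the three uncertainty sources
q_s   = rng.uniform(0.005, 0.015, n_mc)      # boundary condition: injection rate [m^3/s]
mob_s = rng.uniform(1.5, 3.0, n_mc)          # conceptual model: mobility ratio [-]
phi_s = rng.uniform(0.15, 0.25, n_mc)        # material property: porosity [-]

profiles = []
for q_i, phi_i, mob_i in zip(q_s, phi_s, mob_s):
    r, S = co2_saturation(q_i, phi_i, mob_i)     # toy model from the Section 2 sketch
    profiles.append(S)
profiles = np.asarray(profiles)                  # shape (n_mc, n_cells)

# Monte Carlo reference moments of the saturation field at t_end
S_mean_ref = profiles.mean(axis=0)
S_std_ref  = profiles.std(axis=0, ddof=1)
```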
4. Performance Metrics and Comparative Criteria
Accurate benchmarking necessitates rigorous, interpretable summary metrics:
- Expectation (Mean): The mean CO₂ saturation, as a function of space and time, is computed by each UQ method and compared to the Monte Carlo reference.
- Standard Deviation: The second central moment is quantified and reported as the standard deviation, revealing the predicted spread of saturation as a function of space and time.
- Convergence Analysis: For surrogate models, error decay is plotted as a function of the number of full model runs (or grid resolution); for HSG, accuracy is tracked vs. polynomial order and element count.
- Efficiency and Scalability: The computational burden, measured in the number of full model evaluations and the cost of reconstructing predictions and uncertainties, is compared across approaches, yielding practical guidance for modelers (a minimal moment-error sketch follows this list).
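The sketch below shows how such moment errors could be tabulated against the Monte Carlo reference from Section 3; `build_surrogate` and `xi_test` are hypothetical placeholders for a surrogate constructor and a test design, not names from the benchmark.

```python
import numpy as np

def moment_errors(surrogate_eval, xi_samples, S_mean_ref, S_std_ref):
    """Relative L2 errors of surrogate-predicted mean and standard deviation
    against the Monte Carlo reference moments."""
    preds = np.asarray([surrogate_eval(xi) for xi in xi_samples])
    err_mean = np.linalg.norm(preds.mean(axis=0) - S_mean_ref) / np.linalg.norm(S_mean_ref)
    err_std  = np.linalg.norm(preds.std(axis=0, ddof=1) - S_std_ref) / np.linalg.norm(S_std_ref)
    return err_mean, err_std

# Convergence study: error decay versus the number of full model runs
# for n_runs in (8, 16, 32, 64, 128):
#     surrogate = build_surrogate(n_runs)              # hypothetical constructor
#     print(n_runs, *moment_errors(surrogate, xi_test, S_mean_ref, S_std_ref))
```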
5. Advantages, Disadvantages, and Practical Implementation Guidance
The benchmark exposes the trade-offs inherent in each method:
Method | Advantages | Disadvantages |
---|---|---|
aPC | Efficient for low-order, low-dimension | Prone to oscillations, global basis sensitivity |
Sparse grids | Adaptivity, high-dimension scalability | Complexity in grid refinement, boundary point placement |
Kernel greedy | Sparse, fast surrogate; quasi-optimal convergence | Requires a minimum number of samples; convergence can be slow
Hybrid Galerkin | Full intrusive statistics, postprocessing flexibility | Solver modification, curse of dimensionality |
A key conclusion is that low-order aPC (or low-resolution HSG) may suffice for rough moment estimates, but accurate resolution of the standard deviation, especially near discontinuities, favors adaptive approaches such as sparse grids or kernel-based interpolation. For high-dimensional or highly nonlinear problems, surrogate models with adaptive refinement are generally preferable; a one-dimensional sketch of surplus-driven adaptive refinement is given below.
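The sketch below illustrates the refinement principle behind spatially adaptive sparse grids in one dimension with hierarchical hat functions, where the hierarchical surplus serves as the local error indicator. The benchmark's method extends this to several dimensions and to boundary treatment, so this is only the core mechanism, not the actual implementation.

```python
import numpy as np

def hat(level, index, x):
    """Hierarchical hat function centred at index * 2**(-level) on [0, 1]."""
    h = 2.0 ** (-level)
    return np.maximum(0.0, 1.0 - np.abs(x - index * h) / h)

def adaptive_interpolate(f, tol=1e-3, max_level=10):
    """Surplus-driven adaptive hierarchical interpolation of f on [0, 1]."""
    nodes = {}                                   # (level, index) -> hierarchical surplus
    queue = [(1, 1)]                             # start from the midpoint x = 0.5
    while queue:
        level, index = queue.pop()
        x = index * 2.0 ** (-level)
        surplus = f(x) - sum(a * hat(l, i, x) for (l, i), a in nodes.items())
        nodes[(level, index)] = surplus
        # refine only where the local surplus signals a large error
        if abs(surplus) > tol and level < max_level:
            queue += [(level + 1, 2 * index - 1), (level + 1, 2 * index + 1)]

    def interpolant(x):
        return sum(a * hat(l, i, x) for (l, i), a in nodes.items())
    return interpolant, nodes

# Example on a sharp, front-like profile: points cluster near the steep region.
# s_tilde, nodes = adaptive_interpolate(lambda x: np.exp(-200.0 * (x - 0.4) ** 2))
# print(len(nodes), "adaptively placed grid points")
```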
6. Analytical Formulation and Key Mathematical Expressions
The benchmark formalizes each UQ method with explicit expansions:
- aPC expansion: $f(\mathbf{x}, t, \boldsymbol{\xi}) \approx \sum_{i=0}^{M} c_i(\mathbf{x}, t)\, \Phi_i(\boldsymbol{\xi})$, with $\Phi_i$ orthonormal with respect to the empirical input distribution.
- Sparse grid surrogate: $\tilde{f}(\boldsymbol{\xi}) = \sum_{\mathbf{l}, \mathbf{i}} \alpha_{\mathbf{l}, \mathbf{i}}\, \varphi_{\mathbf{l}, \mathbf{i}}(\boldsymbol{\xi})$, with hierarchical basis functions $\varphi_{\mathbf{l}, \mathbf{i}}$ and hierarchical surpluses $\alpha_{\mathbf{l}, \mathbf{i}}$.
- Kernel interpolant: $s_n(\boldsymbol{\xi}) = \sum_{j=1}^{n} \alpha_j\, K(\boldsymbol{\xi}, \boldsymbol{\xi}_j)$, with a compactly supported kernel $K$ (e.g., Wendland) and greedily selected centers $\boldsymbol{\xi}_j$.
- Error bound for kernel interpolation: $|f(\boldsymbol{\xi}) - s_n(\boldsymbol{\xi})| \le P_n(\boldsymbol{\xi})\, \|f\|_{\mathcal{H}_K}$, where $P_n$ is the power function and $\mathcal{H}_K$ the native space of $K$.
- HSG expansion: $u(\mathbf{x}, t, \boldsymbol{\xi}) \approx \sum_{l=1}^{N_e} \sum_{|\boldsymbol{\alpha}| \le p} u_{l,\boldsymbol{\alpha}}(\mathbf{x}, t)\, \Phi_{l,\boldsymbol{\alpha}}(\boldsymbol{\xi})$, with $\Phi_{l,\boldsymbol{\alpha}}$ supported only on stochastic element $l$.
These explicit representations are critical for implementation and benchmarking, as they directly govern both model accuracy and computational requirements. A sketch of the power-function-driven greedy selection used in the kernel approach is given below.
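The sketch below implements a scalar power-function-greedy (P-greedy) centre selection with a Wendland kernel, making the kernel interpolant and its power-function error indicator concrete. The actual P-VKOGA handles vector-valued outputs and further refinements, so this is only the underlying mechanism, and the function names are illustrative.

```python
import numpy as np

def wendland_c2(r):
    """Compactly supported Wendland C^2 radial kernel profile (support radius 1)."""
    r = np.abs(r)
    return np.where(r < 1.0, (1.0 - r)**4 * (4.0 * r + 1.0), 0.0)

def p_greedy_interpolant(X, y, n_centers, shape=1.0):
    """Select centres by maximising the power function, then interpolate y there.
    X: (n, d) candidate parameter points, y: (n,) model outputs at those points."""
    def K(A, B):
        d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
        return wendland_c2(shape * d)

    k_diag = np.diag(K(X, X))                       # K(x, x) for every candidate
    idx = [0]                                       # arbitrary first centre
    for _ in range(n_centers - 1):
        Kc  = K(X, X[idx])                          # (n, m) cross-kernel to current centres
        Kcc = K(X[idx], X[idx])                     # (m, m) centre kernel matrix
        sol = np.linalg.solve(Kcc, Kc.T)            # (m, n)
        # squared power function: K(x,x) - k(x,C)^T Kcc^{-1} k(x,C)
        p2 = k_diag - np.einsum('nm,mn->n', Kc, sol)
        p2[idx] = -np.inf                           # never reselect an existing centre
        idx.append(int(np.argmax(p2)))

    centers = X[idx]
    alpha = np.linalg.solve(K(centers, centers), y[idx])
    return lambda x: K(np.atleast_2d(x), centers) @ alpha, centers
```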
7. Recommendations for Modelers and Benchmarking Best Practices
The benchmark paper yields the following guidance:
- Select aPC (with PCM) or low-resolution HSG for simple, low-cost UQ in low dimensions and when only means are needed.
- Use adaptive sparse grids or kernel-based greedy surrogates when accurate uncertainty quantification (including second moments and local features) is required or when faced with higher-dimensional parameter spaces.
- Use intrusive (Galerkin-type) methods when full probabilistic reformulation is needed and postprocessing flexibility is a priority, but be aware of significant code modifications and increased computational burden.
- Match the computational budget to the demands of the required output accuracy and consider the presence of discontinuities or sharp fronts, as global basis expansions face accuracy breakdown in such cases.
- For CO₂ storage and analogous settings with limited data for parameter distribution estimation, prefer UQ methods that are robust to distributional misspecification and efficiently capture empirical uncertainty propagation.
In summary, rigorous UQ requires benchmarks that mimic true operational uncertainty and allow for exhaustive comparison of methods on accuracy, efficiency, and scalability. The CO₂ storage benchmark (Köppel et al., 2018) exemplifies such a standard by combining physically motivated parametrizations, multiple uncertainty sources, and a framework that allows head-to-head assessment of both intrusive and non-intrusive UQ methodologies.