
Polynomial Chaos-Kriging Surrogates

Updated 22 December 2025
  • PCK is a hybrid surrogate modeling technique that combines global polynomial chaos expansions with local Gaussian process regression to provide high-fidelity predictions and uncertainty quantification.
  • It leverages sparse polynomial basis selection via LAR and robust kernel fitting to effectively model high-dimensional, noisy, or non-smooth systems while avoiding overfitting.
  • PCK surrogates enable efficient Bayesian inference, global optimization, and risk analysis, offering significant speedups and accurate uncertainty estimates in complex simulations.

Polynomial Chaos–Kriging (PCK) is a hybrid surrogate modelling methodology that combines the global approximation capability of polynomial chaos expansions (PCE) with the local fidelity and uncertainty quantification of universal Kriging (UK) or Gaussian process (GP) regression. PCK surrogates arise in contexts where computationally expensive models—for example, stochastic simulators in engineering, physics-based forward models in planetary science, or optimization routines in wind energy layout—must be emulated for tasks such as uncertainty quantification, Bayesian inference, and efficient global optimization. The PCK paradigm has undergone significant refinement since its introduction, with advancements in sparse polynomial basis selection via least angle regression (LAR), robust kernel fitting, and algorithmic scalability for high-dimensional, noisy, or non-smooth problems (García-Marino et al., 5 Feb 2024, Schoebi et al., 2015, Wringer et al., 19 Dec 2025, Palar et al., 2018, Lee et al., 2022, García-Merino et al., 2022, Shao et al., 16 Feb 2025).

1. Mathematical Formulation

A PCK surrogate for a computational model $\mathcal{M}(x)$ is written as

$$\widehat{Y}(x) = \underbrace{\sum_{\alpha\in\mathcal{A}} a_\alpha \Psi_\alpha(x)}_{\text{PCE trend}} + \underbrace{Z(x)}_{\text{Gaussian-process residual}}$$

where:

  • $(\Psi_\alpha(x))$ is a multivariate polynomial basis, orthonormal with respect to the input probability measure.
  • $\mathcal{A}$ is a sparse truncation set of multi-indices, often selected by total-degree or hyperbolic-norm rules and adaptively pruned.
  • $a_\alpha$ are trend coefficients estimated via regression.
  • $Z(x)\sim\mathcal{GP}(0, \sigma^2 R(x, x'; \theta))$ is a zero-mean GP with process variance $\sigma^2$ and kernel hyperparameters $\theta$ (e.g., Gaussian or Matérn).

This structure maximizes "global" approximation through the PCE trend and "local" correction via the GP residual. The covariance kernel $R(x,x';\theta)$ is typically chosen to accommodate the smoothness or roughness of the model output and is fitted by maximizing either a log-likelihood or leave-one-out cross-validation error (García-Marino et al., 5 Feb 2024, Schoebi et al., 2015, Wringer et al., 19 Dec 2025, Palar et al., 2018).
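The trend-plus-residual structure can be sketched in a few lines of NumPy. Everything below (the toy 1-D model, the degree-3 Legendre trend, the fixed kernel parameter `theta`, the jitter) is illustrative, not a fitted PCK implementation:

```python
import numpy as np
from numpy.polynomial import legendre

# Toy 1-D model on [-1, 1].
def model(x):
    return np.sin(3 * x) + 0.3 * x**2

X = np.linspace(-1.0, 1.0, 8)               # experimental design
y = model(X)

# PCE trend: least-squares fit of Legendre polynomials up to degree 3.
def psi(x, p=3):
    return legendre.legvander(x, p)          # (n, p+1) design matrix

a, *_ = np.linalg.lstsq(psi(X), y, rcond=None)

# GP residual: Gaussian kernel with a fixed (not ML-fitted) parameter.
theta = 5.0
def R(x1, x2):
    return np.exp(-theta * (x1[:, None] - x2[None, :]) ** 2)

K = R(X, X) + 1e-10 * np.eye(len(X))         # tiny jitter for stability
w = np.linalg.solve(K, y - psi(X) @ a)       # residual weights

def pck_predict(x_new):
    return psi(x_new) @ a + R(x_new, X) @ w

# Noise-free Kriging interpolates the design points.
print(np.max(np.abs(pck_predict(X) - y)))
```

The GP term pulls the surrogate through the training data exactly, while the polynomial trend governs behaviour away from the design points.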

2. Sparse Polynomial Basis Selection and Model Training

Crucial to PCK effectiveness is sparse selection of the polynomial trend terms, achieved via Least Angle Regression (LAR), LASSO, or similar techniques:

  • All candidate polynomials up to a user-specified maximal degree $p$ and $q$-norm (often via a hyperbolic truncation) are generated.
  • LAR iteratively adds the polynomial basis most correlated with the current residual to the active set, updating coefficients via regression and evaluating generalization via leave-one-out (LOO) error minimization.
  • The optimal subset A\mathcal{A}^* is selected at the point of minimal LOO-CV error, avoiding overfitting inherent to full PCE bases (García-Marino et al., 5 Feb 2024, Schoebi et al., 2015).
  • Model hyperparameters $(\theta, \sigma^2, \beta)$ for the GP are set by maximizing the log-likelihood

$$\ell(\beta,\sigma^2,\theta) = -\frac{1}{2}\log\left|\Sigma_Z+\Sigma_\varepsilon\right| - \frac{1}{2}\left(\bar{\mathcal M}-\Psi\beta\right)^\top\left(\Sigma_Z+\Sigma_\varepsilon\right)^{-1}\left(\bar{\mathcal M}-\Psi\beta\right)$$

where $\Sigma_Z$ and $\Sigma_\varepsilon$ respectively capture the process covariance and noise at the design points.
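A minimal sketch of sparse trend selection, using scikit-learn's `LarsCV` (cross-validated LARS) as a stand-in for the LOO-error criterion described above. The toy target is a sparse Legendre series, and all settings (degree-10 candidate set, `cv=10`) are assumptions for illustration:

```python
import numpy as np
from numpy.polynomial import legendre
from sklearn.linear_model import LarsCV

# Sparse target: 2*P2(x) - 0.5*P5(x), hidden among degree-1..10 candidates.
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 60)
y = (2.0 * legendre.legval(x, [0, 0, 1])
     - 0.5 * legendre.legval(x, [0, 0, 0, 0, 0, 1]))

Psi = legendre.legvander(x, 10)[:, 1:]   # candidates; constant via intercept
lars = LarsCV(cv=10).fit(Psi, y)         # CV error as a stand-in for LOO error

active = np.flatnonzero(np.abs(lars.coef_) > 1e-6) + 1   # recovered degrees
print(sorted(active))
```

On this noiseless example the LARS path recovers the two true basis terms (degrees 2 and 5) and leaves the other eight candidates inactive, which is exactly the overfitting protection the full PCE basis lacks.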

3. Prediction and Uncertainty Quantification

For a new input $x^*$, the PCK prediction and its uncertainty are given by

$$\widehat{Y}(x^*) = \Psi(x^*)^\top\hat\beta + r_Z(x^*)^\top \left(\Sigma_Z+\Sigma_\varepsilon\right)^{-1} \left(\bar{\mathcal M} - \Psi\hat\beta\right)$$

$$\widehat{\mathrm{MSE}}(x^*) = \sigma^2 - r_Z(x^*)^\top \left(\Sigma_Z+\Sigma_\varepsilon\right)^{-1} r_Z(x^*)$$

where $r_Z(x^*) = \left[\sigma^2 R(x^*, x^{(i)})\right]_i^\top$ collects the covariances between $Z(x^*)$ and the design points, quantifying local correlation with the training data. Uncertainty in PCK is rigorously decomposed into extrinsic sources (from $Z(x)$) and intrinsic noise (from measurement uncertainty, modeled by $\Sigma_\varepsilon$) (García-Marino et al., 5 Feb 2024, Schoebi et al., 2015).
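The predictive-variance formula can be sketched as follows. This simplified version keeps only the GP term, works in correlation units, and fixes $\theta$ and $\sigma^2$ by hand; a full universal-Kriging variance would add a correction for trend-coefficient uncertainty:

```python
import numpy as np

# Simplified Kriging variance: GP term only, fixed hyperparameters.
theta, sigma2 = 3.0, 1.0
X = np.linspace(-1.0, 1.0, 8)                 # design points

def R(x1, x2):
    return np.exp(-theta * (x1[:, None] - x2[None, :]) ** 2)

Kinv = np.linalg.inv(R(X, X) + 1e-10 * np.eye(len(X)))

def mse(x_new):
    r = R(np.atleast_1d(x_new), X)            # correlations with the design
    return sigma2 * (1.0 - np.einsum('ij,jk,ik->i', r, Kinv, r))

# ~0 at a design point, strictly positive between points.
print(mse(X[0])[0], mse(0.5 * (X[0] + X[1]))[0])
```

The variance collapses to (numerically) zero at the design points and grows with distance from them, which is what drives the exploration behaviour exploited in Section 4.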

4. Integration into Bayesian Inference, Optimization, and Reliability

PCK surrogates are widely integrated in advanced workflows:

  • Bayesian inference: Within Markov chain Monte Carlo (MCMC), the expensive forward model is replaced by a PCK surrogate, yielding dramatic computational speedup (e.g., a factor of $\sim$320 in exoplanet interior characterization). Surrogate error is statistically propagated by inflating the likelihood variance (Wringer et al., 19 Dec 2025, García-Merino et al., 2022).
  • Efficient Global Optimization (EGO): The Expected Improvement (EI) acquisition function is computed using both the mean and variance from the PCK surrogate. Automatic trend selection via LARS/LOO tends to outperform both pure Kriging and blind polynomial selection when the underlying response exhibits moderate polynomial structure, but in highly rugged cases, a constant trend may be preferable (Palar et al., 2018, Shao et al., 16 Feb 2025).
  • Reliability and risk: In high-dimensional settings, dimensionally decomposed PCEs merged with Kriging (DD-GPCE-Kriging) enable scalable estimation for quantities like Conditional Value-at-Risk (CVaR), achieving up to $10^4$-fold speedups via multifidelity importance sampling (Lee et al., 2022).
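The EI acquisition mentioned in the EGO bullet can be written directly from the surrogate's predictive mean and standard deviation. This is the standard closed-form EI under a minimization convention, not tied to any particular PCK library; the candidate values are made up for illustration:

```python
import numpy as np
from scipy.stats import norm

# Closed-form Expected Improvement (minimization convention).
def expected_improvement(mu, sigma, y_best):
    sigma = np.maximum(sigma, 1e-12)          # guard against zero variance
    z = (y_best - mu) / sigma
    return (y_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

mu = np.array([0.5, 0.2, 0.9])                # surrogate means at candidates
sd = np.array([0.1, 0.3, 0.0])                # surrogate standard deviations
print(expected_improvement(mu, sd, y_best=0.4))
```

Note how the second candidate wins: a low predicted mean and a large predictive variance both raise EI, while a candidate with zero variance and a mean above the incumbent gets zero improvement, so the PCK variance is as important as the mean here.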

5. Domain Partitioning and High-Dimensional Extensions

For non-smooth, highly non-linear, or high-dimensional models:

  • Multielement PCK: The input space is partitioned into $J$ non-overlapping subdomains; independent PCK surrogates are fit locally and assembled piecewise. This approach maintains local adaptation capabilities—fitting sharp transitions and discontinuities—while balancing computational tractability via efficient domain allocation (García-Merino et al., 2022).
  • Dimensionally decomposed PCEs: Basis reduction is achieved by restricting polynomial interactions to at most $S$-variate terms, dramatically decreasing the number of basis functions and enabling applications with $N\geq 20$ dimensions while maintaining accuracy and computational tractability (Lee et al., 2022).
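The basis-size reduction from the $S$-variate restriction can be checked with a short combinatorial count; the choices $N=20$, $p=5$, $S=2$ below are illustrative:

```python
from math import comb

# Number of multi-indices with total degree <= p in N variables (full
# total-degree basis) vs. at most S nonzero components (S-variate terms).
def n_total_degree(N, p):
    return comb(N + p, p)

def n_s_variate(N, p, S):
    # choose which s variables interact, then a positive degree pattern
    return sum(comb(N, s) * comb(p, s) for s in range(S + 1))

N, p = 20, 5
print(n_total_degree(N, p), n_s_variate(N, p, S=2))   # 53130 vs. 2001
```

Restricting to bivariate interactions cuts the degree-5 basis in 20 dimensions from 53,130 terms to 2,001, which is what makes regression-based coefficient estimation feasible at these dimensions.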

6. Empirical Performance and Validation Benchmarks

Empirical studies across a range of disciplines quantify the advantages of PCK:

| Case study | PCK vs. SK/Kriging | Surrogate error | Computational speedup |
|---|---|---|---|
| Stochastic queue (M/M/1), Egg-box, Ishigami | RMSE improvement 20–74%; NMAE improvement 8–60% | RMSE $\sim$0.5–1% | 1–2 orders of magnitude |
| Exoplanet inversion | $R^2 > 0.99$; coverage 93–96% | Error $\ll$ data uncertainty | $\sim$320× |
| Wind farm layout optimization | $R^2 > 0.99$; out-of-sample RMSE < 0.5% | Sub-percent RMSE | 10–500× |

These findings consistently show that sparse polynomial trend selection (LAR, LASSO) is critical: full PCEs tend to overfit, while sparse, cross-validated selection ensures robust accuracy and generalization. PCK is especially advantageous when experimental design size is limited or intrinsic noise is significant (García-Marino et al., 5 Feb 2024, Schoebi et al., 2015, Wringer et al., 19 Dec 2025, Palar et al., 2018, Shao et al., 16 Feb 2025).

7. Practical Guidelines and Implementation Notes

Best practices for deploying PCK surrogates include:

  • Always apply LAR or LASSO to candidate polynomials for trend selection; full PCE basis should generally be avoided (García-Marino et al., 5 Feb 2024, Schoebi et al., 2015).
  • For high-dimensional applications, impose total-degree or dimensionally decomposed truncations and use space-filling experimental designs (e.g., Latin Hypercube Sampling).
  • Kernel selection should reflect the smoothness and structure of $Z(x)$; Gaussian is the default for smooth models, Matérn for rough behaviour, and a nugget is advisable with intrinsic noise (Wringer et al., 19 Dec 2025, Palar et al., 2018).
  • Surrogate accuracy must be validated on a large hold-out sample via root-mean-square error (RMSE), normalized mean absolute error (NMAE), and confidence interval coverage rates. For reliability studies, integration into multifidelity sampling frameworks is recommended (Lee et al., 2022, García-Merino et al., 2022).
  • Hyperparameters in universal Kriging are optimally set via maximum likelihood, which often necessitates global or multi-start optimization (e.g., genetic algorithms, restarted BFGS) for non-convex log-likelihoods (García-Marino et al., 5 Feb 2024, Schoebi et al., 2015).
  • Surrogate uncertainty should be propagated in downstream tasks by explicit variance inflation, ensuring statistically robust inference and optimization convergence (Wringer et al., 19 Dec 2025, García-Merino et al., 2022).
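The variance-inflation guideline in the last bullet amounts to adding the surrogate's predictive variance to the data variance inside a Gaussian likelihood. A minimal sketch, with all names and numbers illustrative:

```python
import numpy as np

# Gaussian log-likelihood with the surrogate's predictive variance added
# to the data variance ("variance inflation").
def log_likelihood(y_obs, mu_surr, var_data, var_surr):
    var = var_data + var_surr                 # inflate by the surrogate MSE
    return -0.5 * np.sum(np.log(2 * np.pi * var)
                         + (y_obs - mu_surr) ** 2 / var)

y_obs = np.array([1.0, 2.0])                  # observations
mu = np.array([1.3, 2.4])                     # surrogate predictions
print(log_likelihood(y_obs, mu, var_data=0.04, var_surr=0.0))
print(log_likelihood(y_obs, mu, var_data=0.04, var_surr=0.02))
```

Inflating the variance flattens the likelihood, so posterior samples are not over-concentrated around regions where the surrogate merely happens to fit well.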

8. Summary

Polynomial Chaos–Kriging therefore constitutes a robust, statistically principled, and efficient framework for surrogate modelling of expensive, stochastic, and high-dimensional simulators, delivering high-fidelity predictions and calibrated uncertainties with modest experimental designs and scalable computational cost.
