Physics-Informed CoKriging
- Physics-Informed CoKriging is a multifidelity surrogate modeling technique that integrates physics-based low-fidelity simulations with limited high-fidelity data to yield physically consistent predictions.
- It employs an autoregressive Gaussian process framework where empirical priors from Monte Carlo simulations reduce hyperparameter tuning and systematically correct model biases.
- Applications span redox flow batteries, chaotic PDEs, and heat transfer, achieving high accuracy while quantifying physical-law adherence through explicit error bounds.
Physics-informed CoKriging (CoPhIK) is a rigorous multifidelity surrogate modeling framework that fuses physical knowledge from computational models, typically in the form of low-fidelity stochastic simulators, with sparse high-fidelity observations using Gaussian-process-based CoKriging. By embedding partial physics or conservation laws directly into the prior statistics of the Gaussian process, CoPhIK systematically achieves improved data-model convergence, reduces hyperparameter search, and preserves the physical structure of the system up to quantifiable error bounds. The framework generalizes across applications in uncertainty quantification, inverse problems, and scientific machine learning, demonstrating robust accuracy even when the underlying physics models are misspecified.
1. Mathematical Foundation of Physics-Informed CoKriging
The canonical CoPhIK methodology is grounded in the autoregressive multifidelity modeling paradigm. Given a low-fidelity stochastic simulator $u_L(x;\omega)$ (with random parameter $\omega$) and a limited set of high-fidelity data $\mathbf{y}_H$, the goal is to construct a high-fidelity surrogate that leverages both data modalities.
The core autoregressive relation is
$$Y_H(x) = \rho\,Y_L(x) + Y_d(x),$$
where:
- $Y_L$ is the low-fidelity process, modeled as a Gaussian process $Y_L \sim \mathcal{GP}(\mu_L, k_L)$.
- $Y_d$ is a discrepancy process, $Y_d \sim \mathcal{GP}(\mu_d, k_d)$, independent of $Y_L$.
- $\rho$ is a scaling parameter.
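The autoregressive relation fixes the joint second-order statistics of the two fidelity levels. As a short worked step, assume the notation $Y_H = \rho\,Y_L + Y_d$ with $Y_L \sim \mathcal{GP}(\mu_L, k_L)$ and an independent $Y_d \sim \mathcal{GP}(\mu_d, k_d)$ (the symbol names are chosen here for illustration). Linearity of expectation and independence then give:

```latex
\begin{aligned}
\mathbb{E}[Y_H(x)] &= \rho\,\mu_L(x) + \mu_d(x), \\
\operatorname{Cov}[Y_L(x),\,Y_H(x')] &= \rho\,k_L(x,x'), \\
\operatorname{Cov}[Y_H(x),\,Y_H(x')] &= \rho^{2}\,k_L(x,x') + k_d(x,x').
\end{aligned}
```

These three identities populate the blocks of the joint mean and covariance used when conditioning on data.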
Crucially, in CoPhIK and its bifidelity extensions, the low-fidelity mean $\mu_L$ and covariance $k_L$ are not generic forms with tunable hyperparameters, but are estimated directly from a (potentially large) ensemble of stochastic physics-based simulations. This makes the process "physics-informed" because it decouples GP prior specification from data-intensive maximum-likelihood training.
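Let $\rho$ be the scaling parameter, and let $(\mu_L, k_L)$ and $(\mu_d, k_d)$ be the mean and covariance functions of the low-fidelity and discrepancy processes. Given training data $\tilde{\mathbf{y}} = (\mathbf{y}_L, \mathbf{y}_H)$ (at locations $X$), the block covariance and prior mean are
$$\tilde{\mathbf{C}} = \begin{pmatrix} \mathbf{C}_L & \rho\,\mathbf{C}_L \\ \rho\,\mathbf{C}_L & \rho^{2}\,\mathbf{C}_L + \mathbf{C}_d \end{pmatrix}, \qquad \tilde{\boldsymbol{\mu}} = \begin{pmatrix} \mu_L(X) \\ \rho\,\mu_L(X) + \mu_d(X) \end{pmatrix},$$
where $\mathbf{C}_L = k_L(X, X)$ and $\mathbf{C}_d = k_d(X, X)$ are kernel matrices.
The posterior mean and variance at a test point $x^*$ are given by
$$\hat{y}(x^*) = \rho\,\mu_L(x^*) + \mu_d(x^*) + \tilde{\mathbf{c}}(x^*)^{\top}\,\tilde{\mathbf{C}}^{-1}\,(\tilde{\mathbf{y}} - \tilde{\boldsymbol{\mu}}),$$
$$\hat{s}^{2}(x^*) = \rho^{2}\,k_L(x^*, x^*) + k_d(x^*, x^*) - \tilde{\mathbf{c}}(x^*)^{\top}\,\tilde{\mathbf{C}}^{-1}\,\tilde{\mathbf{c}}(x^*),$$
with the cross-covariance vector $\tilde{\mathbf{c}}(x^*) = \big(\rho\,k_L(X, x^*),\ \rho^{2}\,k_L(X, x^*) + k_d(X, x^*)\big)^{\top}$.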
Only the discrepancy kernel and the scaling parameter $\rho$ typically require hyperparameter optimization, often by marginal likelihood maximization (Yang et al., 2018, Yang et al., 2018, Howard et al., 2021).
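The closed-form conditioning step can be sketched in a few lines of NumPy. This is a minimal illustration rather than the reference implementation. It assumes both fidelities are observed at the same grid indices, uses squared-exponential kernels as stand-ins for the empirical low-fidelity covariance, and takes a fixed scaling parameter. The names `sq_exp` and `cokriging_predict` and all numerical values are hypothetical.

```python
import numpy as np

def sq_exp(x1, x2, var, ell):
    """Squared-exponential kernel matrix between 1-D point sets x1 and x2."""
    return var * np.exp(-0.5 * (x1[:, None] - x2[None, :]) ** 2 / ell**2)

def cokriging_predict(mu_L, K_L, K_d, rho, idx, y_L, y_H, mu_d=0.0, jitter=1e-8):
    """Posterior mean and variance of Y_H = rho * Y_L + Y_d on the whole grid.

    mu_L, K_L : prior mean (n,) and covariance (n, n) of the low-fidelity GP
    K_d       : prior covariance (n, n) of the discrepancy GP
    idx       : grid indices where both fidelities are observed (assumption)
    """
    A = K_L[np.ix_(idx, idx)]
    D = K_d[np.ix_(idx, idx)]
    # Block covariance of the stacked observations (y_L, y_H).
    C = np.block([[A, rho * A], [rho * A, rho**2 * A + D]])
    C += jitter * np.eye(C.shape[0])
    # Cross-covariance between the stacked observations and Y_H on the grid.
    c = np.vstack([rho * K_L[idx, :], rho**2 * K_L[idx, :] + K_d[idx, :]])
    resid = np.concatenate([y_L - mu_L[idx], y_H - (rho * mu_L[idx] + mu_d)])
    mean = rho * mu_L + mu_d + c.T @ np.linalg.solve(C, resid)
    prior_var = rho**2 * np.diag(K_L) + np.diag(K_d)
    var = prior_var - np.sum(c * np.linalg.solve(C, c), axis=0)
    return mean, var

# Toy setup (all values are illustrative assumptions).
x = np.linspace(0.0, 1.0, 50)
mu_L = np.sin(2 * np.pi * x)
K_L = sq_exp(x, x, 1.0, 0.2)
K_d = sq_exp(x, x, 0.1, 0.3)
idx = np.array([5, 20, 35, 45])
y_L = mu_L[idx] + 0.05
y_H = 1.5 * np.sin(2 * np.pi * x[idx]) + 0.2 * x[idx]

mean, var = cokriging_predict(mu_L, K_L, K_d, 1.5, idx, y_L, y_H)
```

In this noise-free sketch the posterior mean interpolates the high-fidelity observations and the posterior variance collapses at the observed locations. The small `jitter` term is there only for numerical stability.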
2. Construction of Physics-Informed Low-Fidelity Priors
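The construction of physics-informed low-fidelity priors distinguishes CoPhIK from conventional multifidelity GPR methods. For a stochastic simulator $u_L(x;\omega)$ sampled over parameters $\{\omega_m\}_{m=1}^{M}$, the empirical mean and covariance are:
$$\mu_L(x) \approx \frac{1}{M}\sum_{m=1}^{M} Y_L^{m}(x), \qquad k_L(x, x') \approx \frac{1}{M-1}\sum_{m=1}^{M}\big(Y_L^{m}(x) - \mu_L(x)\big)\big(Y_L^{m}(x') - \mu_L(x')\big),$$
where $Y_L^{m}(x) = u_L(x;\omega_m)$ denotes the $m$-th low-fidelity realization.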
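A minimal sketch of the empirical prior follows, assuming the Monte Carlo ensemble is stored as an array with one realization per row on a fixed spatial grid. The toy simulator and the function name `empirical_prior` are hypothetical stand-ins for an actual physics model.

```python
import numpy as np

def empirical_prior(ensemble):
    """Empirical mean (n,) and unbiased covariance (n, n) of an (M, n) ensemble."""
    M = ensemble.shape[0]
    mu = ensemble.mean(axis=0)
    dev = ensemble - mu                 # deviations of each realization from the mean
    K = dev.T @ dev / (M - 1)           # unbiased sample covariance
    return mu, K

# Toy stochastic "simulator": u(x; w) = w1 * sin(pi x) + w2 * x (illustrative only).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
omega = rng.uniform(0.5, 1.5, size=(1000, 2))
ensemble = omega[:, :1] * np.sin(np.pi * x) + omega[:, 1:] * x

mu_L, K_L = empirical_prior(ensemble)
```

The covariance has rank at most $M-1$, so in practice a small diagonal regularization is often added before it is inverted at the observation locations.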
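For settings where direct high-fidelity simulation is computationally infeasible, a bifidelity (approximation-theory-based) approach can be utilized. Here, a small subset $\{\omega_{m_k}\}_{k=1}^{r}$ of high-fidelity runs, identified via a pivoted Cholesky or greedy selection on the low-fidelity Gram matrix, is used to construct an interpolation mapping $\omega \mapsto \{c_k(\omega)\}_{k=1}^{r}$ whose coefficients are computed from the low-fidelity snapshots. This yields "bifidelity" samples
$$Y_B^{m}(x) = \sum_{k=1}^{r} c_k(\omega_m)\,Y_H^{m_k}(x),$$
where $Y_H^{m_k}$ is the high-fidelity run at $\omega_{m_k}$, from which $\mu_L$, $k_L$ are estimated as above by replacing the low-fidelity realizations $Y_L^{m}$ with $Y_B^{m}$ (Yang et al., 2018).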
This bifidelity embedding enables the low-fidelity GP to capture salient features of the high-fidelity manifold with minimal direct computational cost.
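The bifidelity construction can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: snapshots are stored column-wise, the selection uses a plain greedy pivoted Cholesky on the low-fidelity Gram matrix, and the coefficients come from a least-squares fit to the selected low-fidelity snapshots. The toy models `low_fi` and `high_fi` and the helper names are hypothetical.

```python
import numpy as np

def select_pivots(G, r):
    """Greedy pivoted-Cholesky selection of r columns from a Gram matrix G."""
    d = np.diag(G).astype(float).copy()
    L = np.zeros((G.shape[0], r))
    pivots = []
    for k in range(r):
        p = int(np.argmax(d))                      # largest remaining residual norm
        pivots.append(p)
        L[:, k] = (G[:, p] - L[:, :k] @ L[p, :k]) / np.sqrt(d[p])
        d = d - L[:, k] ** 2
        d[pivots] = 0.0                            # guard against round-off re-selection
    return pivots

def bifidelity_samples(U_L, U_H_selected, pivots):
    """Reuse low-fidelity expansion coefficients on the high-fidelity snapshots."""
    coeffs, *_ = np.linalg.lstsq(U_L[:, pivots], U_L, rcond=None)   # shape (r, M)
    return U_H_selected @ coeffs

# Toy models on a coarse and a fine grid (illustrative assumptions).
rng = np.random.default_rng(1)
omega = rng.uniform(0.5, 1.5, size=(50, 2))
x_lo, x_hi = np.linspace(0.0, 1.0, 10), np.linspace(0.0, 1.0, 100)

def low_fi(w):
    return w[0] * np.sin(np.pi * x_lo) + w[1] * x_lo

def high_fi(w):
    return w[0] * np.sin(np.pi * x_hi) + w[1] * x_hi**2

U_L = np.column_stack([low_fi(w) for w in omega])                     # (10, 50)
pivots = select_pivots(U_L.T @ U_L, r=2)
U_H_selected = np.column_stack([high_fi(omega[p]) for p in pivots])   # only r high-fi runs
U_B = bifidelity_samples(U_L, U_H_selected, pivots)                   # (100, 50)
```

Both toy models depend linearly on the same two parameters, so two high-fidelity runs reproduce the whole high-fidelity ensemble here. For realistic models the mapping is only approximate, which is the source of the bifidelity approximation error discussed later.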
3. Enforcement and Quantification of Physical Constraints
A distinctive feature of physics-informed CoKriging is the explicit quantification of adherence to physical constraints. Consider a deterministic linear operator $\mathcal{L}$ (e.g., Laplacian, mass conservation), and suppose each high-fidelity realization $u$ satisfies $\mathcal{L}u = g$.
Theorem 2.3 in (Yang et al., 2018) states that the predictive posterior mean $\hat{y}$ (from bifidelity-accelerated CoPhIK) satisfies a bound on the residual $\|\mathcal{L}\hat{y} - g\|$. The bound is expressed in terms of a quantity measuring the MC error of the physical residuals, two constants that bound the bifidelity approximation error, the operator norm $\|\mathcal{L}\|$, and the empirical standard deviation at the observation locations.
Thus, the model "inherits" the physical law up to explicit, data-driven error bounds, with no requirement for exact physical model fidelity (Yang et al., 2018, Yang et al., 2018).
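The inheritance mechanism is easiest to see in a simplified setting. The sketch below is an assumption-laden illustration, not the theorem itself: it uses a single-level GP with an empirical prior, no discrepancy term, and an ensemble whose members satisfy a discrete linear constraint exactly. The constraint here is a fixed grid average `A @ u == g`, and all names and values are hypothetical. Because the empirical covariance is built from deviations that the operator annihilates, the posterior mean satisfies the same constraint to round-off.

```python
import numpy as np

rng = np.random.default_rng(0)
n, M = 30, 200
A = np.full(n, 1.0 / n)          # discrete linear operator: grid average
g = 1.0

# Build realizations that satisfy A @ u == g exactly (up to round-off).
raw = rng.normal(size=(M, n))
ensemble = raw - raw.mean(axis=1, keepdims=True) + g

# Empirical prior statistics from the ensemble.
mu = ensemble.mean(axis=0)
dev = ensemble - mu
K = dev.T @ dev / (M - 1)

# Condition on three arbitrary observations (values are illustrative).
idx = np.array([3, 12, 25])
y_obs = np.array([1.3, 0.7, 1.1])
K_oo = K[np.ix_(idx, idx)] + 1e-10 * np.eye(len(idx))
weights = np.linalg.solve(K_oo, y_obs - mu[idx])
y_hat = mu + K[:, idx] @ weights

# Residual of the linear constraint evaluated on the posterior mean.
residual = A @ y_hat - g
```

When the realizations satisfy the constraint only approximately, or when a discrepancy kernel and a bifidelity approximation are added, the residual is no longer zero. That is the situation the explicit error bound addresses.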
4. Algorithmic Workflow and Training Procedures
The practical implementation of CoPhIK involves:
- Sampling the Low-Fidelity Model: Generate a large Monte Carlo ensemble from the physics-based model (e.g., 0D battery ODEs (Howard et al., 2021), stochastic PDE (Yang et al., 2018), or spectral discretizations (Yang et al., 2018)).
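- Constructing the Empirical GP: Compute the low-fidelity mean $\mu_L$ and covariance $k_L$ by empirical averaging. In the bifidelity regime, use the bifidelity samples as constructed above.
- CoKriging Model Assembly: Specify the two-level GP structure:
  - Low-fidelity: $Y_L \sim \mathcal{GP}(\mu_L, k_L)$
  - Discrepancy: $Y_d \sim \mathcal{GP}(\mu_d, k_d)$, often with $k_d$ stationary and $\mu_d$ typically zero.
  - High-fidelity: $Y_H = \rho\,Y_L + Y_d$
- Hyperparameter Fitting: Estimate the scaling parameter $\rho$ and the hyperparameters of $k_d$ (and $\mu_d$ if not set to zero) via marginal likelihood maximization on the residuals $\mathbf{y}_H - \rho\,\mathbf{y}_L$.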
- Prediction: Given a new input, evaluate the predictive mean and variance via the closed-form CoKriging formulas.
- Active Learning (Optional): Sequentially select new high-fidelity points for maximal variance reduction using the current GP posterior predictive variance (Yang et al., 2018).
This approach is efficient: empirical statistics from thousands of MC simulations define the prior, and only a low-dimensional discrepancy kernel requires optimization.
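The hyperparameter-fitting step of the workflow can be sketched with a small grid search. This is a hedged illustration only. It assumes a zero-mean discrepancy with a squared-exponential kernel, takes the low-fidelity values at the observation sites as given, profiles out the kernel variance analytically, and scans a coarse grid instead of running a gradient-based optimizer. The function names and toy data are hypothetical.

```python
import numpy as np

def sq_exp_corr(x, ell):
    """Squared-exponential correlation matrix for 1-D inputs x."""
    return np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / ell**2)

def concentrated_nll(rho, ell, X, y_H, y_L, jitter=1e-6):
    """Negative log marginal likelihood of d = y_H - rho * y_L (constants dropped).

    The kernel variance sigma2 is profiled out in closed form.
    """
    d = y_H - rho * y_L
    R = sq_exp_corr(X, ell) + jitter * np.eye(len(X))
    chol = np.linalg.cholesky(R)
    z = np.linalg.solve(chol, d)                  # z = chol^{-1} d
    sigma2 = z @ z / len(X)                       # maximum-likelihood variance
    logdet = 2.0 * np.sum(np.log(np.diag(chol)))
    return 0.5 * (len(X) * np.log(sigma2) + logdet), sigma2

# Toy data: high fidelity = scaled low fidelity + smooth discrepancy (assumption).
X = np.linspace(0.05, 0.95, 8)
y_L = np.sin(2 * np.pi * X)
y_H = 2.0 * y_L + 0.3 * X**2

rhos = np.linspace(0.5, 3.0, 26)
ells = [0.05, 0.1, 0.2, 0.4]
results = [(concentrated_nll(r, l, X, y_H, y_L), r, l) for r in rhos for l in ells]
(best_nll, best_sigma2), best_rho, best_ell = min(results, key=lambda t: t[0][0])
```

Only two quantities are searched here (the scaling parameter and one length scale), which reflects the low-dimensional optimization the workflow relies on. A production fit would typically replace the grid with a continuous optimizer and multiple restarts.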
5. Illustrative Applications and Performance Analysis
Surrogate Modeling for Physical Systems
- Redox Flow Batteries: CoPhIK models the charge-discharge curve using a zero-dimensional ODE as the low-fidelity source and lab experiments as high-fidelity data. With a Monte Carlo ensemble of low-fidelity samples and as few as 1–3 high-fidelity points per cycle, CoPhIK achieves low voltage errors and outperforms both physics-only and data-only GPR across varied parameterizations (Howard et al., 2021).
- Branin Function and Heat Transfer: For the stochastic Branin function and steady-state heat transfer PDEs, bifidelity-accelerated CoPhIK attains relative errors that nearly match standard CoPhIK at a fraction of the high-fidelity cost, while standard Kriging errors remain substantially larger (Yang et al., 2018).
- Kuramoto–Sivashinsky Equation: For chaotic PDEs, bifidelity CoPhIK provides the best overall waveform fit, although with higher error, owing to a large bifidelity approximation error (Yang et al., 2018).
Sensitivity and Robustness
CoPhIK is robust to low-fidelity model misspecification. Even substantial (order-of-magnitude) errors in the low-fidelity physical parameters are compensated by the data-driven discrepancy process, ensuring predictive accuracy and physical consistency (Howard et al., 2021).
Efficiency
Empirical construction of the low-fidelity GP eliminates the need for expensive kernel hyperparameter search for the low-fidelity kernel $k_L$. Training CoPhIK surrogates, even with large MC-generated prior statistics, requires seconds to minutes and no deep learning pretraining (Yang et al., 2018, Howard et al., 2021).
Table: Summary of Representative Application Results
| Scenario | High-Fidelity Points | CoPhIK Error | Kriging Error |
|---|---|---|---|
| Redox Flow Battery (Howard et al., 2021) | 1–3 per cycle | see reference | approx. 0.01 V (data-only) |
| Stochastic Branin (Yang et al., 2018) | 8 | see reference | see reference |
| Heat Transfer (Yang et al., 2018) | 6–14 | approx. 5% | approx. 27% |
6. Extensions and Related Physics-Informed Kriging Approaches
Recent developments extend the physics-informed Kriging paradigm to other domains and learning architectures. Examples include:
- Physics-Guided Increment Training Strategy (PGITS): For spatio-temporal kriging in air quality inference, PGITS integrates physics (an advection-diffusion PDE) into graph convolutional structures and loss functions. The dynamic graph generation module fuses diffusion and advection kernels, and the overall loss combines supervised, pseudo-label, and physical continuity constraints (Yang et al., 2025). This generalizes the idea of physics-informed priors to implicit-graph structures in deep learning contexts.
- Active Learning and Experimental Design: CoPhIK enables efficient active-data selection by maximizing the GP posterior variance, accelerating convergence to target prediction accuracy with minimal high-fidelity samples (Yang et al., 2018).
- Preservation of Physics: Both empirical and theoretical results confirm that CoPhIK posterior means satisfy linear physical laws up to explicit error bounds, dominated by MC error and approximation error intrinsic to the bifidelity embedding (Yang et al., 2018, Yang et al., 2018).
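The variance-based selection rule mentioned in the list above can be sketched as a greedy loop. This is a simplified illustration under stated assumptions: the high-fidelity process is treated as a single GP with prior covariance $\rho^2 k_L + k_d$, it is conditioned on high-fidelity observations only, and candidate locations are restricted to a fixed grid. The kernels, the value of the scaling parameter, and the function names are hypothetical.

```python
import numpy as np

def posterior_var(K, idx, jitter=1e-8):
    """Posterior variance on the grid after observing the grid points in idx."""
    K_oo = K[np.ix_(idx, idx)] + jitter * np.eye(len(idx))
    K_o = K[idx, :]
    return np.diag(K) - np.sum(K_o * np.linalg.solve(K_oo, K_o), axis=0)

def greedy_design(K, idx_init, n_new):
    """Add n_new points, each at the current maximum of the posterior variance."""
    idx = list(idx_init)
    history = [posterior_var(K, idx).max()]
    for _ in range(n_new):
        idx.append(int(np.argmax(posterior_var(K, idx))))
        history.append(posterior_var(K, idx).max())
    return idx, history

# Toy prior covariance of the high-fidelity process (illustrative values).
x = np.linspace(0.0, 1.0, 60)
sq_dist = (x[:, None] - x[None, :]) ** 2
K_L = np.exp(-0.5 * sq_dist / 0.15**2)
K_d = 0.1 * np.exp(-0.5 * sq_dist / 0.3**2)
K_H = 1.5**2 * K_L + K_d

idx, history = greedy_design(K_H, [10, 40], n_new=3)
```

The GP posterior variance does not depend on the observed values, so the candidate ranking can be computed before the next high-fidelity run is performed.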
7. Common Misconceptions and Limitations
A frequent misconception is that the quality of the CoPhIK surrogate is strictly limited by the fidelity of the physics model. In practice, the presence of the data-driven discrepancy GP allows significant correction for systematic bias: even largely misspecified low-fidelity priors can yield highly accurate predictions when sufficient (but still sparse) high-fidelity data are available (Howard et al., 2021).
However, the bifidelity acceleration may incur significant approximation error if the mapping from low to high fidelity is highly nonlinear or if the selected high-fidelity snapshots are not representative. In such scenarios, accuracy is slightly degraded relative to the full (non-bifidelity) approach, as observed in the Kuramoto–Sivashinsky example (Yang et al., 2018).
Finally, the framework as presently formulated is most theoretically robust for linear physical operators and linear autoregressive relations, with generalization to nonlinear physics and deep-learning kernels an ongoing area of active research.
Key References:
- "Physics-Informed CoKriging: A Gaussian-Process-Regression-Based Multifidelity Method for Data-Model Convergence" (Yang et al., 2018)
- "When Bifidelity Meets CoKriging: An Efficient Physics-Informed Multifidelity Method" (Yang et al., 2018)
- "Physics-informed CoKriging model of a redox flow battery" (Howard et al., 2021)
- "Inductive Spatio-Temporal Kriging with Physics-Guided Increment Training Strategy for Air Quality Inference" (Yang et al., 2025)