Deep Learning Surrogate Modeling

Updated 4 July 2026

Deep learning-based surrogate modeling is the process of using neural networks to approximate expensive numerical simulations while preserving key spatial and temporal structures.
It encompasses a range of methods—from convolutional field surrogates and latent-dynamics models to probabilistic and geometry-aware networks—tailored to various application needs.
Integration with physics, multifidelity learning, and active learning strategies enhances simulation efficiency, uncertainty quantification, and overall operational performance.

Deep learning-based surrogate modeling is the construction of neural approximations to expensive simulators, reduced-order models, or full-order numerical solvers, with the surrogate learned either as a direct map between simulator inputs and outputs, as a latent dynamical system, as a conditional generative model, or as a physics-constrained field representation (Tang et al., 2020). Across the literature, the term encompasses deterministic emulators for pressure, saturation, temperature, displacement, and stress fields; probabilistic surrogates that learn conditional distributions rather than point predictions; and geometry-aware or discretization-independent models that operate on meshes, graphs, or continuous coordinates instead of fixed grids (Shen et al., 2024). The unifying objective is to replace repeated high-fidelity evaluations in many-query workflows such as uncertainty quantification, design exploration, optimization, and data assimilation, while preserving the dominant spatial, temporal, and parametric structure of the underlying system (Liu et al., 13 Apr 2026).

1. Scope, problem classes, and formal definitions

Deep learning surrogates appear in the cited literature for several distinct mathematical settings. In subsurface flow, a common formulation is a forward map from static geology or permeability fields to dynamic pressure and saturation fields, sometimes with additional outputs such as surface displacement or well responses (Han et al., 2024). In scientific visualization and parameter-space exploration, the target is instead a conditional distribution over high-dimensional simulation outputs given a low-dimensional parameter vector, written in latent form as $p(z \mid c)$ and decoded to volumetric fields (Shen et al., 2024). In stochastic simulation, the surrogate targets the full conditional distribution $p(y \mid x)$ rather than a single regression function, using exogenous latent noise or RKHS-based distribution matching (Thakur et al., 2021).

A second distinction is between field surrogates and quantity-of-interest surrogates. Field surrogates predict spatial or spatiotemporal states, such as 3D pressure and saturation over multiple timesteps, 2D thermal plumes, or functional outputs over tens of thousands of spatial locations (Tang et al., 2020). Quantity-of-interest surrogates predict lower-dimensional derived observables such as activated slip area, average fault slip, strain energy, or total transfer capability, often because these observables are the sufficient inputs to downstream inversion or operational planning (Millevoi et al., 2024). This suggests that surrogate complexity is strongly conditioned by the dimensionality and structure of the target space.

The literature also distinguishes data-driven surrogates from physics-constrained surrogates. In purely supervised settings, the model is trained on solver-generated pairs, for example $\hat{d}=f_\theta(m)$ with $m$ denoting simulator inputs and $d$ the quantities of interest (Liu et al., 13 Apr 2026). In label-free or weak-data settings, the network may instead be trained by minimizing PDE residuals and enforcing boundary conditions analytically, as in parametric incompressible flow surrogates that use automatic differentiation and structured ansätze rather than simulation labels (Sun et al., 2019). A plausible implication is that “deep learning-based surrogate modeling” no longer denotes a single methodology but a family of approximations whose defining characteristic is the replacement of repeated numerical solves by learned operators, learned dynamics, or learned conditional distributions.

2. Representation strategies and model classes

A useful taxonomy separates surrogate models by how they represent spatial structure, temporal evolution, geometry, and uncertainty. Convolutional encoder–decoder models dominate regular-grid field prediction. In subsurface flow, recurrent residual U-Nets combine 3D convolutions, skip connections, residual blocks, and ConvLSTM bottlenecks to predict pressure and saturation over time from geological inputs (Tang et al., 2020). Related architectures appear in carbon-storage surrogates, where recurrent residual U-Nets are used for saturation and pressure and a separate residual U-Net is introduced for surface displacement (Han et al., 2024). U-Net variants also appear in groundwater heat-pump thermal plume prediction, conjugate heat transfer, and early-stage IR-drop estimation, all framed as dense field-to-field regression on structured grids (Davis et al., 2023).

Latent-dynamics surrogates constitute a second major class. The multi-step embed-to-control model replaces one-step latent transitions with a Koopman-inspired lifted linear evolution in latent space, using an encoder $Q_\phi$, a lifting map $\mathrm{enc}(\cdot)$, a linear operator $K=[A,B]$, and a decoder $P_\theta$ (Chen et al., 2024). Its defining recurrence,

$\Phi_{e,t+1}=A\Phi_{e,t}+Bu_t,$

is trained over multiple forward transitions, with a multi-step consistency loss in the lifted space rather than only adjacent-state reconstruction (Chen et al., 2024). This explicitly addresses error accumulation in long rollouts.
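The effect of training over several transitions can be illustrated with a small numerical sketch. It is a hedged illustration rather than the cited implementation: the function names `rollout` and `multi_step_loss`, the dimensions, and the random matrices are hypothetical, and the encoder and decoder are omitted so that lifted states are supplied directly.

```python
import numpy as np

def rollout(A, B, phi0, controls):
    """Apply phi_{t+1} = A phi_t + B u_t repeatedly; returns phi_1..phi_T."""
    states = []
    phi = phi0
    for u in controls:
        phi = A @ phi + B @ u
        states.append(phi)
    return np.stack(states)

def multi_step_loss(A, B, phi_true, controls):
    """Mean squared mismatch between rolled-out and reference lifted states.

    phi_true has shape (T + 1, n) and holds the reference state at t = 0..T.
    The rollout starts from phi_true[0] only, so errors compound as they
    would at inference time.
    """
    pred = rollout(A, B, phi_true[0], controls)
    return float(np.mean((pred - phi_true[1:]) ** 2))

rng = np.random.default_rng(0)
n, m, T = 4, 2, 10                      # hypothetical latent, control, horizon sizes
A_true = 0.9 * np.eye(n)
B_true = rng.normal(size=(n, m))
controls = rng.normal(size=(T, m))
phi0 = rng.normal(size=n)
phi_true = np.vstack([phi0, rollout(A_true, B_true, phi0, controls)])

loss_exact = multi_step_loss(A_true, B_true, phi_true, controls)
loss_perturbed = multi_step_loss(A_true + 0.05 * np.eye(n), B_true, phi_true, controls)
print(loss_exact, loss_perturbed)
```

Because the loss is evaluated on the whole rollout, a small error in `A` is penalized through its compounded effect over `T` steps, which a one-step reconstruction loss would not expose.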

Probabilistic surrogates adopt generative formulations. SurroFlow uses an autoencoder plus a conditional normalizing flow to model

$p(z \mid c)$

and therefore supports forward prediction, uncertainty quantification, and reverse prediction through exact invertibility (Shen et al., 2024). Earlier conditional generative formulations express the output as $y=f_\theta(x,z)$ with latent noise $z \sim p(z)$, trained adversarially or by implicit variational inference to represent non-Gaussian and multimodal conditional distributions (Yang et al., 2019). A different probabilistic route appears in stochastic surrogates trained with conditional maximum mean discrepancy, which compares conditional distributions in an RKHS rather than fitting only moments or point estimates (Thakur et al., 2021).

Geometry-aware surrogates depart from fixed-grid CNNs. Discretization-independent models based on design-variable hypernetworks, dual systems, and implicit representations evaluate solutions continuously at arbitrary coordinates and encode geometry through a minimum-distance or signed-distance function (Duvall et al., 2021). Mesh-based time-steppers instead treat the solution as node features on a graph and evolve it with message passing, permitting generalization across meshes, geometries, and resolutions (Franco et al., 2023). Functional-output surrogates provide another alternative: DeepSurrogate factorizes a spatially indexed response as

$y(s; x) \approx \sum_{k=1}^{K} B_k(x)\,\beta_k(s),$

with separate neural networks learning basis functions in the input space and spatial coefficient functions over the output domain (Jeon et al., 26 Mar 2025).
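A factorization of this kind can be sketched as follows, assuming the response is a sum of $K$ products of an input-dependent basis term and a site-dependent coefficient; the exact form used in the cited paper may differ. The two "networks" here are fixed random feature maps standing in for trained models, and all names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_inputs, input_dim, n_sites, K = 5, 3, 1000, 4   # hypothetical sizes

# Stand-ins for the two networks: fixed random feature maps, not trained models.
W_basis = rng.normal(size=(input_dim, K))
W_coef = rng.normal(size=(2, K))

def basis_functions(x):
    """B(x): maps simulator inputs of shape (n, input_dim) to (n, K)."""
    return np.tanh(x @ W_basis)

def spatial_coefficients(s):
    """beta(s): maps 2D site coordinates of shape (m, 2) to (m, K)."""
    return np.sin(s @ W_coef)

x = rng.normal(size=(n_inputs, input_dim))
sites = rng.uniform(size=(n_sites, 2))

# y(s; x) = sum_k B_k(x) * beta_k(s), evaluated for every input and site at once.
fields = basis_functions(x) @ spatial_coefficients(sites).T
print(fields.shape)
```

The output for each input is a full field over all sites, yet it is controlled by only `K` input-dependent numbers, which is what keeps the representation tractable when the number of spatial locations is large.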

Model class	Core mechanism	Representative papers
Convolutional field surrogate	Encoder–decoder, skip connections, residual blocks, recurrent bottlenecks	(Tang et al., 2020, Han et al., 2024, Ebbs-Picken et al., 2023)
Latent-dynamics surrogate	Encoder–Koopman–decoder or locally linear latent transition	(Chen et al., 2024)
Flow-based probabilistic surrogate	Autoencoder + conditional normalizing flow	(Shen et al., 2024)
Physics-constrained surrogate	PDE residual minimization with hard IC/BC enforcement	(Sun et al., 2019)
Geometry/mesh-aware surrogate	Hypernetworks, implicit coordinates, or GNN message passing	(Duvall et al., 2021, Franco et al., 2023)
Functional-output surrogate	Learned basis functions and spatial coefficients	(Jeon et al., 26 Mar 2025)

This range of representations shows that surrogate modeling has expanded from fixed-grid regression to operator-like constructions that are conditional, stochastic, and geometry-aware. It also suggests that architectural choice is not incidental: it encodes assumptions about grid structure, time dependence, invertibility, or mesh variability.

3. Training data, multifidelity learning, and online data generation

The dominant training regime remains supervised learning on solver-generated data, but the literature presents several strategies to reduce the cost of label generation. One approach is multifidelity transfer learning. In 3D two-phase subsurface flow, most training simulations can be performed on coarsened geomodels obtained through flow-based upscaling, while only a small number of high-fidelity runs are used to retrain the output head and fine-tune the full recurrent residual U-Net (Jiang et al., 2022). The reported configuration uses 2500 low-fidelity runs and 200 high-fidelity runs, yielding about a 90% reduction in training simulation costs while remaining nearly as accurate as a reference surrogate trained on 2500 high-fidelity runs (Jiang et al., 2022).

A related but distinct strategy is hybrid physics decomposition. For geological CO$_2$ storage, saturation and pressure surrogates are trained largely on inexpensive flow-only simulations that use an effective rock compressibility, while surface-displacement surrogates are trained on a much smaller set of coupled flow–geomechanics simulations and consume surrogate-predicted saturation and pressure as inputs (Han et al., 2024). The paper reports that 4000 flow-only plus 200 coupled runs have a total training-simulation cost equivalent to about 467 coupled runs, compared with a fully coupled strategy requiring thousands of coupled realizations (Han et al., 2024). This suggests that multifidelity need not mean simple mesh coarsening; it can also mean replacing a coupled multiphysics system with a calibrated reduced-physics approximation for part of the surrogate stack.
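The reported totals allow a rough back-of-the-envelope check of how cheap a flow-only run must be relative to a coupled run. The calculation below is derived here, not stated in the source, and assumes that cost is additive and proportional to the number of runs; the variable names are illustrative.

```python
# Reported figures (Han et al., 2024), expressed in coupled-run equivalents.
flow_only_runs = 4000
coupled_runs = 200
total_equivalent = 467

# Solve coupled_runs + ratio * flow_only_runs = total_equivalent for ratio.
implied_ratio = (total_equivalent - coupled_runs) / flow_only_runs
speed_factor = 1.0 / implied_ratio

print(f"implied cost of one flow-only run: {implied_ratio:.4f} coupled runs")
print(f"implied speed factor: about {speed_factor:.0f}x")
```

Under these assumptions a flow-only run costs roughly 0.067 of a coupled run, so the flow-only simulations are about 15 times cheaper.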

Large-scale online learning addresses a different bottleneck: storage and I/O rather than label fidelity. Instead of generating a static dataset on disk, solver instances stream trajectories directly into a distributed training server, eliminating storage bottlenecks and exposing the model to substantially greater sample diversity (Meyer et al., 2023). Across PDE surrogates, this online regime improved accuracy for fully connected networks, Fourier Neural Operators, and message-passing PDE solvers, with reported gains of 68.9%, 15.6%, and 6.02% in RMSE-relative comparisons between offline and online settings for selected benchmarks (Meyer et al., 2023). The same work explicitly identifies intra-simulation, inter-simulation, and memory biases in online learning and mitigates them through parameter sampling, random-read buffers, and watermarking (Meyer et al., 2023).

Data efficiency can also be improved by active learning. The $\epsilon$-weighted Hybrid Query Strategy trains a student regression surrogate and a teacher classifier that predicts where the surrogate is likely to fail, then mixes exploitation and exploration in the next batch selection (Vardhan et al., 2022). In finite-element stress analysis, the reported method reaches the same accuracy as random sampling with 57.5% fewer samples, corresponding to approximately 250 hours of simulator time saved in the stated setting (Vardhan et al., 2022). A plausible implication is that data-generation cost is now treated as a first-class design variable in surrogate modeling, not merely as a fixed prerequisite.
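One simple way to mix exploitation and exploration in a batch is sketched below. It is an assumption-laden illustration, not the cited algorithm: the function `hybrid_query`, the split of the batch by a fraction `epsilon`, and the uniform exploration rule are hypothetical choices, and the actual weighting in the cited method may differ.

```python
import numpy as np

def hybrid_query(fail_prob, batch_size, epsilon, rng):
    """Pick candidate indices: a fraction epsilon at random, the rest by failure score.

    fail_prob holds the teacher's predicted probability that the student
    surrogate fails on each candidate in the pool.
    """
    n_explore = int(round(epsilon * batch_size))
    n_exploit = batch_size - n_explore
    # Exploitation: candidates the teacher flags as the most likely failures.
    exploit = np.argsort(-fail_prob)[:n_exploit]
    # Exploration: uniform draw without replacement from the remaining pool.
    remaining = np.setdiff1d(np.arange(fail_prob.size), exploit)
    explore = rng.choice(remaining, size=n_explore, replace=False)
    return np.concatenate([exploit, explore])

rng = np.random.default_rng(0)
fail_prob = np.linspace(0.0, 1.0, 20)   # hypothetical teacher scores for 20 candidates
batch = hybrid_query(fail_prob, batch_size=10, epsilon=0.3, rng=rng)
print(sorted(batch.tolist()))
```

The selected indices would then be sent to the simulator, and both student and teacher would be retrained on the enlarged dataset.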

4. Uncertainty quantification, stochastic surrogates, and invertibility

A persistent misconception is that deep surrogate models are necessarily deterministic regressors. Several of the cited works are explicitly probabilistic. SurroFlow replaces a deterministic map from simulation parameters $c$ to outputs by a conditional density model $p(z \mid c)$ in latent space, trained with the exact flow likelihood and an auxiliary reverse-prediction loss (Shen et al., 2024). Because the flow is invertible, the model supports forward generation, uncertainty estimation by repeated latent sampling, and reverse prediction of parameters from observed outputs (Shen et al., 2024).
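The exact invertibility that flow-based surrogates rely on can be illustrated with a single conditional affine coupling layer. This is a generic sketch of the standard coupling construction, not SurroFlow's actual architecture, which the text above does not specify; the linear-plus-tanh conditioner, the weight shapes, and all names are hypothetical.

```python
import numpy as np

def scale_shift(z1, c, Ws, Wt):
    """Toy conditioner: log-scale and shift computed from z1 and the condition c."""
    inp = np.concatenate([z1, c])
    return np.tanh(Ws @ inp), Wt @ inp

def coupling_forward(z, c, Ws, Wt):
    """Transform the second half of z; the first half passes through unchanged."""
    d = z.size // 2
    z1, z2 = z[:d], z[d:]
    s, t = scale_shift(z1, c, Ws, Wt)
    out = np.concatenate([z1, z2 * np.exp(s) + t])
    log_det = float(np.sum(s))          # log |det Jacobian| of the transform
    return out, log_det

def coupling_inverse(y, c, Ws, Wt):
    """Exact inverse: recompute s and t from the untouched half, then undo."""
    d = y.size // 2
    y1, y2 = y[:d], y[d:]
    s, t = scale_shift(y1, c, Ws, Wt)
    return np.concatenate([y1, (y2 - t) * np.exp(-s)])

rng = np.random.default_rng(0)
z = rng.normal(size=4)                  # hypothetical latent vector
c = rng.normal(size=3)                  # hypothetical simulation parameters
Ws = rng.normal(size=(2, 5))
Wt = rng.normal(size=(2, 5))

y, log_det = coupling_forward(z, c, Ws, Wt)
z_back = coupling_inverse(y, c, Ws, Wt)
print(np.max(np.abs(z - z_back)), log_det)
```

The inverse needs no iterative solve, and the log-determinant is a plain sum, which is what makes exact likelihood training feasible when several such layers are stacked.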

Bayesian neural surrogates provide a second route to uncertainty. In ceramic aerogels, a Bayesian CNN with Bayes by Backprop and local reparameterization learns a predictive distribution over strain energy conditioned on microstructure images, while a separate WGAN-GP generates microstructure realizations (Islam et al., 22 Jan 2025). The predictive distribution

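$p(y^{*} \mid x^{*}, \mathcal{D}) = \int p(y^{*} \mid x^{*}, w)\, p(w \mid \mathcal{D})\, \mathrm{d}w,$

with $w$ the network weights, $\mathcal{D}$ the training data, $x^{*}$ a microstructure image, and $y^{*}$ the strain energy, is approximated by Monte Carlo weight sampling, and increased predictive variance is shown on out-of-distribution morphologies, especially the 22 $\mu$m interpolation case (Islam et al., 22 Jan 2025). This is not merely a confidence heuristic; it is an explicit posterior predictive construction.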

Monte Carlo dropout provides a lighter-weight approximation in functional surrogates. DeepSurrogate embeds dropout within a variational deep Gaussian process interpretation and draws predictive samples by repeatedly sampling network masks, producing means and 95% predictive intervals (Jeon et al., 26 Mar 2025). The reported SLOSH case gives RMSPE 0.90 and coverage 0.94, versus RMSPE 0.78 and coverage 0.80 for VecchiaGP, illustrating a familiar calibration–accuracy trade-off (Jeon et al., 26 Mar 2025).
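The sampling procedure behind such intervals can be sketched with a one-hidden-layer network. This is a minimal illustration under stated assumptions, not the DeepSurrogate implementation: the network, its random weights, the dropout rate, and the helper names `mc_dropout_predict` and `coverage` are all hypothetical.

```python
import numpy as np

def mc_dropout_predict(x, W1, W2, drop_rate, n_samples, rng):
    """Repeated stochastic forward passes with a fresh dropout mask each time."""
    draws = []
    for _ in range(n_samples):
        h = np.tanh(x @ W1)
        mask = rng.random(h.shape) >= drop_rate      # keep a unit with prob 1 - drop_rate
        h = h * mask / (1.0 - drop_rate)             # inverted-dropout rescaling
        draws.append(h @ W2)
    draws = np.stack(draws)                          # (n_samples, n_points, n_outputs)
    mean = draws.mean(axis=0)
    lower, upper = np.percentile(draws, [2.5, 97.5], axis=0)
    return mean, lower, upper

def coverage(y_true, lower, upper):
    """Fraction of targets that fall inside their predictive interval."""
    return float(np.mean((y_true >= lower) & (y_true <= upper)))

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 25).reshape(-1, 1)        # hypothetical 1D inputs
W1 = rng.normal(size=(1, 16))
W2 = rng.normal(size=(16, 1))

mean, lower, upper = mc_dropout_predict(x, W1, W2, drop_rate=0.1, n_samples=200, rng=rng)
print(mean.shape, float(np.mean(upper - lower)))
```

Given held-out targets `y_true`, `coverage(y_true, lower, upper)` yields the empirical coverage that is compared against the nominal 95% level.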

Distributional training objectives also appear outside Bayesian formulations. Conditional maximum mean discrepancy is used to train a generative neural surrogate for stochastic simulators without assuming a parametric form for $p(y \mid x)$ (Thakur et al., 2021). Adversarial implicit variational inference offers another distribution-free construction in conditional generative surrogates for stochastic, high-dimensional, and multifidelity systems (Yang et al., 2019). Taken together, these works show that uncertainty quantification in deep surrogate modeling is not monolithic: it includes exact conditional likelihoods, Bayesian weight posteriors, dropout-based variational approximations, and kernel-based conditional distribution matching.
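The kernel-based comparison of distributions can be illustrated with the ordinary (unconditional) maximum mean discrepancy, which is the building block that the conditional variant extends. The sketch below is a simplification and does not implement the conditional estimator of the cited work; the Gaussian kernel, its bandwidth, and the function names are assumptions.

```python
import numpy as np

def rbf_kernel(a, b, bandwidth):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))

def mmd2_biased(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two sample sets."""
    return float(rbf_kernel(x, x, bandwidth).mean()
                 + rbf_kernel(y, y, bandwidth).mean()
                 - 2.0 * rbf_kernel(x, y, bandwidth).mean())

rng = np.random.default_rng(0)
simulator_samples = rng.normal(size=(200, 2))            # hypothetical simulator outputs
good_surrogate = rng.normal(size=(200, 2))               # same distribution
poor_surrogate = rng.normal(loc=2.0, size=(200, 2))      # shifted distribution

mmd_good = mmd2_biased(simulator_samples, good_surrogate)
mmd_poor = mmd2_biased(simulator_samples, poor_surrogate)
print(mmd_good, mmd_poor)
```

The discrepancy is computed from samples alone, so no parametric density has to be assumed for either distribution.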

5. Physics integration, constraints, and domain structure

Physics enters deep surrogate modeling through several non-equivalent mechanisms. The strongest form is direct physics-constrained training. For incompressible fluid flows, a structured fully connected network enforces Dirichlet boundary conditions by construction,

$\hat{u}(x) = u_{\mathrm{par}}(x) + D(x)\,\tilde{u}_\theta(x),$

where $u_{\mathrm{par}}$ is a particular function that satisfies the boundary conditions, $D$ vanishes on the boundary, and $\tilde{u}_\theta$ is the raw network output. Training minimizes the mass and momentum residuals of the Navier–Stokes equations rather than using CFD labels (Sun et al., 2019). The same work argues that hard enforcement of boundary conditions is crucial in label-free training, whereas soft penalties produced poor solutions in the reported cases (Sun et al., 2019).
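Hard enforcement of Dirichlet conditions can be illustrated in one dimension. The sketch below assumes a unit interval, a linear particular function, and the distance-like factor $x(1-x)$; these choices and the function names are illustrative and are not taken from the cited paper, whose geometries and ansätze are more involved.

```python
import numpy as np

def raw_network(x, W1, b1, W2):
    """Unconstrained one-hidden-layer network evaluated at points x of shape (n,)."""
    return np.tanh(np.outer(x, W1) + b1) @ W2

def constrained_output(x, params, left_value, right_value):
    """Output that satisfies u(0) = left_value and u(1) = right_value for any weights."""
    W1, b1, W2 = params
    particular = left_value * (1.0 - x) + right_value * x   # matches both boundary values
    distance = x * (1.0 - x)                                 # zero at x = 0 and x = 1
    return particular + distance * raw_network(x, W1, b1, W2)

rng = np.random.default_rng(0)
params = (rng.normal(size=8), rng.normal(size=8), rng.normal(size=8))  # untrained weights
x = np.array([0.0, 0.25, 0.5, 0.75, 1.0])

u = constrained_output(x, params, left_value=2.0, right_value=-1.0)
print(u[0], u[-1])
```

Since the boundary values hold for arbitrary weights, the training loss can consist of the PDE residual alone, with no boundary penalty term to balance against it.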

A weaker but widely used form is physics-aware architectural bias. In conjugate heat transfer, DeepEDH separates pressure, velocity, and temperature surrogates, applies output geometry masks to flow variables, and predicts temperature in two stages by conditioning the thermal surrogate on a first-stage velocity prediction (Ebbs-Picken et al., 2023). The temperature gain from this staged coupling is substantial in the reported results: temperature $R^2$ increases from 0.9015 to 0.9372 with the two-stage design, before further improvement to 0.9413 after optimization (Ebbs-Picken et al., 2023). The method remains data-driven, but its architecture mirrors the convective dependence of the energy equation.

A third mechanism is regime awareness. Fault activation modeling shows that standard feedforward surrogates struggle with late-time regime changes associated with fault opening. The proposed SurMoDeL remedies this by adding a classifier that predicts the probability of fault opening and then either augments the surrogate input with a logical opening flag or blends the outputs of two regime-specific surrogates (Millevoi et al., 2024). The resulting mapping from model inputs to observables such as activated slip area and average fault slip then enables Bayesian inversion on seismic moment data (Millevoi et al., 2024). This suggests that, when the forward map is piecewise smooth rather than globally smooth, explicit regime decomposition may be more effective than simply enlarging the regression network.
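The two regime-handling options can be sketched in a few lines. This is an illustration under assumptions: the source says only that the outputs are blended or that a logical flag is appended, so the convex-combination rule, the 0.5 threshold, and the function names below are hypothetical.

```python
import numpy as np

def blended_prediction(p_open, y_closed, y_open):
    """Convex combination of two regime-specific surrogate outputs."""
    p = np.clip(p_open, 0.0, 1.0)
    return (1.0 - p) * y_closed + p * y_open

def flagged_input(x, p_open, threshold=0.5):
    """Append a binary opening flag to the surrogate input features."""
    flag = (np.asarray(p_open) >= threshold).astype(float)
    return np.column_stack([x, flag])

p_open = np.array([0.0, 0.2, 1.0])            # hypothetical classifier probabilities
y_closed = np.array([1.0, 1.0, 1.0])          # hypothetical closed-regime predictions
y_open = np.array([3.0, 3.0, 3.0])            # hypothetical open-regime predictions
x = np.arange(6.0).reshape(3, 2)              # hypothetical input features

y_blend = blended_prediction(p_open, y_closed, y_open)
x_aug = flagged_input(x, p_open)
print(y_blend, x_aug.shape)
```

Blending keeps the prediction continuous in the classifier probability, whereas the flag variant leaves it to a single network to learn the two behaviours.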

Not all physics integration is equally strong. Several papers describe “physics-aware” losses or structures that do not impose PDE residuals or conservation laws explicitly. The multi-step embed-to-control model, for example, describes its loss redesign as respecting underlying physical principles, but the paper does not introduce explicit mass-conservation or PDE-residual penalties; the relevant constraint is multi-step consistency in a Koopman-lifted latent space (Chen et al., 2024). The distinction matters because it separates structural inductive bias from formal physical enforcement.

6. Applications, empirical performance, and operational embedding

The application range represented in the cited literature is broad. Subsurface flow remains a dominant domain. Recurrent residual U-Nets trained on 3D channelized geomodels provide accurate predictions of dynamic states, well responses, and flow statistics for new geological realizations, and are embedded into data assimilation workflows using rejection sampling and ensemble-based methods (Tang et al., 2020). In geological CO$_2$ storage, the hybrid flow-only/coupled surrogate stack achieves median relative errors below 4% for saturation, pressure, and displacement, with reported medians of 3.9%, 0.8%, and 2.6% in the final configuration (Han et al., 2024).

Other domains emphasize different output structures. For thermal plume prediction in groundwater heat pumps, a U-Net-like surrogate maps two-channel Darcy velocity fields on a 65×65 grid to the steady-state temperature field, producing outputs that follow streamlines and are described as orders of magnitude faster than PFLOTRAN-based simulation (Davis et al., 2023). In conjugate heat transfer for battery thermal management, optimized DeepEDH improves $R^2$ relative to U-Net by 11% for pressure, 53% for velocity, and 65% for temperature on the reported benchmark (Ebbs-Picken et al., 2023). In vehicle aerodynamics and Poisson problems with variable geometry, design-variable hypernetworks outperform other discretization-independent models in accuracy while remaining competitive in time to best model (Duvall et al., 2021). In mesh-based advection–diffusion with geometric variability, GNN time-steppers yield speedups of approximately 15–70× over the full-order models in the reported 2D and 3D examples (Franco et al., 2023).

Surrogate models are increasingly embedded directly into inverse problems and control loops rather than treated as offline approximators. SurroFlow is combined with a genetic algorithm and an interactive interface for user-guided exploration, with a representative MPAS-Ocean cold-tongue case using similarity, diversity, and uncertainty weights of 0.8, 0.6, and −0.8, respectively (Shen et al., 2024). In operational planning for power systems, the surrogate for dynamic total transfer capability is differentiated analytically so that its Jacobian and Hessian can be inserted into an interior-point solver, replacing DAE-based transient-security constraints by algebraic inequalities (Qiu et al., 2020). This is a particularly strong example of surrogate operationalization: the neural approximation is not merely evaluated in outer-loop simulation but transformed into a differentiable optimization component.
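Why a neural surrogate can be handed to a derivative-based solver is easiest to see on a small network whose Jacobian follows from the chain rule. The sketch below is illustrative only: the one-hidden-layer tanh network, its random weights, and the function names are assumptions and do not reproduce the cited power-system surrogate.

```python
import numpy as np

def mlp(x, W1, b1, W2, b2):
    """One-hidden-layer tanh network mapping R^3 to R^2."""
    return W2 @ np.tanh(W1 @ x + b1) + b2

def mlp_jacobian(x, W1, b1, W2):
    """Analytic Jacobian via the chain rule: W2 diag(1 - tanh^2) W1."""
    h = np.tanh(W1 @ x + b1)
    return W2 @ (np.diag(1.0 - h**2) @ W1)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 3)), rng.normal(size=8)
W2, b2 = rng.normal(size=(2, 8)), rng.normal(size=2)
x = rng.normal(size=3)

J_analytic = mlp_jacobian(x, W1, b1, W2)

# Central finite differences as an independent check of the analytic result.
eps = 1e-6
J_numeric = np.zeros((2, 3))
for j in range(3):
    step = np.zeros(3)
    step[j] = eps
    J_numeric[:, j] = (mlp(x + step, W1, b1, W2, b2) - mlp(x - step, W1, b1, W2, b2)) / (2 * eps)

max_error = float(np.max(np.abs(J_analytic - J_numeric)))
print(max_error)
```

Closed-form derivatives of this kind are what allow a surrogate constraint to be supplied to a solver in the same way as any other algebraic inequality.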

A further development is the automation of surrogate construction itself. AutoSurrogate uses an LLM-driven multi-agent system to profile data, select among eight architectures, run Bayesian hyperparameter optimization, monitor training failure modes, and report artifacts for a 3D geological carbon storage task (Liu et al., 13 Apr 2026). Without manual tuning, the framework outperforms both expert-fixed baselines and domain-agnostic AutoML under the reported budgets, for pressure in its three-attempt configuration and for saturation (Liu et al., 13 Apr 2026). A plausible implication is that surrogate modeling is becoming not only a substitute for expensive simulation but also an object of workflow automation.

7. Limitations, recurring challenges, and research directions

Several recurring limitations cut across application domains. Distribution shift remains central. Surrogates may degrade when controls, geology, geometry, or operating regimes depart from the training manifold. This is explicit in stochastic and Bayesian settings, where predictive variance increases for interpolated or out-of-distribution cases, but it is also visible in deterministic models through long-horizon drift, front smearing, or failure near sharp geometric features (Islam et al., 22 Jan 2025). Geometry-aware models mitigate some of this by removing fixed-grid assumptions, yet even discretization-independent hypernetworks show larger errors near polygon vertices or fine vehicle details, indicating that minimum-distance encoding alone can be insufficient (Duvall et al., 2021).

Long-term temporal stability is another difficulty. One-step latent surrogates such as E2C accumulate errors during rollout, motivating multi-step Koopman training (Chen et al., 2024). Recurrent field surrogates can also develop front-boundary errors or oversmoothing at later times, especially in transport-dominated regimes (Han et al., 2024). Online training on more diverse trajectories improves generalization for several PDE surrogate families, but the reported benefit is architecture-dependent: FNO gains were clear, whereas U-Net performance could remain unstable without additional rollout stabilization tricks (Meyer et al., 2023).

A further issue concerns the meaning of “physics-informed.” Some works use the term for PDE-residual minimization with hard boundary enforcement, while others use it for structural choices such as output masks, staged coupling, or regime classifiers (Sun et al., 2019). This suggests a conceptual ambiguity in the field. A plausible implication is that future scholarship will continue to separate physics-constrained surrogates, which encode governing equations directly in the loss or ansatz, from physics-aware surrogates, which encode only architectural or data-design inductive biases.

The literature also identifies unresolved trade-offs among expressiveness, invertibility, calibration, and computational tractability. Normalizing flows offer exact likelihoods and invertibility but require dimension-preserving invertible layers and often motivate autoencoder compression (Shen et al., 2024). Bayesian CNNs and dropout-based surrogates provide uncertainty estimates but depend on approximation quality and hyperparameter tuning (Islam et al., 22 Jan 2025). Hypernetworks and GNNs improve geometric generalization but can introduce training instabilities or oversmoothing (Duvall et al., 2021). Active learning and multifidelity strategies reduce data cost, but their effectiveness depends on the fidelity gap and the quality of the failure or transfer signal (Vardhan et al., 2022).

The overall direction of the field is therefore not a simple march toward larger models. The cited work points instead toward specialized surrogate constructions matched to output topology, geometry, uncertainty requirements, and downstream use: recurrent volumetric surrogates for 3D multiphase flow (Tang et al., 2020), conditional generative surrogates for parameter-space exploration (Shen et al., 2024), Bayesian linked surrogates for microstructure–property pipelines (Islam et al., 22 Jan 2025), continuous hypernetwork surrogates for variable geometries (Duvall et al., 2021), graph time-steppers for mesh-dependent PDEs (Franco et al., 2023), and automated multi-agent systems for end-to-end surrogate construction (Liu et al., 13 Apr 2026). This suggests that deep learning-based surrogate modeling is best understood not as a single algorithmic recipe, but as a technically heterogeneous discipline organized around a common computational objective: replacing repeated high-fidelity numerical solves with learned, structurally informed approximations that remain useful inside scientific workflows.