Learned Surrogates: Modeling & Optimization
- Learned surrogates are parameterized models that approximate complex, expensive functions using neural networks or similar architectures.
- They enable scalable, gradient-based optimization by replacing high-fidelity simulations with efficient, differentiable models.
- Practical implementations include Fourier neural operators and deep kernel methods, providing uncertainty quantification and parallel computational advantages.
A learned surrogate is a parameterized model, often implemented as a neural network or other machine learning construct, that approximates a complex, expensive, or non-differentiable target operator, function, or metric. It is trained with the specific aim of providing a computationally cheap, differentiable, and optionally uncertainty-aware alternative that enables scalable optimization, inference, or control within otherwise intractable domains. Learned surrogates arise in scientific machine learning, engineering design, inverse problems, black-box optimization, and modern differentiable programming, where they systematically replace high-fidelity models, non-smooth algorithms, or black-box losses with learned, trainable mappings that preserve essential structure while dramatically accelerating downstream tasks.
1. Mathematical Formulation and Core Principles
A learned surrogate seeks to approximate an operator mapping from an input space $\mathcal{X}$ (e.g., parametric fields, design variables, hyperparameter configurations) to an output space $\mathcal{Y}$ (e.g., PDE solutions, loss values, device responses):

$$\mathcal{F}: \mathcal{X} \to \mathcal{Y}.$$

The surrogate is a parameterized mapping, typically with parameters $\theta$ given by network weights:

$$\hat{\mathcal{F}}_\theta: \mathcal{X} \to \mathcal{Y}.$$

Training aligns $\hat{\mathcal{F}}_\theta$ with $\mathcal{F}$ using data $\{(x_i, y_i)\}_{i=1}^N$, where $y_i = \mathcal{F}(x_i)$ are outputs from the expensive simulator or process. The surrogate supports direct gradient computation with respect to $x$ or $\theta$, and can encompass additional properties such as uncertainty estimates, robustness, and multi-fidelity modeling.
Surrogates may target solution operators for parametric PDEs (II et al., 2022), response surfaces in Bayesian optimization (Wistuba et al., 2021), non-differentiable or non-decomposable metrics (edit distance, F1, etc.) (Patel et al., 2020, Grabocka et al., 2019), or even structural elements of discrete optimization algorithms (e.g., cutting-plane master steps (Mana et al., 2023)).
Key characteristics include:
- Parametric, often high-capacity models (deep neural networks, kernel GPs, GNNs, KANs) (II et al., 2022, Wistuba et al., 2021, Patel et al., 2020, Ma et al., 23 Mar 2025, Natterer et al., 19 Jan 2025).
- Discretization-agnostic operation, as in neural operators that act on functions rather than on a fixed grid.
- Differentiability, enabling gradient-based optimization or inference regardless of the original operator's properties.
- Uncertainty quantification, as in probabilistic or heteroskedastic surrogates (Bahl et al., 2 Jan 2026).
- Low computational cost at inference compared to original high-fidelity models or simulations.
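The combination of differentiability and low inference cost is what makes surrogate-based optimization work even when the original objective has no usable gradients. A minimal sketch, assuming a toy non-differentiable objective and a hand-rolled quadratic surrogate (neither taken from the cited papers):

```python
import numpy as np

rng = np.random.default_rng(1)

def black_box(x):
    # Non-differentiable target: no useful gradients exist here, so we
    # optimize through a smooth fitted surrogate instead (illustrative).
    return np.abs(x - 1.5) + 0.1 * np.round(np.sin(5.0 * x))

# Fit a smooth quadratic surrogate g(x) = a*x^2 + b*x + c by least squares.
xs = rng.uniform(0.0, 3.0, size=100)
A = np.stack([xs ** 2, xs, np.ones_like(xs)], axis=1)
a, b, c = np.linalg.lstsq(A, black_box(xs), rcond=None)[0]

# Gradient descent on the differentiable surrogate, using g'(x) = 2ax + b.
x = 0.0
for _ in range(500):
    x -= 0.1 * (2.0 * a * x + b)
```

The descent converges to the surrogate's minimizer, which sits close to the true minimizer at 1.5 because the quadratic fit smooths over the kinks and jumps of the original objective.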
2. Surrogate Architectures and Learning Paradigms
Various surrogate architectures are deployed based on the target problem:
- Fourier Neural Operators (FNOs): Iterative architectures lifting input functions into a latent space, followed by a sequence of spectral convolution and pointwise channel-mixing operations, enabling approximation of infinite-dimensional solution maps for parametrized PDEs (II et al., 2022).
- Deep Kernel Gaussian Process Surrogates: Feature maps parameterized by deep neural networks define kernels for Gaussian process regression, facilitating both uncertainty quantification and non-linear representation (Wistuba et al., 2021).
- Deep Embedding Surrogates: Embedding networks learn to map prediction-target pairs into a metric-aligned vector space, so that the surrogate loss (e.g., Euclidean distance) approximates non-differentiable metrics like edit distance or IoU (Patel et al., 2020).
- Compact Low-Dimensional Surrogates: Linear transformations or meta-variables reduce the dimensionality of optimization layers, optimizing only along the most decision-relevant axes to enable tractable, smooth, and fast end-to-end training (Wang et al., 2020).
- Kolmogorov–Arnold Networks (KANs): Nested, univariate-compositional architectures structured to preserve the relative order of objective values in black-box optimization (Ma et al., 23 Mar 2025).
- Smooth Neural Surrogates (SNS): MLPs with explicit layer-wise Lipschitz constraints and heavy-tailed (e.g., Cauchy) likelihoods, improving smoothness and gradient quality for trajectory optimization (Moore et al., 17 Jan 2026).
- Graph Neural Networks (GNNs): Used as surrogates for agent-based simulators or spatio-temporal systems, capturing graph-structured interactions and policy-induced variations (Natterer et al., 19 Jan 2025).
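The compact low-dimensional idea above can be sketched as follows: fix a projection P with orthonormal columns and optimize only k = 2 meta-variables of a 50-dimensional decision. In practice the projection itself is learned; the objective, dimensions, and names here are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(2)
d, k = 50, 2                        # full decision dimension vs. meta-variables
# Orthonormal projection, fixed here for illustration (learned in practice).
P = np.linalg.qr(rng.standard_normal((d, k)))[0]

target = P @ np.array([1.0, -2.0])  # optimum lies in span(P) by construction

def objective(z):
    # Smooth stand-in for an expensive decision-quality objective.
    return np.sum((z - target) ** 2)

# Optimize only the k meta-variables w; the full decision is z = P @ w.
w = np.zeros(k)
for _ in range(100):
    grad_w = P.T @ (2.0 * (P @ w - target))  # chain rule through the linear map
    w -= 0.1 * grad_w
```

Because every gradient step lives in the k-dimensional meta-variable space, the optimization is smooth and fast even when the full decision space is large, which is exactly the tractability argument made for these surrogates.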
Learning paradigms vary:
- Supervised regression: mapping parametric inputs to targets (solutions, losses, responses) using L2 or similar losses.
- Meta-learning: training “shared” surrogates that adapt or generalize across related tasks (Wistuba et al., 2021, Picard-Weibel et al., 2024).
- Bilevel and end-to-end optimization: jointly optimizing surrogates and predictors to reflect decisions or losses of interest (Grabocka et al., 2019, Wang et al., 2020).
- Offline learning with consistent penalization: simultaneously fitting surrogate and action/control variables in one shot (Guth et al., 2021).
- Gradient-matching: minimizing discrepancies not only in predicted values but (critically) in gradients, which directly control optimization quality under surrogate-induced search (Hoang et al., 26 Feb 2025).
- Reinforcement learning surrogates: replacing NP-hard or combinatorial algorithmic steps with neural policies (Mana et al., 2023).
- Probabilistic surrogates with uncertainty heads: training surrogates to output both mean predictions and systematic uncertainties, calibrated across the domain (Bahl et al., 2 Jan 2026).
3. Training, Validation, and Loss Functions
The standard paradigm involves collecting or simulating a (possibly large) dataset of input-output pairs, then regressing surrogate outputs onto the ground truth using mean squared error, negative log-likelihood, or task-specific metrics. For complex or non-differentiable losses, surrogates are trained on “local–global” mixtures of prediction-target pairs to ensure both metric alignment and effective downstream gradients (Patel et al., 2020).
For gradient-sensitive applications, losses include both value- and gradient-matching terms:

$$\mathcal{L}(\theta) = \sum_i \big\|\hat{\mathcal{F}}_\theta(x_i) - y_i\big\|^2 + \lambda \int_\Gamma \big\|\nabla \hat{\mathcal{F}}_\theta(x) - \nabla \mathcal{F}(x)\big\|^2 \, dx,$$

where $\Gamma$ traces line segments between training points (Hoang et al., 26 Feb 2025).
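For a surrogate that is linear in its parameters, a value-plus-gradient fit of this kind stays a single least-squares solve. A minimal sketch, where the cubic target, its analytic gradients, and the weighting are assumptions of the example:

```python
import numpy as np

# Values and gradients of the target at training points; gradients would come
# from the simulator or adjoints, but are analytic here for f(x) = x**3.
rng = np.random.default_rng(3)
x = rng.uniform(-1.0, 1.0, size=50)
f, df = x ** 3, 3.0 * x ** 2

# Surrogate: cubic polynomial, linear in its parameters theta.
degree = 3
Phi = np.vander(x, degree + 1, increasing=True)               # value features
dPhi = np.column_stack([np.zeros_like(x)] +
                       [k * x ** (k - 1) for k in range(1, degree + 1)])

# Joint least squares: match values and (lambda-weighted) gradients at once.
lam = 1.0
A = np.vstack([Phi, np.sqrt(lam) * dPhi])
b = np.concatenate([f, np.sqrt(lam) * df])
theta = np.linalg.lstsq(A, b, rcond=None)[0]
```

Stacking gradient rows alongside value rows is what forces the fitted surrogate's slopes, and not just its values, to agree with the target, which is the property that controls search quality under surrogate-induced optimization.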
For uncertainty-aware surrogates, negative log-likelihoods under heteroskedastic, non-Gaussian, or mixture models are minimized; calibration is validated via pull tests, coverage probabilities, and adaptive resampling strategies (Bahl et al., 2 Jan 2026).
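A sketch of the heteroskedastic Gaussian case (one common choice; the cited work also considers non-Gaussian and mixture likelihoods), together with the pull statistic used in calibration checks:

```python
import numpy as np

def hetero_nll(y, mu, log_sigma):
    # Negative log-likelihood of y under N(mu, sigma^2) with an
    # input-dependent, predicted sigma (heteroskedastic uncertainty head).
    sigma2 = np.exp(2.0 * log_sigma)
    return 0.5 * np.mean(np.log(2.0 * np.pi * sigma2) + (y - mu) ** 2 / sigma2)

def pull(y, mu, log_sigma):
    # Calibration diagnostic: pulls (y - mu) / sigma should look like N(0, 1)
    # samples when the predicted uncertainties are well calibrated.
    return (y - mu) / np.exp(log_sigma)

# Synthetic check: data drawn with the claimed noise level is well calibrated.
rng = np.random.default_rng(4)
mu = np.zeros(10_000)
sigma = 0.5 * np.ones(10_000)
y = rng.normal(mu, sigma)
```

In a trained surrogate, `mu` and `log_sigma` would be two output heads of the network; the pull distribution then validates whether the predicted uncertainties match the observed residuals across the domain.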
Surrogates for algorithmic or reinforcement learning settings employ hybrid objectives: standard RL policy/value losses augmented by supervised or order-preserving terms when used as drop-in substitutes for fitness or policy evaluation (Ma et al., 23 Mar 2025, Mana et al., 2023).
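An order-preserving term of this kind can be sketched as a pairwise hinge penalty: a generic ranking loss, not the exact objective of the cited works.

```python
import numpy as np

def pairwise_order_loss(pred, true, margin=0.1):
    # Penalize every pair of candidates that the surrogate ranks in the
    # opposite order to the true objective; absolute values are irrelevant,
    # only the relative ordering matters.
    n, loss = len(pred), 0.0
    for i in range(n):
        for j in range(i + 1, n):
            sign = np.sign(true[i] - true[j])
            loss += max(0.0, margin - sign * (pred[i] - pred[j]))
    return loss / (n * (n - 1) / 2)
```

A surrogate trained with such a term can drop into rank-based search procedures (selection, tournament, acquisition ordering) even when its predicted values drift from the true objective's scale.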
4. Computational Strategies and Parallelism
Learned surrogates enable dramatic scaling by shifting expensive computations into the offline training phase, and leveraging highly parallelizable inference at evaluation or optimization time.
- Model-parallel domain decomposition: FNO surrogates use tensor-slicing across spatial and/or temporal dimensions, broadcasting weights and distributing FFT computations. Data, weights, and gradients are distributed over multiple GPUs, enabling solution of PDEs at the scale of billions of degrees of freedom (II et al., 2022).
- Batch training and data handling: Surrogates are trained with stochastic gradient descent on mini-batches, with careful normalization, and (where necessary) sampling to ensure adequate coverage of input space or distribution features.
- Active sampling and local refinement: Out-of-distribution or low-accuracy regions are identified and reseeded with additional simulation data, leveraging error-based kernel density estimation to concentrate sampling effort (Bahl et al., 2 Jan 2026).
- Hybrid exact-surrogate algorithms: For discrete optimization (e.g., cutting planes), surrogates partially replace (NP-hard) master steps under probabilistic schedules, while periodic exact computation “certifies” convergence and maintains theoretical guarantees (Mana et al., 2023).
- Meta-learning and multi-task pooling: Surrogate parameterizations optimized across task families can support rapid adaptation and transfer to new domains (Wistuba et al., 2021, Picard-Weibel et al., 2024).
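The active-sampling strategy above can be sketched as an error-weighted kernel density proposal; the validation set, the error profile, and the bandwidth below are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(5)

# Pointwise surrogate errors on a validation set, with one high-error region.
x_val = rng.uniform(0.0, 1.0, size=300)
errors = 0.01 + np.where((x_val > 0.7) & (x_val < 0.9), 1.0, 0.0)

def sample_refinement_points(x, err, n, bandwidth=0.02):
    # Error-weighted kernel density proposal: pick centers with probability
    # proportional to their error, then jitter with a Gaussian kernel.
    centers = rng.choice(x, size=n, p=err / err.sum())
    return centers + bandwidth * rng.standard_normal(n)

new_x = sample_refinement_points(x_val, errors, n=500)
```

New simulation budget is thereby concentrated where the surrogate is least accurate; the freshly simulated pairs are added to the training set and the surrogate is refit.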
5. Empirical Performance, Advantages, and Limitations
Learned surrogates deliver:
| Performance Domain | Acceleration/Quality | References |
|---|---|---|
| Large-scale PDE solution (CO₂ dynamics, 2.6B DOF) | 270–1,400× speedups vs classical solvers, 7% L² error | (II et al., 2022) |
| Bayesian optimization (few-shot HPO) | State-of-the-art regret with meta-learned deep kernel GP | (Wistuba et al., 2021) |
| Image compression via quantization surrogates | 1–3% BD-rate gain, stable training, robust to gradient bias | (Zhang et al., 2023) |
| Scientific computation (amplitude calculation) | NN surrogate with calibrated uncertainty, adaptive refinement | (Bahl et al., 2 Jan 2026) |
| Non-differentiable metric optimization (edit, F1, IoU) | Up to 39% reduction in error, 4–5% improvement in F1 | (Patel et al., 2020, Grabocka et al., 2019) |
| GNN surrogates for ABM traffic simulation | R²=0.68 on policy roads, per-scenario eval 60.1s | (Natterer et al., 19 Jan 2025) |
| Black-box meta-optimization | KAN surrogates with order-aware loss, robust across BBOB | (Ma et al., 23 Mar 2025) |
Key advantages:
- Orders-of-magnitude inference acceleration enables simulation, design, or inference workflows previously impractical due to computational cost (II et al., 2022, Natterer et al., 19 Jan 2025).
- Differentiability of surrogates supports gradient-based optimization, inverse problems, and learning-to-optimize frameworks even for originally non-smooth or non-differentiable systems (Patel et al., 2020, Grabocka et al., 2019).
- Uncertainty quantification and robust calibration enable high-stakes simulation and adaptive refinement (e.g., for loop amplitudes or safety constraints) (Bahl et al., 2 Jan 2026, Moss et al., 2024).
- Scalability and parallelism, especially for high-dimensional or multiphysics systems (II et al., 2022).
- Meta-learning and rapid adaptation across tasks (Wistuba et al., 2021, Picard-Weibel et al., 2024).
Limitations:
- Extrapolation risk: Surrogates are only reliable within the data regime covered by training; OOD behavior must be mitigated via active sampling or constraints (Hoang et al., 26 Feb 2025, Yin et al., 2023).
- Loss of optimality: If the surrogate cannot represent the true optimum (e.g., due to rank-deficiency, insufficient dimensionality, or limited architecture), decision quality can degrade (Wang et al., 2020, Hoang et al., 26 Feb 2025).
- Complexity in high-dimensional settings: Balancing fidelity, dimensionality reduction, and tractable surrogate representation is challenging (Brunel et al., 2024).
- Communication overhead in extreme model-parallel settings: Requires careful partitioning and implementation (II et al., 2022).
6. Surrogates in Practice: Case Studies and Applications
PDE and multiphysics surrogates: Model-parallel FNOs have enabled inverse CO₂ storage monitoring and high-dimensional UQ by solving PDEs on domains previously inaccessible to direct simulation. Learned surrogates (with normalizing flows as constraints) further enable robust inverse solvers under strict distributional control (II et al., 2022, Yin et al., 2023).
Offline black-box optimization: In domains where only historical evaluations are available, gradient-matching surrogates tightly control optimization gap relative to the true optimum. Theoretical guarantees on risk and practical strategies for bounding step sizes underpin reliable search (Hoang et al., 26 Feb 2025).
Algorithmic surrogates: RL-driven surrogates have successfully replaced NP-hard master steps in cutting-plane methods, with convergence and optimality guarantees intact, resulting in up to 45% wall-time improvements (Mana et al., 2023).
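The schedule described above, cheap surrogate steps interleaved with periodic exact steps that certify progress, can be sketched as follows; the contraction dynamics, probabilities, and error model are illustrative stand-ins, not the cutting-plane algorithm itself.

```python
import numpy as np

rng = np.random.default_rng(6)

def exact_step(state):
    # Expensive exact computation (stand-in for an NP-hard master step).
    return 0.5 * state

def surrogate_step(state):
    # Cheap learned approximation of the same step, with a small error.
    return 0.5 * state + 0.01 * rng.standard_normal()

state, p_surrogate, certify_every = 10.0, 0.8, 5
for it in range(1, 51):
    if it % certify_every == 0:
        state = exact_step(state)      # periodic exact step certifies progress
    elif rng.random() < p_surrogate:
        state = surrogate_step(state)
    else:
        state = exact_step(state)
```

Because the exact step is applied on a fixed schedule, the iteration inherits the convergence behavior of the exact method while most of the wall time is spent in the cheap surrogate.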
Image signal processing: Train-test mismatches under quantization are systematically addressed by surrogate annealing methods, with precise control over tradeoffs and stability (Zhang et al., 2023).
Scientific inference and control: In model predictive control for legged robotics, smooth neural surrogates yield well-conditioned dynamics models with bounded derivatives, enabling single-shooting solvers to robustly execute zero-shot behaviors (Moore et al., 17 Jan 2026).
Black-box meta-optimization: KAN surrogates trained with order-aware losses support effective inner-loop replacement for meta-policies, matching or exceeding RL meta-optimizers with far fewer true function evaluations (Ma et al., 23 Mar 2025).
High-dimensional surrogate fusion: Multi-fidelity surrogates with functional outputs are systematically categorized by dimensionality reduction and intermediate/fusion approaches; no approach dominates in all regimes, but appropriate choice yields significant benefit (Brunel et al., 2024).
7. Best Practices and Future Directions
Best practices that have crystallized across domains:
- Domain decomposition and parallel libraries (e.g., DistDL) for large surrogate architectures (II et al., 2022);
- Partitioning and communication strategies optimized for minimizing all-to-all transfers in FFT-heavy surrogates;
- Joint optimization of surrogates and predictors in end-to-end or bilevel setups for nontrivial metrics or losses;
- Gradient-matching (not just value regression) for robust optimization and theoretical risk control (Hoang et al., 26 Feb 2025);
- Order-preserving losses in surrogate learning for black-box optimization to preserve global search trajectories (Ma et al., 23 Mar 2025);
- Combined surrogate + learned constraint frameworks in inverse problems to prevent out-of-distribution failures (Yin et al., 2023);
- Adaptive sampling to correct local surrogate error and improve coverage across complex input manifolds (Bahl et al., 2 Jan 2026).
Future directions include:
- Surrogate architectures for extreme dimensionalities and multimodal outputs;
- Integrated uncertainty quantification, especially for safety-critical or reliability-intensive workflows;
- Hybrid multi-fidelity and hierarchical surrogate frameworks capable of exploiting structure across scales (Brunel et al., 2024);
- Meta-learning extensions for fast adaptation and continual learning of semi-parametric surrogates (Wistuba et al., 2021, Picard-Weibel et al., 2024).
The cumulative evidence demonstrates that learned surrogates are now central to high-performance scientific computing, simulation- and decision-making workflows, and differentiable programming for systems that were previously inaccessible to machine-learning-based acceleration or optimization. Their design, analysis, and deployment require careful attention to architecture, learning protocol, domain coverage, and application-specific guarantees.