
Gradient Surrogates in Optimization

Updated 1 January 2026
  • Gradient surrogates are differentiable models that approximate gradients of non-differentiable or costly functions to enable efficient optimization.
  • They are widely applied in engineering design, offline black-box tasks, and Bayesian optimization to improve computational efficiency and stability.
  • Advanced methods integrate multifidelity data, domain transformations, and smoothing techniques to enhance gradient matching and scalability.

Gradient surrogates are differentiable models or functions used to approximate the gradients of potentially non-differentiable, expensive, or inaccessible objective functions. They are deployed across optimization, machine learning, engineering design, offline black-box tasks, Bayesian optimization, and simulation-based workflows to enable gradient-based methods in otherwise inaccessible or costly settings. Gradient surrogates may encode explicit gradient information, employ smooth proxies for non-differentiable components, or embody probabilistic structures that allow efficient uncertainty quantification and scalable inference.

1. Mathematical Foundations and Formal Definitions

Gradient surrogates generalize the notion of surrogate modeling to the explicit representation or approximation of gradients. Let $f:\mathcal{X}\to\mathbb{R}$ be a black-box or non-differentiable function, and $g_\theta:\mathcal{X}\to\mathbb{R}$ a differentiable surrogate parameterized by $\theta$. The gradient surrogate aims to ensure that $\nabla_x g_\theta(x)$ approximates $\nabla_x f(x)$ (for differentiable $f$) or serves as a proxy for the descent direction when $\nabla_x f(x)$ is unavailable.
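As a concrete illustration of this definition, the following minimal sketch fits a small PyTorch MLP $g_\theta$ by jointly penalizing value and gradient mismatch on a toy analytic stand-in for $f$; the objective, the finite-difference gradient oracle, and the weight $\lambda$ are illustrative assumptions rather than elements of any cited method.

```python
import torch

# Toy stand-in for an expensive or black-box objective.
def f(x):                                      # x: (n, d)
    return (x ** 2).sum(dim=1) + 0.1 * torch.sin(5.0 * x).sum(dim=1)

def fd_grad(x, eps=1e-4):
    """Central finite-difference estimate of grad f (stand-in for any gradient oracle)."""
    g = torch.zeros_like(x)
    for i in range(x.shape[1]):
        e = torch.zeros_like(x)
        e[:, i] = eps
        g[:, i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

d = 4
x_train = torch.rand(256, d) * 2 - 1           # samples in [-1, 1]^d
y_train, g_train = f(x_train), fd_grad(x_train)

g_theta = torch.nn.Sequential(                 # differentiable surrogate g_theta
    torch.nn.Linear(d, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 1),
)
opt = torch.optim.Adam(g_theta.parameters(), lr=1e-3)
lam = 1.0                                      # weight on the gradient-matching term

for step in range(2000):
    x = x_train.clone().requires_grad_(True)
    pred = g_theta(x).squeeze(-1)
    # Gradient of the surrogate w.r.t. its inputs, kept in the graph so it can be trained on.
    grad_pred = torch.autograd.grad(pred.sum(), x, create_graph=True)[0]
    loss = ((pred - y_train) ** 2).mean() + lam * ((grad_pred - g_train) ** 2).sum(dim=1).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```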

Fundamental approaches to gradient surrogate construction mirror the categories above: explicit encoding of gradient information, smooth proxies for non-differentiable components, and probabilistic models that quantify gradient uncertainty.

Gradient surrogates often provide guaranteed improvement steps within majorization-minimization schemes, convexity bounds within the surrogate domain, or theoretical control of the optimization gap via the gradient-matching error (Hoang et al., 26 Feb 2025).

2. Gradient Surrogates in Engineering and Scientific Optimization Pipelines

In computer-aided engineering (CAE), simulating physical processes typically involves non-differentiable steps (meshing, CFD, FEA codes). Gradient surrogates replace these with fully differentiable models, commonly deep neural networks, enabling the entire pipeline to be optimized via backpropagation and automatic differentiation (Rehmann et al., 13 Nov 2025).

  • Shape representation: Use signed distance fields (SDFs) parameterized by geometric design quantities.
  • Surrogate architecture: Apply 3D U-Net models with designed encoders/decoders and attention mechanisms. Instead of differentiating directly through meshing and simulation, gradients are propagated through the surrogate network, which is trained to predict velocity fields (CFD outputs) from the SDF representation.
  • Optimization: End-to-end autodiff yields analytic derivatives with respect to design parameters; first-order optimizers (MMA, Adam, SGD) can be used subject to box constraints, as in the sketch after this list.
  • Performance: Achieve orders-of-magnitude reduction in computational cost and enable high-dimensional design search (Rehmann et al., 13 Nov 2025).
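The following minimal sketch illustrates the end-to-end design loop described above, with a generic frozen network standing in for the trained U-Net surrogate and a hypothetical scalar cost derived from its predicted field; it is not the Rehmann et al. pipeline.

```python
import torch

# Stand-in for a trained, frozen CFD surrogate: maps design parameters to a
# predicted field from which a scalar cost (e.g., a drag-like quantity) is derived.
surrogate = torch.nn.Sequential(
    torch.nn.Linear(8, 128), torch.nn.ReLU(),
    torch.nn.Linear(128, 64),          # proxy for the predicted velocity field
)
for p in surrogate.parameters():
    p.requires_grad_(False)            # surrogate weights stay fixed during design search

def objective(field):
    return (field ** 2).mean()         # hypothetical scalar cost of the predicted field

lo, hi = -1.0, 1.0                     # box constraints on the design parameters
design = torch.zeros(8, requires_grad=True)
opt = torch.optim.Adam([design], lr=1e-2)

for it in range(500):
    loss = objective(surrogate(design))
    opt.zero_grad()
    loss.backward()                    # analytic derivatives via autodiff through the surrogate
    opt.step()
    with torch.no_grad():              # simple projection onto the box constraints
        design.clamp_(lo, hi)
```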

The surrogate approach is applicable wherever adjoint methods are unavailable or impractical, and enables practical, scalable gradient-based optimization in industrial contexts.

3. Gradient-Enhanced Bayesian and Probabilistic Surrogates

Bayesian optimization extensively deploys surrogates to efficiently locate minima of expensive functions. Incorporating gradient data into surrogate models, whether Gaussian processes (GPs) or Bayesian neural networks (BNNs), yields improved sample efficiency, robustness to noise, and scalability with dimension (Makrygiorgos et al., 14 Apr 2025; Marchildon et al., 12 Apr 2025; Semler et al., 2024).

| Surrogate Type | Data Used | Scalability | Key Methods |
|---|---|---|---|
| GP joint priors | function + gradients | $\mathcal{O}(((D+1)n)^3)$ | Marginal likelihood maximization |
| BNNs with gradient-informed loss | function + gradients | $\mathcal{O}(n(D+1))$ | SGHMC, automatic differentiation |
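For the GP row above, the joint prior covariance over function values and gradients is assembled from the kernel and its cross-derivatives. The sketch below, which assumes a 1-D squared-exponential kernel with illustrative hyperparameters and is not tied to any cited implementation, builds the $(D+1)n \times (D+1)n$ joint covariance whose Cholesky factorization drives the $\mathcal{O}(((D+1)n)^3)$ cost.

```python
import numpy as np

def joint_gp_cov(x, lengthscale=0.3, signal_var=1.0, jitter=1e-6):
    """Joint prior covariance over [f(x_1..n), f'(x_1..n)] for a 1-D RBF kernel.

    The (D+1)n x (D+1)n structure (here D=1) is what drives the cubic cost of
    exact gradient-enhanced GP inference.
    """
    r = x[:, None] - x[None, :]                                   # pairwise differences
    k = signal_var * np.exp(-r**2 / (2 * lengthscale**2))         # Cov(f(x_i), f(x_j))
    k_fg = k * r / lengthscale**2                                 # Cov(f(x_i), f'(x_j))
    k_gg = k * (1 / lengthscale**2 - r**2 / lengthscale**4)       # Cov(f'(x_i), f'(x_j))
    K = np.vstack([np.hstack([k, k_fg]),
                   np.hstack([k_fg.T, k_gg])])
    return K + jitter * np.eye(K.shape[0])

x = np.linspace(0.0, 1.0, 15)
K = joint_gp_cov(x)
L = np.linalg.cholesky(K)   # cubic in (D+1)n; this factor is reused for the marginal likelihood
```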

Gradient-informed BNNs employ composite likelihoods that penalize both function-value and gradient mismatches, with hyperparameters balancing the two terms. Acquisition functions (lower confidence bound, expected improvement) remain unchanged but benefit from sharper predictive gradients and improved uncertainty quantification (Makrygiorgos et al., 14 Apr 2025).

Gradient-enhanced Bayesian optimization achieves:

  • Deep local optimality with fewer function/gradient evaluations.
  • Robust convergence even with noisy/inexact gradients (e.g. chaotic Lorenz-63, inexact sensitivities).
  • Significant practical improvements over quasi-Newton optimizers in high-dimensional, noisy, or multimodal settings (Marchildon et al., 12 Apr 2025).

4. Offline Black-Box Optimization and Gradient Matching

Offline optimization, central to material/chemical design or controller synthesis, lacks the ability to query $f$ outside a fixed dataset. Surrogates trained only on pointwise values can mislead optimization due to poor gradient alignment out-of-distribution. Recent work formalizes the link between surrogate gradient error and optimization performance gap: minimizing the maximum deviation $\sup_x \|\nabla f(x) - \nabla g_\theta(x)\|$ mitigates the loss in objective value between the true and surrogate optima (Hoang et al., 26 Feb 2025).
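A standard argument makes this link explicit; it is stated here under the simplifying assumptions that $f$ and $g_\theta$ are differentiable on a convex, bounded domain $\mathcal{X}$, and it paraphrases rather than reproduces the precise bound of Hoang et al. Write $h = f - g_\theta$ and suppose $\sup_{x\in\mathcal{X}}\|\nabla h(x)\| \le \varepsilon$, so that $|h(x)-h(y)| \le \varepsilon\|x-y\|$ along any segment. For minimizers $x^\star$ of $f$ and $x_g^\star$ of $g_\theta$,

$$f(x_g^\star) - f(x^\star) = \underbrace{g_\theta(x_g^\star) - g_\theta(x^\star)}_{\le\,0} + \big(h(x_g^\star) - h(x^\star)\big) \le \varepsilon\,\|x_g^\star - x^\star\| \le \varepsilon\,\operatorname{diam}(\mathcal{X}),$$

so a uniform bound on the gradient deviation directly bounds the suboptimality incurred by optimizing the surrogate instead of the true objective.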

Algorithms for gradient matching employ:

  • Line-integral losses enforcing consistency between observed function differences and integrated surrogate gradients along sampled paths in data space (see the sketch below).
  • Regularization combining value and gradient matching to improve robustness.

Empirical results establish state-of-the-art optimization quality across a wide range of benchmarks, with gradient-based surrogate training outperforming alternatives, especially in high-dimensional or nonconvex settings.
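A minimal sketch of such a line-integral loss follows, using midpoint-rule quadrature along straight segments between randomly paired dataset points; the pairing strategy, quadrature order, and weighting against a value-matching term are illustrative choices rather than the exact construction of Hoang et al.

```python
import torch

def line_integral_loss(g_theta, x, y, n_pairs=128, n_quad=8):
    """Penalize mismatch between observed value differences y_j - y_i and the
    line integral of grad g_theta along the straight segment from x_i to x_j."""
    n = x.shape[0]
    i = torch.randint(0, n, (n_pairs,))
    j = torch.randint(0, n, (n_pairs,))
    delta = x[j] - x[i]                                   # segment directions, (n_pairs, d)
    t = (torch.arange(n_quad, dtype=x.dtype) + 0.5) / n_quad   # midpoint-rule nodes in (0, 1)
    integral = torch.zeros(n_pairs, dtype=x.dtype)
    for tk in t:
        pts = (x[i] + tk * delta).requires_grad_(True)
        out = g_theta(pts).squeeze(-1)
        grad = torch.autograd.grad(out.sum(), pts, create_graph=True)[0]
        integral = integral + (grad * delta).sum(dim=1) / n_quad
    return ((integral - (y[j] - y[i])) ** 2).mean()
```

In practice this term would be combined with a pointwise value-matching loss on the offline dataset and minimized jointly over the surrogate parameters.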

5. Surrogate Gradient Methods for Non-Differentiable and Black-Box Functions

Optimization over non-differentiable functions or black-box metrics (classification, ranking, hard constraints, reinforcement learning) often resorts to surrogate gradients.

  • Forward Gradient Injection (FGI) (Otte, 2024): Directly injects a designed surrogate gradient shape into the autograd graph during the forward pass in deep learning frameworks, which is crucial for spiking neural networks and other models with Heaviside or quantized activations. This enables TorchScript and compiled models to run efficiently without custom backward definitions (see the sketch after this list).
  • Gradient Surrogates in Reinforcement Learning (Vaswani et al., 2021): Functional mirror ascent constructs general surrogate objectives whose gradient step guarantees monotonic improvement, subsuming TRPO, PPO, and MPO as special cases. The surrogate can use any Bregman divergence, providing stability and policy improvement guarantees even for approximate, parameterized inner-loop maximization.
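The sketch below shows a forward-pass injection pattern for a Heaviside spiking nonlinearity: the forward value is the hard spike, while the backward pass sees the gradient of a smooth proxy, all expressed with ordinary forward operations and detach(). The sigmoid shape and slope are illustrative choices, and the code shows the general detach-based pattern in the spirit of FGI rather than the exact formulation of Otte (2024).

```python
import torch

def spike_fgi(v, slope=10.0):
    """Heaviside spike in the forward pass, sigmoid-shaped surrogate gradient in the
    backward pass, written without a custom autograd.Function."""
    soft = torch.sigmoid(slope * v)            # differentiable proxy; defines the gradient
    hard = (v > 0).to(v.dtype)                 # non-differentiable Heaviside output
    # Forward value equals `hard`; gradient flows only through `soft`.
    return soft + (hard - soft).detach()

v = torch.randn(5, requires_grad=True)
spike_fgi(v).sum().backward()
print(v.grad)    # slope * sigmoid'(slope * v): the injected surrogate gradient shape
```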

6. Advanced Strategies: Multifidelity, Domain Transformation, and Smoothing

  • Multifidelity Gradient-Only Surrogates (Wilke, 2024): Fuse gradient samples from low- and high-fidelity simulations in a unified weighted regression system. This approach avoids ill-conditioning and adapts to varying data sources, using only gradients to construct regression surfaces via RBF methods, with robust validation and error estimation.
  • Domain Transformation for Surrogate Construction (Bouwer et al., 2023): Gradient information enables efficient estimation of local curvature (Hessian) via Broyden updates, facilitating optimal scaling and rotation of input space for surrogate modeling. Surrogates benefit from near-isotropic transformed domains, yielding significant accuracy improvements in high dimensions.
  • Laplacian Smoothing Surrogates (Osher et al., 2018): Apply a linear operator to the raw gradient (e.g., $(I - \sigma\Delta)^{-1}$ for a discrete Laplacian $\Delta$), yielding mean-preserving, variance-reducing surrogate gradients that allow larger step sizes and improve generalization in SGD, deep networks, and reinforcement learning (see the sketch after this list).
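A minimal 1-D sketch of the Laplacian smoothing operator is shown below, assuming a circulant discrete Laplacian solved with the FFT; the smoothing strength sigma and the flattening of the parameter vector are user choices.

```python
import numpy as np

def laplacian_smooth(grad, sigma=1.0):
    """Return (I - sigma * L)^{-1} grad for the 1-D circulant Laplacian L (stencil [1, -2, 1]),
    computed in O(n log n) with the FFT. The result is mean-preserving and variance-reducing."""
    g = np.asarray(grad, dtype=float).ravel()
    n = g.size
    k = np.arange(n)
    # Eigenvalues of I - sigma * L in the Fourier basis.
    eig = 1.0 + 2.0 * sigma - 2.0 * sigma * np.cos(2.0 * np.pi * k / n)
    smoothed = np.real(np.fft.ifft(np.fft.fft(g) / eig))
    return smoothed.reshape(np.shape(grad))

g = np.random.randn(1000)
gs = laplacian_smooth(g, sigma=2.0)
print(g.mean(), gs.mean(), g.var(), gs.var())   # mean preserved, variance reduced
```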

7. Surrogates in Evolutionary, Stochastic, and Hybrid Optimization

In evolutionary strategies, gradient surrogates are employed to combine past descent directions and fresh random perturbations into analytically optimal gradient estimators, improving convergence even when direct gradients are unavailable (Meier et al., 2019). In stochastic optimization, surrogate construction in a target/logit space amortizes expensive gradient calculations and enables black-box parameter updates with provable stationary point convergence (Lavington et al., 2023).
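For context on the evolutionary-strategies case, the sketch below shows the plain antithetic random-perturbation gradient estimator that such surrogate-augmented methods build on; the analytically optimal combination with past descent directions from Meier et al. (2019) is not reproduced here.

```python
import numpy as np

def es_gradient(f, x, sigma=0.1, n_dirs=32, rng=None):
    """Antithetic evolution-strategies estimate of grad f(x) from function values only."""
    rng = np.random.default_rng() if rng is None else rng
    g = np.zeros(x.size)
    for _ in range(n_dirs):
        eps = rng.standard_normal(x.size)
        g += (f(x + sigma * eps) - f(x - sigma * eps)) / (2.0 * sigma) * eps
    return g / n_dirs

f = lambda x: np.sum(x ** 2)          # toy objective; true gradient is 2x
x = np.ones(10)
print(es_gradient(f, x))              # approximately 2 * ones(10)
```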

As surrogate gradient methods continue to evolve, they are central to broadening the applicability and efficiency of gradient-based optimization, particularly in domains historically limited by non-differentiable, costly, or inaccessible objective landscapes.

References

Key works directly informing this article, cited inline above, include Hoang et al. (26 Feb 2025); Rehmann et al. (13 Nov 2025); Makrygiorgos et al. (14 Apr 2025); Marchildon et al. (12 Apr 2025); Semler et al. (2024); Otte (2024); Vaswani et al. (2021); Wilke (2024); Bouwer et al. (2023); Osher et al. (2018); Meier et al. (2019); and Lavington et al. (2023).
