Taylor Approximation Control Variates
- Taylor approximation-based control variates are variance reduction techniques that use local polynomial surrogates to efficiently reduce Monte Carlo estimator variance without introducing bias.
- They compute closed-form moments via first- and second-order Taylor expansions, making them effective in high-dimensional PDE simulations and Bayesian inference.
- Their application spans PDE-constrained optimization, deep learning, statistical physics, and inverse problems, where they significantly cut computational costs and enhance convergence.
A Taylor approximation–based control variate is a variance reduction technique that exploits the local polynomial (usually first- or second-order) approximation of a target function with respect to uncertain or random variables. By constructing a surrogate whose moments can be computed analytically or efficiently, Taylor-based control variates suppress the variance of Monte Carlo estimators without introducing bias. This framework has extensive applications in PDE-constrained optimal control under uncertainty, deep learning, Bayesian inference, statistical physics, and stochastic differential equations, particularly in high-dimensional and computationally intensive regimes.
1. Mathematical Foundations of Taylor Approximation-Based Control Variates
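The key idea is to expand a function of interest $f(m)$, where $m$ is a random parameter (often assumed to follow a multivariate normal distribution), around a reference point $\bar{m}$ using a Taylor polynomial. The second-order expansion is given by:
$$T_2 f(m) = f(\bar{m}) + \langle g,\, m - \bar{m} \rangle + \tfrac{1}{2} \langle m - \bar{m},\, H (m - \bar{m}) \rangle, \qquad g = \nabla f(\bar{m}), \quad H = \nabla^2 f(\bar{m}).$$
This approximation can be used to construct a control variate for Monte Carlo estimators. Assuming $m \sim \mathcal{N}(\bar{m}, C)$, the moments of $T_2 f$ are available in closed form:
$$\mathbb{E}[T_2 f] = f(\bar{m}) + \tfrac{1}{2}\,\mathrm{tr}(C H), \qquad \mathrm{Var}[T_2 f] = \langle g,\, C g \rangle + \tfrac{1}{2}\,\mathrm{tr}\big((C H)^2\big).$$
An unbiased estimator for the expectation $\mathbb{E}[f]$ is thus
$$\hat{f}_N = \frac{1}{N} \sum_{i=1}^{N} \Big[ f(m_i) - \beta \big( T_2 f(m_i) - \mathbb{E}[T_2 f] \big) \Big], \qquad m_i \sim \mathcal{N}(\bar{m}, C),$$
with an optimizable linear coefficient $\beta$ for further variance reduction (Chen et al., 2018).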
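A minimal NumPy sketch of this construction follows, assuming a Gaussian parameter and a second-order Taylor surrogate expanded at the mean. The function name `taylor_cv_estimate` and the test function $f(m) = \exp(a \cdot m)$ are hypothetical choices made for illustration; the coefficient is fixed at `beta=1.0` so that the estimator stays exactly unbiased.

```python
import numpy as np

def taylor_cv_estimate(f, grad, hess, m_bar, C, n_samples, rng, beta=1.0):
    """Per-sample terms for estimating E[f(m)], m ~ N(m_bar, C),
    with and without a second-order Taylor control variate."""
    f0, g, H = f(m_bar), grad(m_bar), hess(m_bar)
    # Closed-form mean of the quadratic surrogate: f(m_bar) + 0.5 * tr(C H).
    surrogate_mean = f0 + 0.5 * np.trace(C @ H)

    samples = rng.multivariate_normal(m_bar, C, size=n_samples)
    d = samples - m_bar
    f_vals = np.array([f(m) for m in samples])
    # Surrogate evaluated at the same samples as f.
    t_vals = f0 + d @ g + 0.5 * np.einsum("ni,ij,nj->n", d, H, d)

    # Subtract the centered surrogate; its exact mean is zero, so no bias is added.
    controlled_terms = f_vals - beta * (t_vals - surrogate_mean)
    return f_vals, controlled_terms

# Hypothetical test problem with a known mean: f(m) = exp(a . m).
a = np.full(3, 0.3)
f = lambda m: np.exp(a @ m)
grad = lambda m: a * np.exp(a @ m)
hess = lambda m: np.outer(a, a) * np.exp(a @ m)

m_bar = np.zeros(3)
C = 0.25 * np.eye(3)
rng = np.random.default_rng(0)
plain_terms, cv_terms = taylor_cv_estimate(f, grad, hess, m_bar, C, 2000, rng)

exact = np.exp(a @ m_bar + 0.5 * a @ C @ a)  # mean of a lognormal variable
print("exact mean:         ", exact)
print("plain MC estimate:  ", plain_terms.mean())
print("controlled estimate:", cv_terms.mean())
print("variance ratio (controlled / plain):", cv_terms.var() / plain_terms.var())
```

Both estimators average the same samples; the only difference is the subtraction of the centered surrogate, so any gain comes purely from how well the quadratic tracks $f$ over the bulk of the distribution.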
2. Analytical Structure and Optimal Control in PDE Settings
In PDE-constrained optimal control under uncertainty, the expanded function is typically a quantity of interest evaluated at the PDE state. The Taylor-based control variate exploits the analytical tractability of the surrogate's moments, which require only traces of the (preconditioned) Hessian; these are computed with efficient randomized generalized eigenvalue solvers.
This enables the computation of the surrogate's mean and variance with cost scaling in the effective, rather than nominal, dimension of the random parameter. The resulting estimator greatly accelerates mean-variance optimization for high-dimensional control problems, as evidenced in large-scale subsurface flow and turbulent jet-control studies, where variance reductions of several orders of magnitude were observed relative to plain Monte Carlo (Chen et al., 2018).
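The trace computation can be sketched as follows. This is a simplified stand-in, not the solver used in the cited work: it assumes the preconditioned Hessian is available as a symmetric operator through a hypothetical `apply_A` callback and applies a basic randomized range finder to a standard (not generalized) eigenvalue problem. The test matrix with a geometrically decaying spectrum is likewise an assumption.

```python
import numpy as np

def randomized_eigvals(apply_A, n, n_probe, rng):
    """Approximate the dominant eigenvalues of a symmetric n x n operator
    using n_probe random probe vectors (basic randomized range finder)."""
    omega = rng.standard_normal((n, n_probe))   # Gaussian probe vectors
    Q, _ = np.linalg.qr(apply_A(omega))         # orthonormal basis for the sampled range
    B = Q.T @ apply_A(Q)                        # small projected matrix
    return np.linalg.eigvalsh(0.5 * (B + B.T))  # symmetrize against round-off

# Hypothetical preconditioned Hessian with rapidly decaying eigenvalues.
rng = np.random.default_rng(1)
n = 200
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
lam = 10.0 * 0.5 ** np.arange(n)
A = (U * lam) @ U.T                             # U diag(lam) U^T
apply_A = lambda X: A @ X

evals = randomized_eigvals(apply_A, n, n_probe=30, rng=rng)
trace_est = evals.sum()                         # approximates tr(A)
trace_sq_est = np.sum(evals**2)                 # approximates tr(A^2)

mean_correction = 0.5 * trace_est               # second-order term in the surrogate mean
variance_quadratic_term = 0.5 * trace_sq_est    # quadratic part of the surrogate variance
print(trace_est, lam.sum())
print(trace_sq_est, np.sum(lam**2))
```

Only 60 operator applications are used for a 200-dimensional problem here; the cost is governed by how many eigenvalues are non-negligible, which is the sense in which the effective dimension matters.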
3. Extensions to Statistical Learning, Bayesian Inference, and Score-Based Models
Taylor control variates generalize to reparameterization gradients in variational inference, where the model log-density is expanded around the mean of the variational distribution. A quadratic surrogate, fitted via variance minimization in a double-descent scheme, yields a zero-mean control variate for stochastic optimization. This allows for substantial reductions in gradient variance and accelerated ELBO convergence under flexible reparameterizable variational families (Geffner et al., 2020).
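The following sketch illustrates the idea in a deliberately reduced setting. It assumes a Gaussian variational family with fixed isotropic scale, considers only the gradient with respect to the mean, and uses exact Taylor coefficients at the mean instead of a fitted surrogate, so it is not the scheme of the cited work. The log-density and the names `grad_log_p`, `hess_log_p`, and `mean_gradients` are hypothetical.

```python
import numpy as np

A = np.array([[1.0, 0.2], [0.2, 1.5]])  # hypothetical precision-like matrix

def grad_log_p(z):
    """Gradient of log p(z) = -0.5 z^T A z - 0.1 * sum(z^4) for a batch z of shape (n, d)."""
    return -(z @ A.T) - 0.4 * z**3

def hess_log_p(z):
    """Hessian of the same log-density at a single point z of shape (d,)."""
    return -A - np.diag(1.2 * z**2)

def mean_gradients(mu, sigma, n_samples, rng):
    """Per-sample reparameterization gradients with respect to mu,
    without and with a quadratic-surrogate control variate."""
    eps = rng.standard_normal((n_samples, mu.size))
    z = mu + sigma * eps                       # reparameterized samples
    plain = grad_log_p(z)
    # The surrogate gradient is g + H (z - mu) with expectation g, so the
    # centered control variate is H (z - mu), which has exactly zero mean.
    H = hess_log_p(mu)
    correction = (z - mu) @ H.T
    return plain, plain - correction

mu = np.array([0.5, -0.3])
rng = np.random.default_rng(2)
plain, controlled = mean_gradients(mu, sigma=0.3, n_samples=5000, rng=rng)

print("plain mean gradient:     ", plain.mean(axis=0))
print("controlled mean gradient:", controlled.mean(axis=0))
print("total variance ratio:", controlled.var(axis=0).sum() / plain.var(axis=0).sum())
```

The correction removes the part of the gradient noise that is linear in the sampling noise; what remains comes from the non-quadratic part of the log-density.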
For denoising score matching in diffusion models, a $k$-th order Taylor expansion of the score network is inserted into the loss or its gradient. Notably, the gradient-level and loss-level control variates are shown to be equivalent under automatic differentiation. The per-sample control variate, constructed by explicit Taylor expansion and subtraction of its expectation, reduces the variance of gradient-based training without introducing bias, albeit with diminishing returns in highly irregular (e.g., deep U-Net) parameterizations (Jeha et al., 2024).
4. Variance Reduction for Approximation Error Estimation and Surrogate Modeling
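In Bayesian inverse problems governed by PDEs, Taylor-based control variates are used to efficiently estimate the mean and covariance of approximation errors between accurate and surrogate forward models. Given the analytic mean and covariance of the linear/quadratic Taylor surrogate, the control variate estimator for the mean of the error is
$$\hat{\mu}_{\varepsilon} = \mathbb{E}[\varepsilon_T] + \frac{1}{N} \sum_{i=1}^{N} \big( \varepsilon(m_i) - \varepsilon_T(m_i) \big),$$
where $\varepsilon(m)$ is the difference between the accurate and surrogate model outputs, $\varepsilon_T$ is its Taylor surrogate, and $m_i$ are samples of the uncertain parameter. The implementation leverages sensitivity analysis (first and second derivatives computed via PDE solves) and keeps the cost of the linearized computations scaling with the data-space dimension rather than the parameter-space dimension. This leads to order-of-magnitude savings in the high-fidelity PDE simulations required for accurate BAE (Bayesian approximation error) quantification (Nicholson et al., 5 Dec 2025).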
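As an illustration, the sketch below estimates the mean and covariance of an approximation error with a linear Taylor control variate. Everything concrete in it is assumed for the example: the toy "accurate" and "surrogate" models, the analytic Jacobian, the Gaussian parameter distribution, and the function names. In a PDE setting the Jacobian action would come from sensitivity solves instead.

```python
import numpy as np

B = np.array([[1.0, 0.5, 0.0], [0.0, 1.0, -0.5]])  # hypothetical linear observation map

def accurate_model(m):
    x = m @ B.T
    return x + 0.1 * np.sin(x)       # toy stand-in for a high-fidelity forward model

def surrogate_model(m):
    return m @ B.T                   # toy stand-in for a cheap forward model

def error(m):
    return accurate_model(m) - surrogate_model(m)

def error_jacobian(m0):
    # Analytic Jacobian of the error at m0: 0.1 * diag(cos(B m0)) @ B.
    return 0.1 * np.cos(B @ m0)[:, None] * B

def bae_statistics(m0, C, n_samples, rng):
    """Control variate estimates of the error mean and covariance for m ~ N(m0, C)."""
    J, e0 = error_jacobian(m0), error(m0)
    samples = rng.multivariate_normal(m0, C, size=n_samples)
    eps = error(samples)                      # sampled errors, shape (n, data_dim)
    eps_lin = e0 + (samples - m0) @ J.T       # linear Taylor surrogate at the same samples
    resid = eps - eps_lin
    mean_cv = e0 + resid.mean(axis=0)         # analytic surrogate mean + sampled correction
    # Analytic surrogate covariance J C J^T plus a sampled correction term.
    cov_cv = J @ C @ J.T + np.cov(eps, rowvar=False) - np.cov(eps_lin, rowvar=False)
    return mean_cv, cov_cv, eps, resid

m0 = np.array([0.3, -0.2, 0.5])
C = 0.04 * np.eye(3)
rng = np.random.default_rng(3)
mean_cv, cov_cv, eps, resid = bae_statistics(m0, C, 200, rng)

print("controlled mean estimate:", mean_cv)
print("plain mean estimate:     ", eps.mean(axis=0))
print("variance ratio:", resid.var(axis=0).sum() / eps.var(axis=0).sum())
```

The Jacobian has as many rows as there are data, so applying it is tied to the data dimension; this mirrors the cost argument made above, though the toy problem is far too small to demonstrate it.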
5. Taylor Control Variates in Stochastic and Statistical Physics
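The perturbative control variate strategy, as applied in nonequilibrium statistical physics, leverages a Taylor (asymptotic) expansion in the generator of a diffusion process. The optimal zero-variance control variate formally solves the Poisson equation $-\mathcal{L}\Phi = R - \mathbb{E}_\pi[R]$ for the observable $R$, but intractability motivates expansion around a tractable reference generator $\mathcal{L}_0$, resulting in a series $\Phi = \Phi_0 + \eta\,\Phi_1 + \eta^2\,\Phi_2 + \cdots$ in the perturbation strength $\eta$, with each term computed recursively. Modified observables of the form $R + \mathcal{L}\hat{\Phi}$, where $\hat{\Phi}$ is a truncation of this series, enjoy a correspondingly reduced asymptotic variance, yielding a 10–100× variance reduction in practical high-dimensional particle, chain, and solvation models (Roussel et al., 2017).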
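A short formal calculation shows why an approximate Poisson solution already helps. The notation is assumed for illustration: $\mathcal{L}$ is the generator, $\pi$ its invariant measure, $R$ the observable, $\Phi$ the exact solution of the Poisson equation, and $\hat{\Phi}$ any sufficiently regular approximation of it. With the exact solution, the modified observable is constant:

$$-\mathcal{L}\Phi = R - \mathbb{E}_\pi[R] \quad\Longrightarrow\quad R + \mathcal{L}\Phi = \mathbb{E}_\pi[R].$$

With an approximation, the mean is unchanged because $\pi$ is invariant under the dynamics:

$$\mathbb{E}_\pi\big[R + \mathcal{L}\hat{\Phi}\big] = \mathbb{E}_\pi[R], \qquad \text{since} \quad \int \mathcal{L}\hat{\Phi}\, d\pi = 0.$$

The modified observable has its own Poisson equation, solved by the residual $\Phi - \hat{\Phi}$:

$$-\mathcal{L}\big(\Phi - \hat{\Phi}\big) = \big(R + \mathcal{L}\hat{\Phi}\big) - \mathbb{E}_\pi[R],$$

so the standard expression for the asymptotic variance of its time average is quadratic in that residual:

$$\sigma^2 = 2 \int \big(\Phi - \hat{\Phi}\big)\, \big[-\mathcal{L}\big(\Phi - \hat{\Phi}\big)\big]\, d\pi.$$

Formally, and under the assumption that the residual is of order $\eta^{k+1}$ after truncating a perturbative expansion at order $k$, this suggests an asymptotic variance of order $\eta^{2(k+1)}$.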
6. Deep Learning for High-Dimensional Semilinear PDEs and BSDEs
Taylor control variates are embedded into deep learning–based solvers for high-dimensional backward stochastic differential equations (BSDEs) and associated semilinear PDEs. Here, the "dominant" linear PDE/BSDE component is obtained via asymptotic (Taylor) expansion, and the neural network fits only the "small" nonlinear residual. The resulting losses and gradients are orders of magnitude smaller than those of the naive (uncontrolled) problem, enabling rapid convergence and effective scalability even in high-dimensional financial option pricing and FBSDE benchmarks (Takahashi et al., 2021).
7. Quantitative Summary and Theoretical Guarantees
Empirically and theoretically, variance reductions achieved by Taylor-based control variates depend on the correlation between the target and surrogate. For quadratic surrogates and well-chosen expansion points:
- Variance reductions of an order of magnitude or more, for both mean and variance estimates, are reported in PDE-uncertainty quantification and Bayesian inverse settings (Chen et al., 2018, Nicholson et al., 5 Dec 2025).
- Offloading much of the computational burden to the surrogate evaluation allows for reductions in required high-fidelity model solves by an order of magnitude or more (Nicholson et al., 5 Dec 2025, Takahashi et al., 2021).
- In variational inference and diffusion models, correlation coefficients for Taylor-derived control variates often reach $0.9$–$0.99$ in favorable regimes, resulting in substantial gradient variance reduction and robust convergence (Geffner et al., 2020, Jeha et al., 2024).
Nevertheless, the effectiveness of Taylor-based control variates degrades when the target function is highly nonlinear with respect to the expansion variable, or when higher-order Taylor coefficients are impractically costly to compute.
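The link between correlation and variance reduction can be made concrete with the standard control variate identity: at the optimal coefficient $\beta^* = \mathrm{Cov}(f, T) / \mathrm{Var}(T)$, the controlled variance is $(1 - \rho^2)\,\mathrm{Var}(f)$, where $\rho$ is the correlation between target $f$ and surrogate $T$. The helper names below are hypothetical.

```python
import numpy as np

def optimal_coefficient(f_vals, t_vals):
    """Sample estimate of beta* = Cov(f, T) / Var(T) from paired evaluations."""
    cov = np.cov(f_vals, t_vals)
    return cov[0, 1] / cov[1, 1]

def variance_reduction_factor(rho):
    """Var(plain) / Var(controlled) at the optimal coefficient, for correlation rho."""
    return 1.0 / (1.0 - rho**2)

for rho in (0.9, 0.99):
    print(f"rho = {rho}: variance reduced by a factor of {variance_reduction_factor(rho):.2f}")
```

Correlations of 0.9 and 0.99 correspond to factors of roughly 5 and 50, which shows how sharply the payoff depends on the last few percent of correlation. Estimating $\beta^*$ from the same samples used in the estimator introduces a small bias; a separate pilot batch avoids it.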
Table: Representative Application Domains and Implementation Notes
| Application Domain | Taylor Surrogate Order | Typical Computational Bottleneck |
|---|---|---|
| PDE-constrained optimal control (Chen et al., 2018) | 1st & 2nd | Trace estimation (randomized eigen) |
| Variational inference (Geffner et al., 2020) | 2nd (quadratic) | Hessian–vector products/autodiff |
| Diffusion model training (Jeha et al., 2024) | 1st–2nd | High-order derivatives in networks |
| BAE in Bayesian inverse problems (Nicholson et al., 5 Dec 2025) | 1st & 2nd | Sensitivity PDE solves |
| Stochastic MD/statistical physics (Roussel et al., 2017) | 1st in perturbation | Poisson equation solves |
| Deep PDE solvers (BSDEs) (Takahashi et al., 2021) | 1st in expansion parameter | Linear PDE & path simulation |
Taylor approximation–based control variates constitute a unifying framework for bias-free, scalable variance reduction across domains where evaluation of expensive forward models, gradients, or stochastic processes is required, and where the underlying model is sufficiently smooth for local polynomial approximation. Their continued relevance depends on advances in high-order differentiation, surrogate modeling, and randomized linear algebra for scalable implementation in increasingly high-dimensional and non-linear settings.