Taylor Approximation Control Variates

Updated 25 March 2026

Taylor approximation-based control variates are Monte Carlo variance reduction techniques. They use local polynomial surrogates with tractable moments to reduce estimator variance without introducing bias.
They compute closed-form moments via first- and second-order Taylor expansions, making them effective in high-dimensional PDE simulations and Bayesian inference.
Their application spans PDE-constrained optimization, deep learning, statistical physics, and inverse problems, where they significantly cut computational costs and enhance convergence.

A Taylor approximation–based control variate is a variance reduction technique that exploits the local polynomial (usually first- or second-order) approximation of a target function with respect to uncertain or random variables. By constructing a surrogate whose moments can be computed analytically or efficiently, Taylor-based control variates suppress the variance of Monte Carlo estimators without introducing bias. This framework has extensive applications in PDE-constrained optimal control under uncertainty, deep learning, Bayesian inference, statistical physics, and stochastic differential equations, particularly in high-dimensional and computationally intensive regimes.

1. Mathematical Foundations of Taylor Approximation-Based Control Variates

The key idea is to expand a function of interest $f(\xi)$, where $\xi$ is a random parameter (often assumed to follow a multivariate normal distribution), around a reference point $\xi_0$ using a Taylor polynomial. The second-order expansion is given by:

$f(\xi) \approx T(\xi) = f(\xi_0) + \nabla f(\xi_0)^T(\xi - \xi_0) + \tfrac12 (\xi - \xi_0)^T \nabla^2 f(\xi_0) (\xi - \xi_0)$

This approximation can be used to construct a control variate for Monte Carlo estimators. Assuming $\xi \sim N(\xi_0, C)$, the moments of $T(\xi)$ are available in closed form:

$\mathbb{E}[T(\xi)] = f(\xi_0) + \tfrac12\, \mathrm{tr}(C \nabla^2 f(\xi_0))$

$\mathrm{Var}[T(\xi)] = \nabla f(\xi_0)^T C \nabla f(\xi_0) + \tfrac12\, \mathrm{tr}((C \nabla^2 f(\xi_0))^2)$

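An unbiased estimator of $\mathbb{E}[f(\xi)]$ from $M$ independent samples $\xi_i \sim N(\xi_0, C)$ is thus

$\hat{\mu}_{\mathrm{CV}} = \frac{1}{M} \sum_{i=1}^M [f(\xi_i) - T(\xi_i)] + \mathbb{E}[T(\xi)]$

An optimizable linear coefficient $\beta$ gives further variance reduction (Chen et al., 2018). The estimator $\hat{\mu}_{\beta} = \frac{1}{M} \sum_{i=1}^M [f(\xi_i) - \beta\,(T(\xi_i) - \mathbb{E}[T(\xi)])]$ has the following properties:

- it is unbiased for any fixed $\beta$;
- it reduces to $\hat{\mu}_{\mathrm{CV}}$ at $\beta = 1$;
- its variance is minimal at $\beta^* = \mathrm{Cov}[f(\xi), T(\xi)]/\mathrm{Var}[T(\xi)]$.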
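The estimator above can be sketched in plain Python. The sketch assumes a hypothetical toy target $f(\xi) = \exp(a^T\xi)$, chosen because $\mathbb{E}[f]$ is known in closed form. Its gradient and Hessian are supplied by hand, whereas in the PDE setting they would typically come from adjoint-based derivative computations. All function names are illustrative, not from any library.

```python
import math
import random

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def matvec(A, x):
    return [dot(row, x) for row in A]

def cholesky(C):
    """Lower-triangular L with L L^T = C (C symmetric positive definite)."""
    n = len(C)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = C[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def taylor_moments(f0, g, H, C):
    """Closed-form E[T] and Var[T] of the quadratic Taylor surrogate for xi ~ N(xi0, C)."""
    n = len(g)
    CH = [[sum(C[i][k] * H[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    tr_CH = sum(CH[i][i] for i in range(n))
    tr_CH2 = sum(CH[i][k] * CH[k][i] for i in range(n) for k in range(n))  # tr((CH)^2)
    return f0 + 0.5 * tr_CH, dot(g, matvec(C, g)) + 0.5 * tr_CH2

def taylor_cv_samples(f, xi0, g, H, C, M, beta=1.0, rng=None):
    """Per-sample values for plain Monte Carlo and for the Taylor control variate estimator."""
    rng = rng or random.Random(0)
    f0 = f(xi0)
    mean_T, _ = taylor_moments(f0, g, H, C)
    L = cholesky(C)
    plain, cv = [], []
    for _ in range(M):
        d = matvec(L, [rng.gauss(0.0, 1.0) for _ in xi0])  # d = xi - xi0 ~ N(0, C)
        T = f0 + dot(g, d) + 0.5 * dot(d, matvec(H, d))    # surrogate at the same sample
        fx = f([a + b for a, b in zip(xi0, d)])
        plain.append(fx)
        cv.append(fx - beta * (T - mean_T))
    return plain, cv

def mean(xs):
    return sum(xs) / len(xs)

def var(xs):
    mu = mean(xs)
    return sum((v - mu) ** 2 for v in xs) / (len(xs) - 1)

# Toy target f(xi) = exp(a^T xi): gradient f0 * a and Hessian f0 * a a^T at xi0.
a = [0.3, -0.2]
xi0 = [0.1, 0.2]
C = [[0.2, 0.05], [0.05, 0.1]]
f = lambda xi: math.exp(dot(a, xi))
f0 = f(xi0)
g = [f0 * ai for ai in a]
H = [[f0 * ai * aj for aj in a] for ai in a]
plain, cv = taylor_cv_samples(f, xi0, g, H, C, M=20000, rng=random.Random(42))
exact = math.exp(dot(a, xi0) + 0.5 * dot(a, matvec(C, a)))
print(mean(plain), mean(cv), exact, var(cv) / var(plain))
```

Passing a different `beta` gives the generalized estimator. With `beta=1.0` it is exactly $\hat{\mu}_{\mathrm{CV}}$.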

2. Analytical Structure and Optimal Control in PDE Settings

In PDE-constrained optimal control under uncertainty, $f$ typically represents a quantity of interest evaluated at the PDE state. The Taylor-based control variate exploits the analytical tractability of the surrogate's moments. Beyond the gradient term $\nabla f^T C \nabla f$, they require only traces of the (preconditioned) Hessian, which are computed with efficient randomized generalized eigenvalue solvers.

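This enables the computation of $\mathrm{tr}(C \nabla^2 f)$ and $\mathrm{tr}((C \nabla^2 f)^2)$ at a cost that scales with the effective, rather than nominal, dimension of $\xi$. Both traces are sums over the eigenvalues $\lambda_j$ of the generalized eigenproblem $\nabla^2 f\, v = \lambda\, C^{-1} v$, namely $\sum_j \lambda_j$ and $\sum_j \lambda_j^2$. Only the dominant eigenvalues therefore need to be computed when the spectrum decays rapidly. The resulting estimator greatly accelerates mean-variance optimization for high-dimensional control problems. Large-scale subsurface flow and turbulent jet-control studies bear this out, reporting variance reductions of several orders of magnitude relative to plain Monte Carlo (Chen et al., 2018).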
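A small sketch shows why only a few eigenvalues are needed. It assumes whitened parameters, so that $C = I$ and the generalized problem reduces to a standard symmetric eigenproblem. It also substitutes plain power iteration with deflation for the randomized eigensolvers mentioned above. The 12-dimensional Hessian, with a geometrically decaying spectrum, is hypothetical.

```python
import math
import random

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def householder(u):
    """Symmetric orthogonal matrix Q = I - 2 u u^T / (u^T u)."""
    n, uu = len(u), sum(x * x for x in u)
    return [[(1.0 if i == j else 0.0) - 2.0 * u[i] * u[j] / uu for j in range(n)] for i in range(n)]

def top_eigenvalues(H, r, iters=300, rng=None):
    """Leading r eigenvalues of a symmetric matrix H by power iteration with deflation."""
    rng = rng or random.Random(0)
    n = len(H)
    A = [row[:] for row in H]
    eigs = []
    for _ in range(r):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = matvec(A, v)
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(a * b for a, b in zip(v, matvec(A, v)))  # Rayleigh quotient
        eigs.append(lam)
        # Deflate: remove the converged eigenpair so the next one becomes dominant.
        A = [[A[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return eigs

# Hypothetical Hessian H = Q diag(lam) Q with decaying spectrum lam_j = 0.3**j, j = 0..11.
n = 12
spectrum = [0.3 ** j for j in range(n)]
Q = householder([1.0 + j for j in range(n)])
H = [[sum(Q[i][k] * spectrum[k] * Q[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

eigs = top_eigenvalues(H, r=6)
tr_exact = sum(H[i][i] for i in range(n))      # tr(C H) with C = I
tr_approx = sum(eigs)                          # truncated sum of eigenvalues
tr2_approx = sum(lam * lam for lam in eigs)    # truncated tr((C H)^2)
print(tr_exact, tr_approx, tr2_approx)
```

With $r = 6$ of $n = 12$ eigenvalues, the truncated trace is within about 0.07% of the exact value. The faster the spectrum decays, the smaller $r$ can remain as $n$ grows.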

3. Extensions to Statistical Learning, Bayesian Inference, and Score-Based Models

Taylor control variates generalize to reparameterization gradients in variational inference, where the model log-density is expanded around the mean of the variational distribution. A quadratic surrogate $\hat{f}_v(z) = b_v^T(z-z_0) + \frac12 (z-z_0)^T B_v (z-z_0)$ has its parameters fitted by variance minimization in a double-descent scheme. It yields a zero-mean control variate for stochastic optimization. This substantially reduces gradient variance and accelerates ELBO convergence for flexible reparameterizable variational families (Geffner et al., 2020).
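A one-dimensional sketch of this idea makes several simplifying assumptions:

- a Gaussian variational family $q = N(m, s^2)$ with $z = m + s\epsilon$;
- a hypothetical toy integrand $f(z) = \sin z$ standing in for the model log-density;
- surrogate coefficients $b = f'(m)$ and $B = f''(m)$ taken directly from the Taylor expansion, rather than fitted by the double-descent scheme of Geffner et al.

The expansion point $z_0 = m$ is treated as a constant when differentiating.

```python
import math
import random

def reparam_grads(f_prime, m, s, b, B, M, rng):
    """Per-sample reparameterization gradients of E_q[f(z)] w.r.t. (m, s), q = N(m, s^2).

    Control variate: quadratic surrogate f_hat(z) = b (z - m) + B (z - m)^2 / 2,
    with the expansion point m held fixed (not differentiated)."""
    plain_m, cv_m, plain_s, cv_s = [], [], [], []
    for _ in range(M):
        eps = rng.gauss(0.0, 1.0)
        z = m + s * eps                          # reparameterization z = m + s * eps
        g = f_prime(z)                           # df/dz at the sample
        g_hat = b + B * s * eps                  # d f_hat / dz at the same sample
        plain_m.append(g)                        # dz/dm = 1
        cv_m.append(g - g_hat + b)               # E[g_hat] = b
        plain_s.append(g * eps)                  # dz/ds = eps
        cv_s.append((g - g_hat) * eps + B * s)   # E[g_hat * eps] = B s
    return plain_m, cv_m, plain_s, cv_s

def mean(xs):
    return sum(xs) / len(xs)

def var(xs):
    mu = mean(xs)
    return sum((v - mu) ** 2 for v in xs) / (len(xs) - 1)

m, s = 0.4, 0.3
plain_m, cv_m, plain_s, cv_s = reparam_grads(
    math.cos, m, s, b=math.cos(m), B=-math.sin(m), M=20000, rng=random.Random(7))
# E_q[sin z] = sin(m) exp(-s^2/2), so its exact gradients are:
exact_m = math.cos(m) * math.exp(-s * s / 2)
exact_s = -s * math.sin(m) * math.exp(-s * s / 2)
print(var(cv_m) / var(plain_m), var(cv_s) / var(plain_s))
```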

For denoising score matching in diffusion models, a $k$-th order Taylor expansion of the score network $s_\theta(\cdot)$ is inserted into the loss or its gradient. Notably, the gradient-level and loss-level control variates are shown to be equivalent under automatic differentiation. The per-sample control variate $C^k_\theta(z, x, \sigma)$ is constructed by explicit Taylor expansion and subtraction of its expectation. It achieves provably unbiased variance suppression in gradient-based training, albeit with diminishing returns for highly irregular parameterizations such as deep U-Nets (Jeha et al., 2024).
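A one-dimensional sketch of the loss-level version with $k = 1$ makes the following assumptions:

- a scalar data point $x$;
- a hypothetical closed-form score model $s(y) = -\tanh(1.5\,y)$ standing in for $s_\theta$;
- the per-sample loss $(\sigma\, s(x + \sigma z) + z)^2$ with $z \sim N(0, 1)$.

Replacing $s(x + \sigma z)$ by its first-order expansion $s(x) + \sigma s'(x) z$ gives a surrogate loss whose Gaussian expectation is available in closed form. This illustrates the construction described above but is not the exact $C^k_\theta$ of Jeha et al.

```python
import math
import random

def dsm_loss_samples(score, dscore, x, sigma, M, rng):
    """Per-sample losses (sigma * s(x + sigma z) + z)^2, z ~ N(0, 1), with and without
    a first-order (k = 1) Taylor control variate of the score around x."""
    a = sigma * score(x)                 # constant part of the linearized residual
    c = 1.0 + sigma ** 2 * dscore(x)     # coefficient of z in the linearized residual
    mean_lin = a * a + c * c             # E[(a + c z)^2] for z ~ N(0, 1)
    plain, cv = [], []
    for _ in range(M):
        z = rng.gauss(0.0, 1.0)
        loss = (sigma * score(x + sigma * z) + z) ** 2
        loss_lin = (a + c * z) ** 2      # loss with s(x + sigma z) ~ s(x) + sigma s'(x) z
        plain.append(loss)
        cv.append(loss - loss_lin + mean_lin)
    return plain, cv

def var(xs):
    mu = sum(xs) / len(xs)
    return sum((v - mu) ** 2 for v in xs) / (len(xs) - 1)

# Hypothetical scalar "score model" and its derivative.
score = lambda y: -math.tanh(1.5 * y)
dscore = lambda y: -1.5 / math.cosh(1.5 * y) ** 2
plain, cv = dsm_loss_samples(score, dscore, x=0.5, sigma=0.1, M=20000, rng=random.Random(3))
print(var(cv) / var(plain))
```

For a score that is exactly linear in its input, the first-order expansion is exact. Every control-variate sample then equals the mean, so the estimator has zero variance. The gains shrink as the score becomes more irregular.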

4. Variance Reduction for Approximation Error Estimation and Surrogate Modeling

In Bayesian inverse problems governed by PDEs, Taylor-based control variates are used to efficiently estimate the mean and covariance of approximation errors between accurate and surrogate forward models. Given the analytic mean and covariance of the linear/quadratic Taylor surrogate, the control variate estimator for the error $\Delta(\theta)=f(\theta)-t(\theta)$ is

$C(\theta) = \Delta(\theta) - [\Delta_{\mathrm{TA}}(\theta) - m_{\mathrm{TA}}]$

where $\Delta_{\mathrm{TA}}$ is the (linear or quadratic) Taylor approximation of $\Delta$ and $m_{\mathrm{TA}} = \mathbb{E}[\Delta_{\mathrm{TA}}]$ is its analytic mean. The implementation relies on sensitivity analysis, with first and second derivatives computed via PDE solves. The cost of the linearized computations scales with the data-space dimension rather than the parameter-space dimension. This yields order-of-magnitude savings in the number of high-fidelity PDE simulations required for accurate Bayesian approximation error (BAE) quantification (Nicholson et al., 5 Dec 2025).
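A sketch of the error-mean estimator assumes $\theta \sim N(\theta_0, \sigma^2 I)$ and a hypothetical two-output "accurate" model $f(\theta) = (e^{\theta_1}, \sin\theta_2)$. The surrogate is $t(\theta) = (1 + \theta_1, \theta_2)$. Derivatives of $\Delta$ are supplied analytically here, standing in for the sensitivity PDE solves, and only the mean (not the covariance) is estimated.

```python
import math
import random

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def bae_error_mean(delta, theta0, J, Hs, sigma, M, order=2, rng=None):
    """Estimate E[Delta(theta)], theta ~ N(theta0, sigma^2 I): plain MC vs. Taylor CV.

    J[k] and Hs[k] are the gradient and Hessian of output k of Delta at theta0;
    order=1 drops the Hessian term from both the surrogate and its mean."""
    rng = rng or random.Random(0)
    d0 = delta(theta0)
    K, n = len(d0), len(theta0)
    w = 0.5 if order == 2 else 0.0
    # Analytic surrogate mean: m_TA[k] = Delta_k(theta0) + w * sigma^2 * tr(H_k)
    m_ta = [d0[k] + w * sigma ** 2 * sum(Hs[k][i][i] for i in range(n)) for k in range(K)]
    plain, cv = [0.0] * K, [0.0] * K
    for _ in range(M):
        dth = [sigma * rng.gauss(0.0, 1.0) for _ in range(n)]
        d = delta([a + b for a, b in zip(theta0, dth)])
        for k in range(K):
            ta = d0[k] + dot(J[k], dth) + w * dot(dth, [dot(row, dth) for row in Hs[k]])
            plain[k] += d[k] / M
            cv[k] += (d[k] - (ta - m_ta[k])) / M   # C(theta) = Delta - (Delta_TA - m_TA)
    return plain, cv

# Hypothetical models: Delta = f - t with f = (exp(th1), sin(th2)), t = (1 + th1, th2).
delta = lambda th: [math.exp(th[0]) - 1.0 - th[0], math.sin(th[1]) - th[1]]
theta0, sigma = [0.3, 0.5], 0.2
J = [[math.exp(0.3) - 1.0, 0.0], [0.0, math.cos(0.5) - 1.0]]
Hs = [[[math.exp(0.3), 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, -math.sin(0.5)]]]
exact = [math.exp(0.3 + sigma ** 2 / 2) - 1.3, math.sin(0.5) * math.exp(-sigma ** 2 / 2) - 0.5]
plain, cv2 = bae_error_mean(delta, theta0, J, Hs, sigma, M=20000, order=2, rng=random.Random(11))
_, cv1 = bae_error_mean(delta, theta0, J, Hs, sigma, M=20000, order=1, rng=random.Random(11))
print(plain, cv1, cv2, exact)
```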

5. Taylor Control Variates in Stochastic and Statistical Physics

The perturbative control variate strategy, as applied in nonequilibrium statistical physics, relies on a Taylor (asymptotic) expansion of the generator $L$ of a diffusion process. The optimal zero-variance control variate formally solves the Poisson equation $-L\phi=f-\mathbb{E}[f]$, since the modified observable $f + L\phi$ is then the constant $\mathbb{E}[f]$. Because this equation is generally intractable, one expands around a tractable generator $L_0$, writing $L = L_0 + \epsilon L_1$ and $\phi = \phi_0 + \epsilon \phi_1 + \mathcal{O}(\epsilon^2)$, with each term computed recursively. Modified observables of the form $\hat{f}(X)=f(X)+L(\phi_0+\epsilon\phi_1)(X)$ enjoy asymptotic variance $O(\epsilon^2)$. This yields 10–100$\times$ variance reductions in practical high-dimensional particle, chain, and solvation models (Roussel et al., 2017).
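A one-dimensional sketch of the leading-order term rests on the following assumptions:

- overdamped Langevin dynamics $dX = -V'(X)\,dt + \sqrt{2}\,dW$ with generator $L\phi = -V'\phi' + \phi''$;
- a hypothetical quartic perturbation $V(x) = x^2/2 + \epsilon x^4/4$, so that $L_0$ is the Ornstein–Uhlenbeck generator and $L_1 = -x^3\,\partial_x$.

For $f(x) = x^2$, $\phi_0 = x^2/2$ solves $-L_0\phi_0 = f - 1$, and the modified observable is $f + L\phi_0 = 1 - \epsilon x^4$, whose fluctuations are $O(\epsilon)$. To stay short, the sketch samples the invariant density $\propto e^{-V}$ exactly by rejection instead of integrating the SDE. It therefore compares static variances, not asymptotic variances of time averages.

```python
import math
import random

def sample_invariant(eps, M, rng):
    """Exact samples from pi(x) ∝ exp(-x^2/2 - eps*x^4/4) by rejection from N(0, 1)."""
    out = []
    while len(out) < M:
        x = rng.gauss(0.0, 1.0)
        if rng.random() < math.exp(-eps * x ** 4 / 4.0):
            out.append(x)
    return out

def generator(phi_d1, phi_d2, x, eps):
    """L phi(x) = -V'(x) phi'(x) + phi''(x) for V(x) = x^2/2 + eps*x^4/4."""
    return -(x + eps * x ** 3) * phi_d1(x) + phi_d2(x)

def mean(xs):
    return sum(xs) / len(xs)

def var(xs):
    mu = mean(xs)
    return sum((v - mu) ** 2 for v in xs) / (len(xs) - 1)

eps = 0.01
xs = sample_invariant(eps, 50000, random.Random(5))
f_vals = [x * x for x in xs]
# phi0(x) = x^2/2, so phi0' = x and phi0'' = 1; modified observable f + L phi0.
cv_vals = [x * x + generator(lambda y: y, lambda y: 1.0, x, eps) for x in xs]
print(mean(f_vals), mean(cv_vals), var(cv_vals) / var(f_vals))
```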

6. Deep Learning for High-Dimensional Semilinear PDEs and BSDEs

Taylor control variates are also embedded into deep learning–based solvers for high-dimensional backward stochastic differential equations (BSDEs) and the associated semilinear PDEs. Here, the dominant linear PDE/BSDE component is obtained via asymptotic (Taylor) expansion, and the neural network fits only the small nonlinear residual. This yields losses and gradients orders of magnitude smaller than in the naive (uncontrolled) problem. The result is rapid convergence and effective scalability even for $d=100$-dimensional financial option pricing and FBSDE benchmarks (Takahashi et al., 2021).
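A generic version of this decomposition can be written out as follows. It assumes, for illustration only and not necessarily in the exact formulation of Takahashi et al., a semilinear PDE whose nonlinearity carries a small parameter $\epsilon$. Here $\mathcal{L}$ is the linear differential operator of the underlying diffusion and $F$ is a hypothetical nonlinear driver.

```latex
% Full semilinear problem
\partial_t u + \mathcal{L} u + \epsilon\, F(u, \nabla u) = 0, \qquad u(T, x) = g(x)
% Dominant linear part (obtained by asymptotic expansion)
\partial_t \bar{u} + \mathcal{L} \bar{u} = 0, \qquad \bar{u}(T, x) = g(x)
% Residual v = u - \bar{u}: subtracting the two equations gives
\partial_t v + \mathcal{L} v + \epsilon\, F(\bar{u} + v, \nabla \bar{u} + \nabla v) = 0, \qquad v(T, x) = 0
```

The residual problem has zero terminal data and an $O(\epsilon)$ source term, so $v = O(\epsilon)$ under suitable regularity. This is why a network fitted to $v$, rather than to $u$, sees much smaller losses and gradients.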

7. Quantitative Summary and Theoretical Guarantees

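Empirically and theoretically, the variance reduction achieved by Taylor-based control variates is governed by the correlation $\rho$ between the target and its surrogate. With the optimal coefficient $\beta^*$, the estimator variance is multiplied by $1 - \rho^2$. For quadratic surrogates and well-chosen expansion points: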

Variance reductions by factors of $10^2$–$10^3$ for mean estimates and $10$–$10^2$ for variance estimates are typical in PDE uncertainty quantification and Bayesian inverse settings (Chen et al., 2018, Nicholson et al., 5 Dec 2025).
Offloading much of the computational burden to the surrogate evaluation allows for reductions in required high-fidelity model solves by an order of magnitude or more (Nicholson et al., 5 Dec 2025, Takahashi et al., 2021).
In variational inference and diffusion models, correlation coefficients for Taylor-derived control variates often reach $0.9$–$0.99$ in favorable regimes. With the optimal $\beta$, this corresponds to variance reduction factors of roughly 5 to 50 ($1/(1-\rho^2) \approx 5.3$ and $50.3$), giving substantial gradient variance reduction and robust convergence (Geffner et al., 2020, Jeha et al., 2024).
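The link between correlation and variance reduction can be checked numerically. Estimate the coefficient $\beta = \mathrm{Cov}[f, T]/\mathrm{Var}[T]$ from the same samples. The sample variance of $f - \beta(T - \mathbb{E}[T])$ then equals $(1 - \rho^2)$ times that of $f$, where $\rho$ is the sample correlation. The sketch uses hypothetical synthetic data in place of a real model and surrogate. Fitting $\beta$ in-sample introduces a bias of order $1/M$. This is usually negligible but can be avoided by fitting $\beta$ on a separate pilot sample.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def fitted_cv(fs, ts, mean_t):
    """Control-variate samples f - beta (T - E[T]) with beta = Cov(f, T) / Var(T) fitted in-sample."""
    beta = cov(fs, ts) / cov(ts, ts)
    rho = cov(fs, ts) / (cov(fs, fs) * cov(ts, ts)) ** 0.5
    return beta, rho, [f - beta * (t - mean_t) for f, t in zip(fs, ts)]

# Synthetic pairs: surrogate T with known mean 0; target f = T + small independent noise.
rng = random.Random(1)
ts = [rng.gauss(0.0, 1.0) for _ in range(10000)]
fs = [t + 0.1 * rng.gauss(0.0, 1.0) for t in ts]
beta, rho, resid = fitted_cv(fs, ts, mean_t=0.0)
ratio = cov(resid, resid) / cov(fs, fs)   # achieved variance reduction factor
print(beta, rho, ratio, 1 - rho ** 2)
```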

Nevertheless, the effectiveness of Taylor-based control variates degrades when the target function is highly nonlinear with respect to the expansion variable, or when higher-order Taylor coefficients are impractically costly to compute.

Table: Representative Application Domains and Implementation Notes

Application Domain	Taylor Surrogate Order	Typical Computational Bottleneck
PDE-constrained optimal control (Chen et al., 2018)	1st & 2nd	Trace estimation (randomized eigensolvers)
Variational inference (Geffner et al., 2020)	2nd (quadratic)	Hessian–vector products/autodiff
Diffusion model training (Jeha et al., 2024)	1st–2nd	High-order derivatives in networks
BAE in Bayesian inverse problems (Nicholson et al., 5 Dec 2025)	1st & 2nd	Sensitivity PDE solves
Stochastic MD/statistical physics (Roussel et al., 2017)	1st in perturbation	Poisson equation solves
Deep PDE solvers (BSDEs) (Takahashi et al., 2021)	1st in $\epsilon$	Linear PDE & path simulation

Taylor approximation–based control variates constitute a unifying framework for bias-free, scalable variance reduction. They apply across domains that require evaluating expensive forward models, gradients, or stochastic processes, provided the underlying model is smooth enough for local polynomial approximation. Their continued relevance depends on advances in high-order differentiation, surrogate modeling, and randomized linear algebra for scalable implementation in increasingly high-dimensional and nonlinear settings.