Bismut–Elworthy–Li Formula: A Stochastic Gradient Tool

Updated 6 March 2026

The BEL formula is a fundamental result in stochastic analysis that represents spatial derivatives as stochastic integrals via Malliavin calculus.
Extensions of the formula go beyond classical SDEs to cover jumps, fractional noise, mean-field dynamics, and manifold settings, broadening its practical applications.
By converting derivative estimation into probabilistic integration by parts, the formula supports efficient numerical methods and rigorous theoretical insights.

The Bismut–Elworthy–Li (BEL) formula is a fundamental result in stochastic analysis, providing a non-anticipative stochastic representation for derivatives (gradients) of expectations of functionals of solutions to stochastic differential equations (SDEs) and their generalizations. It converts spatial derivatives of semigroups (or solutions to Kolmogorov-type equations) into stochastic integrals depending only on the solution and its perturbations, enabling practical gradient estimation and facilitating theoretical analysis, especially regularity and sensitivity results. The formula admits a wide range of extensions—including degenerate diffusions, jumps, mean-field (McKean–Vlasov) dependencies, fractional and rough noise, path-dependence, distributional flows, and geometric contexts.

1. Classical Formulation and Analytical Framework

The classical BEL formula applies to non-degenerate Itô SDEs on $\mathbb{R}^d$ of the form

$d X_t^x = b(X_t^x)\,dt + \sigma(X_t^x)\,dW_t,\qquad X_0^x = x,$

where $b,\sigma$ are $C^1_b$ and $\sigma(x)$ is invertible with bounded inverse. Let $P_t f(x) = \mathbb{E}[f(X_t^x)]$, with $f$ bounded and measurable. For any bounded deterministic matrix-valued weight $a:[0,t]\to \mathbb{R}^{d\times d}$ satisfying $\int_0^t a(s)\,ds = I_d$, the gradient admits the representation

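$\nabla_x P_t f(x) = \mathbb{E}\biggl[ f(X_t^x) \int_{0}^{t} a(s) \left( \sigma^{-1}(X_s^x)\,\nabla_x X_s^x \right)^\top dW_s \biggr],$

where $\nabla_x X_s^x$ is the first-variation (Jacobian) process of the flow $x \mapsto X_s^x$. It is linked to the Malliavin derivative by $D_s X_t^x = \nabla_x X_t^x\,(\nabla_x X_s^x)^{-1}\,\sigma(X_s^x)$ for $s \le t$, which is how Malliavin calculus enters the proof. Because the integrand is adapted, the stochastic integral is an ordinary Itô integral. The standard weight is $a(s)=\frac{1}{t}I_d$; for constant diffusion ($\sigma=I_d$) the weight term then reduces to $\frac{1}{t}\int_0^t (\nabla_x X_s^x)^\top dW_s$. The formula rests on the Malliavin calculus duality (integration by parts) and gives a probabilistic gradient representation that does not involve $\nabla f$ (Baños et al., 2015, Baños, 2015).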
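As a minimal numerical sketch of how such a weight is used, consider a one-dimensional SDE with additive noise ($\sigma = 1$) and the weight $a(s) = 1/t$. Under these assumptions the first-variation process $J_s = \partial X_s^x / \partial x$ solves $dJ_s = b'(X_s^x) J_s\,ds$ with $J_0 = 1$, and the gradient is estimated by averaging $f(X_t^x)\,\frac{1}{t}\int_0^t J_s\,dW_s$ over simulated paths. The function name `bel_gradient_1d`, its parameters, and the Euler–Maruyama discretization are illustrative choices, not taken from the cited works.

```python
import math
import numpy as np


def bel_gradient_1d(f, b, db, x0, t, n_steps=100, n_paths=100_000, seed=0):
    """Monte Carlo estimate of d/dx E[f(X_t^x)] for dX = b(X) dt + dW.

    Hypothetical helper: assumes additive noise (sigma = 1) and a(s) = 1/t.
    `db` is the derivative of the drift `b`; note that f' is never needed.
    """
    rng = np.random.default_rng(seed)
    dt = t / n_steps
    x = np.full(n_paths, float(x0))   # Euler-Maruyama state
    j = np.ones(n_paths)              # first-variation process J_s, J_0 = 1
    weight = np.zeros(n_paths)        # accumulates (1/t) * int_0^t J_s dW_s
    for _ in range(n_steps):
        dw = rng.normal(0.0, math.sqrt(dt), size=n_paths)
        # Ito integral: integrand evaluated at the left endpoint of the step.
        weight += j * dw / t
        # Both right-hand sides use the old x, so J is updated consistently.
        x, j = x + b(x) * dt + dw, j + db(x) * j * dt
    return float(np.mean(f(x) * weight))


if __name__ == "__main__":
    theta, x0, t = 0.5, 1.0, 1.0
    estimate = bel_gradient_1d(
        f=lambda y: y ** 2,
        b=lambda y: -theta * y,
        db=lambda y: -theta * np.ones_like(y),
        x0=x0,
        t=t,
    )
    exact = 2.0 * x0 * math.exp(-2.0 * theta * t)
    print(f"BEL estimate: {estimate:.4f}, closed form: {exact:.4f}")
```

The demo uses an Ornstein–Uhlenbeck drift $b(x) = -\theta x$ with $f(y) = y^2$, for which $P_t f(x) = x^2 e^{-2\theta t} + (1 - e^{-2\theta t})/(2\theta)$ and hence $\nabla_x P_t f(x) = 2x e^{-2\theta t}$, giving a closed-form reference for the Monte Carlo estimate. The estimator evaluates only $f$, never $f'$, which is the practical point of the formula.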

2. Derivation via Malliavin Calculus and Key Terms

The derivation is predicated on Malliavin differentiability of the solution:

For additive noise ($\sigma = I_d$), the Malliavin derivative $D_s X_t^x \in \mathbb{R}^{d\times d}$ satisfies, for $0 \le s \le t$,

$D_s X_t^x = I_d + \int_s^t \nabla b(X_u^x)\, D_s X_u^x\,du, \qquad D_s X_s^x = I_d.$

Uniform integrability and continuity properties of $D_s X_t^x$ are established via compactness criteria (e.g., Da Prato–Malliavin–Nualart) (Baños et al., 2015).
The stochastic integral in the BEL formula is an Itô integral; uniform non-degeneracy of $\sigma$ makes explicit inversion of the Malliavin covariance matrix unnecessary.
The formula remains valid for singular, possibly time-dependent drift under suitable integrability ($b \in L^q_t L^p_x$, $d/p + 2/q < 1$), without relying on Yamada–Watanabe pathwise uniqueness arguments (Baños et al., 2015).
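
The route from Malliavin differentiability to the gradient formula can be sketched in four steps. The sketch assumes $f \in C^1_b$ (bounded measurable $f$ is then reached by approximation) and writes $\nabla_x X_t^x$ for the first-variation (Jacobian) process of the flow, a notation introduced here for illustration; gradients are read as row vectors.

$\text{(i) chain rule:}\quad \nabla_x P_t f(x) = \mathbb{E}\bigl[\nabla f(X_t^x)\,\nabla_x X_t^x\bigr],$

$\text{(ii) flow identity, } s \le t:\quad \nabla_x X_t^x = D_s X_t^x\,\sigma^{-1}(X_s^x)\,\nabla_x X_s^x \;\Longrightarrow\; \nabla_x X_t^x = \int_0^t D_s X_t^x\,u_s\,ds,\quad u_s := \sigma^{-1}(X_s^x)\,\nabla_x X_s^x\,a(s)^\top,$

$\text{(iii) Malliavin chain rule:}\quad \nabla f(X_t^x)\,D_s X_t^x = D_s f(X_t^x) \;\Longrightarrow\; \nabla_x P_t f(x) = \mathbb{E}\Bigl[\int_0^t D_s f(X_t^x)\,u_s\,ds\Bigr],$

$\text{(iv) duality, } u \text{ adapted}:\quad \mathbb{E}\Bigl[\int_0^t D_s f(X_t^x)\,u_s\,ds\Bigr] = \mathbb{E}\Bigl[f(X_t^x)\Bigl(\int_0^t u_s^\top\,dW_s\Bigr)^{\top}\Bigr].$

Step (ii) uses $\int_0^t a(s)^\top ds = I_d$. In step (iv), $u_s^\top = a(s)\,(\sigma^{-1}(X_s^x)\,\nabla_x X_s^x)^\top$ is adapted, so the Skorokhod integral of the duality relation reduces to an Itô integral, and $\nabla f$ no longer appears on the right-hand side.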

3. Geometric and Manifold Extensions

On manifolds, BEL-type formulas express the gradient of the heat semigroup, which is crucial for regularity and coupling analyses:

$\nabla P_t f(x) = \mathbb{E}^x\left[ f(X_t^x) \frac{1}{t} \int_0^t Q_r^* dW_r \right],$

where $X_t^x$ is Brownian motion on $(M,g)$ started at $x$, and $Q_t$ is the damped parallel transport, defined as the pathwise solution of

$dQ_t = -Q_t \left( //_{t}^{-1} \mathrm{Ric}_{X_t^x} //_{t} \right) dt, \qquad Q_0 = \mathrm{Id}_{T_xM}$

with $//$ denoting stochastic parallel translation (Braun et al., 2020). When the Ricci curvature is bounded below by a function $k$ whose negative part $k^-$ satisfies a Kato-type integrability condition, this yields a global Lipschitz regularization of the heat flow:

$\|\nabla P_t f\|_\infty \le \sqrt{2/t}\, \sup_{x\in M} \left\{\mathbb{E}^x e^{\frac{1}{2}\int_0^t k^-(X_r)\,dr}\right\}^{1/2} \|f\|_\infty.$
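
As a quick sanity check of this bound, consider the illustrative special case of a constant lower Ricci bound, $k \equiv -K$ with $K \ge 0$, reading $k^-$ as the negative part of the lower bound $k$. This special case is an assumption made here for illustration, not a statement from the cited work. Then $\int_0^t k^-(X_r)\,dr = Kt$ along every path, the expectation is deterministic, and the estimate becomes

$\|\nabla P_t f\|_\infty \le \sqrt{2/t}\; e^{Kt/4}\, \|f\|_\infty.$

The short-time behaviour is of order $t^{-1/2}$, as in the flat case, while negative curvature only contributes the exponential factor $e^{Kt/4}$. For $K = 0$ (non-negative Ricci curvature) the bound reduces to $\sqrt{2/t}\,\|f\|_\infty$.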

4. Extensions to Jumps, Fractional, and Rough Noise

SDEs with Jumps:

For SDEs with Lévy-type jumps, driven by Brownian motions $W^i$ and a compensated Poisson random measure $\tilde{\mu}$ (the coefficients $a_i$ below are unrelated to the weight $a(s)$ of Section 1),

$dx_t = a_0(x_{t^-})\,dt + \sum_{i=1}^m a_i(x_{t^-})\,dW_t^i + \int_{\mathbb{R}^m} b_z(x_{t^-})\, \tilde{\mu}(dt, dz),$

the BEL formula decomposes the derivative into Brownian and Poisson (jump) components:

$\nabla_x P_T f(x)= \mathbb{E}\left[\,f(x_T)\left(\int_0^T H_s\,dW_s + \int_0^T\!\!\int_{\mathbb{R}^m} G_s(z)\,\tilde{\mu}(ds, dz)\right)\right],$

where $H_s$ and $G_s(z)$ depend on the linearization and jump measure structure (Takeuchi, 2010).

Fractional Brownian Motion and Rough Volatility:

For SDEs driven by fractional Brownian motion $B^H$ with Hurst parameter $H<1/2$ and singular drift, Malliavin calculus on Gaussian spaces with fractional kernels leads to

$\nabla_x\,\mathbb{E}[\Phi(X^x_T)] = C_H\,\mathbb{E}\left[ \Phi(X^x_T) \left( \int_0^T u^{-H-\frac12} \int_u^T a(s-u)(s-u)^{\frac12-H} s^{H-\frac12} (\nabla_x X^x_{s-u})^\top dB_s \right)^\top \right]$

where $C_H$ is an explicit constant depending on $H$, and the stochastic integral reflects the non-local, non-semimartingale nature of $B^H$ (Amine et al., 2018, Coffie et al., 2021). The result applies to semilinear and mean-field SDEs with fractional drivers (Tahmasebi, 2022).

5. Extensions to Mean-Field, Distribution-Dependent, and Path-Dependent Dynamics

For McKean–Vlasov SDEs with semigroup $P_t f(\mu) = \mathbb{E}[f(X_t^\mu)]$, where $X_t^\mu$ starts from the law $\mu$ and evolves under a possibly singular drift that depends on the law of the solution, the Lions derivative admits a BEL representation:

$D^L P_t f(\mu)(v) = \mathbb{E}\left[ f(X_t^\mu) \int_0^t \langle H_s^{\mu,v}, dW_s\rangle \right],$

where $H_s^{\mu,v}$ explicitly encodes law-derivatives and the Malliavin structure (Ren et al., 2018, Huang et al., 2021, Bao et al., 2020). For path-dependent SDEs, analogous asymptotic BEL formulas can be constructed, as can generalizations for jump-driven and non-Markovian systems (Ren et al., 29 Dec 2025).

6. Applications: PDE Regularity, Optimal Control, Stochastic Sensitivities

The BEL formula provides gradient estimates and strong Feller properties for transition semigroups, including in degenerate and infinite-dimensional settings (e.g., SPDEs, stable-driven systems) (Zhang, 2012, Altman, 2017).
For Hamilton–Jacobi–Bellman (HJB) equations in optimal control, the formula yields non-anticipative stochastic gradient representations that underpin mesh-free numerical schemes and policy iteration algorithms (Sanders et al., 2024).
In rough volatility modeling and finance, BEL weights are used to compute Greeks, especially when payoff functionals are non-differentiable or the underlying dynamics are law-dependent, path-dependent, or rough (Baños, 2015, Chen et al., 2023).
On Riemannian manifolds and path space, BEL-type formulas underpin gradient bounds, log-Sobolev inequalities, coupling characterizations, and geometric analysis (Braun et al., 2020, Chen et al., 2023).
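
The Greeks application can be illustrated with a discontinuous payoff. The sketch below assumes a Black–Scholes model $dS_t = r S_t\,dt + \sigma S_t\,dW_t$, which is chosen here for illustration and is not discussed in the cited works. In that model the first-variation process is $\partial S_s / \partial S_0 = S_s / S_0$, so with the weight $a(s) = 1/T$ the BEL weight collapses to $W_T / (S_0 \sigma T)$. The function names `digital_delta_bel` and `digital_delta_exact` are hypothetical.

```python
import math
import numpy as np


def digital_delta_bel(s0, strike, r, sigma, maturity, n_paths=400_000, seed=0):
    """Delta of a digital call, E[e^{-rT} 1{S_T > K}], via a BEL-type weight.

    Assumes Black-Scholes dynamics (an illustrative choice); the payoff is
    discontinuous, so a pathwise derivative of the payoff is not available.
    """
    rng = np.random.default_rng(seed)
    w_T = rng.normal(0.0, math.sqrt(maturity), size=n_paths)
    # Exact simulation of the terminal price from the terminal Brownian value.
    s_T = s0 * np.exp((r - 0.5 * sigma ** 2) * maturity + sigma * w_T)
    payoff = (s_T > strike).astype(float)
    # BEL weight with a(s) = 1/T: (1/T) * int_0^T (sigma S_s)^{-1} (S_s/S_0) dW_s.
    weight = w_T / (s0 * sigma * maturity)
    return float(math.exp(-r * maturity) * np.mean(payoff * weight))


def digital_delta_exact(s0, strike, r, sigma, maturity):
    """Closed-form delta of the digital call: e^{-rT} phi(d2) / (S0 sigma sqrt(T))."""
    d2 = (math.log(s0 / strike) + (r - 0.5 * sigma ** 2) * maturity) / (
        sigma * math.sqrt(maturity)
    )
    pdf = math.exp(-0.5 * d2 * d2) / math.sqrt(2.0 * math.pi)
    return math.exp(-r * maturity) * pdf / (s0 * sigma * math.sqrt(maturity))


if __name__ == "__main__":
    params = dict(s0=100.0, strike=100.0, r=0.02, sigma=0.2, maturity=1.0)
    print("BEL estimate:", digital_delta_bel(**params))
    print("closed form: ", digital_delta_exact(**params))
```

A finite-difference estimate of the same delta would difference two indicator payoffs and suffer from high variance for small bumps; the weighted estimator avoids differentiating the payoff at all, at the cost of the extra variance carried by the weight.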

7. Summary Table: Core BEL Extensions

Setting	Form of BEL Formula	Key Technical Features
Classical SDE	$\nabla_x P_t f(x)=\mathbb{E}[f(X_t^x)\int_0^t H_s^x\,dW_s]$	Malliavin derivative, invertible diffusion, $L^q_tL^p_x$ drift
Jumps/Lévy	BEL splitting into Wiener & jump integrals	Dirichlet–Malliavin–Poisson calculus, lent-particle method
Riemannian manifold	$\nabla P_t f(x) = \mathbb{E}[f(X_t^x) \int_0^t Q_r^* dW_r / t]$	Damped parallel transport, Ricci curvature, Kato class
Fractional and rough SDEs	Integral against $dB^H_s$ with fractional weight	Volterra/Fractional calculus, singular drift, Skorokhod integral
Mean-field (McKean–Vlasov), distribution-dep.	Lions/intrinsic derivative by Malliavin weight	Law-derivatives, Zvonkin transform, singular coefficients

The Bismut–Elworthy–Li formula thus provides a foundational bridge between stochastic analysis and PDE theory, supporting high-dimensional Monte Carlo methods, PDE regularity theorems, modern rough and distribution-dependent dynamics, and geometric analysis across a diverse array of stochastic systems (Baños et al., 2015, Baños, 2015, Takeuchi, 2010, Braun et al., 2020, Amine et al., 2018, Sanders et al., 2024, Ren et al., 29 Dec 2025, Huang et al., 2021, Ren et al., 2018, Bao et al., 2020, Chen et al., 2023).