
Deep BSDE Method: Differential Learning

Updated 3 February 2026
  • The Deep BSDE method is a computational framework that uses deep neural networks and Malliavin calculus to approximate the solutions and derivatives of high-dimensional BSDEs and the corresponding PDEs.
  • The approach employs a differential-learning architecture with separate networks for Y, Z, and Γ, enabling joint optimization of the value, gradient, and Hessian estimates.
  • Numerical results show errors 1–2 orders of magnitude lower, significantly improved convergence, and reduced runtimes compared with classical methods in high-dimensional applications.

A deep backward stochastic differential equation (BSDE) method is a class of algorithms that leverage deep neural networks to approximate the solutions of high-dimensional nonlinear BSDEs, together with their derivatives; such BSDEs are tightly coupled to parabolic partial differential equations (PDEs) via nonlinear Feynman–Kac formulae. The Deep BSDE paradigm is central to modern computational mathematics, mathematical finance, and stochastic control, owing to its tractability in hundreds of dimensions and its compatibility with Monte Carlo simulation. The "Deep BSDE method" encompasses a broad family of strategies, including the differential-learning techniques described below, which systematically utilize both the values and the pathwise derivatives of the BSDE process.

1. BSDE Formulation and Malliavin-Lifted System

Consider the decoupled forward–backward SDE system over $[0, T]$:

$$
\begin{aligned}
X_t &= x_0 + \int_0^t a(s, X_s)\,ds + \int_0^t b(s, X_s)\,dW_s, \\
Y_t &= g(X_T) + \int_t^T f(s, X_s, Y_s, Z_s)\,ds - \int_t^T Z_s\,dW_s,
\end{aligned}
$$

where $Y_t = u(t, X_t)$ is the solution field and $Z_t = \nabla_x u(t, X_t)\, b(t, X_t)$ its spatial gradient contracted with the volatility coefficient. To ensure that both $Z_t$ and the Hessian $\Gamma_t = \nabla_x\big(\nabla_x u(t, x)\, b(t, x)\big)\big|_{x = X_t}$ are amenable to network-based learning, the methodology leverages Malliavin calculus: Malliavin differentiation of the forward and backward equations yields the associated Malliavin-lifted system

$$
\begin{aligned}
D_s X_t &= \mathbb{1}_{s \le t}\left[b(s, X_s) + \int_s^t \nabla_x a(r, X_r)\, D_s X_r\,dr + \int_s^t \nabla_x b(r, X_r)\, D_s X_r\,dW_r\right], \\
D_s Y_t &= \mathbb{1}_{s \le t}\left[\nabla_x g(X_T)\, D_s X_T + \int_t^T \big(\nabla_x f \cdot D_s X_r + \nabla_y f \cdot D_s Y_r + \nabla_z f \cdot D_s Z_r\big)\,dr - \int_t^T D_s Z_r\,dW_r\right],
\end{aligned}
$$

with the pathwise identities $D_t Y_t = Z_t$ and $D_s Z_t = \Gamma_t\, D_s X_t$ (Kapllani et al., 2024).

This formulation makes the infinitesimal dynamics of all relevant sensitivities (value, first and second spatial derivatives) explicit and available for direct optimization during network training.
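For reference, and as standard background rather than a result specific to the cited works, the nonlinear Feynman–Kac link behind the identities $Y_t = u(t, X_t)$ and $Z_t = \nabla_x u(t, X_t)\, b(t, X_t)$ can be written explicitly: $u$ solves the semilinear terminal-value problem

$$
\begin{aligned}
&\partial_t u(t,x) + \nabla_x u(t,x)\, a(t,x) + \tfrac{1}{2}\,\mathrm{Tr}\!\left(b(t,x)\, b(t,x)^\top \nabla_x^2 u(t,x)\right) \\
&\qquad\qquad + f\big(t, x, u(t,x), \nabla_x u(t,x)\, b(t,x)\big) = 0, \qquad u(T, x) = g(x),
\end{aligned}
$$

so that $Z_t$ and $\Gamma_t$ encode the first- and second-order spatial sensitivities of this PDE solution along the forward paths.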

2. Discretization and Regression Equations

The continuous system is discretized via Euler–Maruyama on a uniform grid $0 = t_0 < \dots < t_N = T$, $\Delta t = t_{n+1} - t_n$:

$$
\begin{aligned}
X_{n+1}^\Delta &= X_n^\Delta + a(t_n, X_n^\Delta)\,\Delta t + b(t_n, X_n^\Delta)\,\Delta W_n, \\
Y_n^\Delta &= Y_{n+1}^\Delta + f(t_n, X_n^\Delta, Y_n^\Delta, Z_n^\Delta)\,\Delta t - Z_n^\Delta\,\Delta W_n, \\
D_n X_m^\Delta &= \ldots, \\
D_n Y_n^\Delta &= D_n Y_{n+1}^\Delta + f_D(t_n, \cdots)\,\Delta t - D_n Z_n^\Delta\,\Delta W_n, \\
D_n Y_n^\Delta &= Z_n^\Delta, \quad D_n Y_{n+1}^\Delta = \nabla_x u(t_{n+1}, X_{n+1}^\Delta)\, D_n X_{n+1}^\Delta, \\
D_n Z_n^\Delta &= \Gamma_n^\Delta\, D_n X_n^\Delta,
\end{aligned}
$$

with $f_D$ denoting the appropriate Malliavin-weighted differential of the driver (Kapllani et al., 2024).

The necessity to jointly simulate the processes $(Y, Z, \Gamma)$ and their discrete Malliavin increments underlines the challenge: standard Deep BSDE architectures parameterizing only $Z$ are insufficient for high-fidelity derivative estimation at $t > 0$.
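To make the bookkeeping concrete, the following is a minimal simulation sketch, assuming a PyTorch implementation and illustrative geometric-Brownian-motion coefficients $a(t,x) = \mu x$, $b(t,x) = \sigma\,\mathrm{diag}(x)$ (the function name simulate_forward and these model choices are not from the cited paper). It generates Euler–Maruyama paths of $X^\Delta$ together with the one-step discrete Malliavin derivatives $D_n X_n^\Delta$ and $D_n X_{n+1}^\Delta$ that the $Z$-loss below requires.

```python
import torch

# Minimal sketch (not the authors' code): Euler-Maruyama simulation of X and of the
# one-step discrete Malliavin derivatives D_n X_n and D_n X_{n+1} used by the Z-loss.
# Illustrative model: a(t, x) = mu * x and b(t, x) = sigma * diag(x), so that
# grad_x a = mu * I and b is diagonal in x.

def simulate_forward(x0, mu, sigma, T, N, batch):
    d = x0.shape[-1]
    dt = T / N
    X = torch.empty(batch, N + 1, d)
    X[:, 0] = x0
    dW = torch.randn(batch, N, d) * dt ** 0.5        # Brownian increments
    DX_n = torch.empty(batch, N, d, d)               # D_n X_n     = b(t_n, X_n)
    DX_next = torch.empty(batch, N, d, d)            # D_n X_{n+1} (one Euler step later)
    for n in range(N):
        Xn = X[:, n]
        # forward Euler-Maruyama step
        X[:, n + 1] = Xn + mu * Xn * dt + sigma * Xn * dW[:, n]
        # D_n X_n = b(t_n, X_n) = sigma * diag(X_n)
        DX_n[:, n] = sigma * torch.diag_embed(Xn)
        # for diagonal b: D_n X_{n+1}^{ij} = D_n X_n^{ij} * (1 + mu*dt + sigma*dW_n^i)
        factor = 1.0 + mu * dt + sigma * dW[:, n]    # shape (batch, d)
        DX_next[:, n] = factor.unsqueeze(-1) * DX_n[:, n]
    return X, dW, DX_n, DX_next
```

Because $b$ is diagonal in this example, the Malliavin derivative propagates componentwise and the per-step cost stays at $\mathcal{O}(d^2)$ per path; for general coefficients the same loop carries full Jacobian products.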

3. Differential Deep Learning Architecture

Three independent feed-forward neural networks are introduced:

$$
\begin{aligned}
\phi^y(t, x; \theta^y) &\approx Y_n^\Delta, \\
\phi^z(t, x; \theta^z) &\approx Z_n^\Delta, \\
\phi^\gamma(t, x; \theta^\gamma) &\approx \Gamma_n^\Delta,
\end{aligned}
$$

with input $(t, x) \in \mathbb{R} \times \mathbb{R}^d$ and outputs of shape $\mathbb{R}$, $\mathbb{R}^{1\times d}$, and $\mathbb{R}^{d\times d}$, respectively, selected for joint approximation of function value, gradient, and Hessian. A typical choice is $L = 4$ layers with $\eta = 100 + d$ neurons per layer and $\tanh$ activation, for $\mathcal{O}(L\eta^2)$ parameters.
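A possible realization of this architecture, assuming a PyTorch implementation (class and helper names here are illustrative, not the authors'), is sketched below; each network is a plain $\tanh$ multilayer perceptron acting on the concatenated input $(t, x)$.

```python
import torch
import torch.nn as nn

# Sketch of the three networks phi^y, phi^z, phi^gamma described above, under the
# assumption of a PyTorch implementation. Hidden width eta = 100 + d, L = 4 layers, tanh.

def mlp(in_dim, out_dim, width, n_layers=4):
    layers, dim = [], in_dim
    for _ in range(n_layers):
        layers += [nn.Linear(dim, width), nn.Tanh()]
        dim = width
    layers.append(nn.Linear(dim, out_dim))
    return nn.Sequential(*layers)

class DifferentialNets(nn.Module):
    def __init__(self, d):
        super().__init__()
        width = 100 + d
        self.d = d
        self.phi_y = mlp(d + 1, 1, width)         # value    Y_n     in R
        self.phi_z = mlp(d + 1, d, width)         # gradient Z_n     in R^{1 x d}
        self.phi_g = mlp(d + 1, d * d, width)     # Hessian  Gamma_n in R^{d x d}

    def forward(self, t, x):
        # t: (batch, 1), x: (batch, d)
        inp = torch.cat([t, x], dim=-1)
        y = self.phi_y(inp)                            # (batch, 1)
        z = self.phi_z(inp).unsqueeze(1)               # (batch, 1, d)
        g = self.phi_g(inp).view(-1, self.d, self.d)   # (batch, d, d)
        return y, z, g
```

Calling `y, z, g = DifferentialNets(d)(t_batch, x_batch)` returns the value, gradient, and Hessian estimates for a batch of space-time points; the three parameter sets $\theta^y$, $\theta^z$, $\theta^\gamma$ are trained jointly through the joint loss of Section 4.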

This architectural split contrasts with classical Deep BSDE, which typically parameterizes only the control $Z$ and the initial value $Y_0$ through individual networks. Here, the presence and role of $\Gamma$ are explicit and critical, especially for HJB and other fully nonlinear drivers.

4. Differential-Learning Joint Loss Function

A global joint loss function is constructed by combining two local per-step residuals:

  • The $Y$-loss, which enforces the discrete backward SDE increment:

$$
L^{y,\Delta}(\theta) = \mathbb{E}\left[\sum_{n=0}^{N-1} \left|Y_{n+1}^{\Delta,\theta} - Y_n^{\Delta,\theta} + f(t_n, \ldots)\,\Delta t - Z_n^{\Delta,\theta}\,\Delta W_n\right|^2 + \left|Y_N^{\Delta,\theta} - g(X_N^\Delta)\right|^2 \right],
$$

  • The $Z$-loss, which enforces the Malliavin-increment equation (including pathwise derivatives):

$$
L^{z,\Delta}(\theta) = \mathbb{E}\left[\sum_{n=0}^{N-1} \left|D_n Y_{n+1}^{\Delta,\theta} - Z_n^{\Delta,\theta} + f_D(t_n, \ldots)\,\Delta t - \Gamma_n^{\Delta,\theta}\, D_n X_n^\Delta\,\Delta W_n\right|^2 + \left|Z_N^{\Delta,\theta} - \nabla_x g(X_N^\Delta)\, b(T, X_N^\Delta)\right|^2\right].
$$

A convex combination,

$$
L^\Delta(\theta) = \omega_1 L^{y,\Delta}(\theta) + \omega_2 L^{z,\Delta}(\theta), \qquad \omega_1 = \frac{1}{d+1}, \quad \omega_2 = \frac{d}{d+1},
$$

is minimized. This structure enforces both the pathwise evolution and the Malliavin-derivative constraints at every time step, dramatically increasing the accuracy of $Z$ and particularly $\Gamma$ as compared to previous local or terminal-only loss designs (Kapllani et al., 2024).
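A sketch of how this joint loss can be assembled from per-step tensors is given below, again assuming PyTorch; the tensor names, shapes, and the helper joint_loss are illustrative assumptions, and the quantity $D_n Y_{n+1}^{\Delta,\theta}$ is assumed to have been formed upstream from the network outputs (e.g., via $Z_{n+1}^{\Delta,\theta} b^{-1}$ or automatic differentiation of $\phi^y$).

```python
import torch

# Sketch of L^Delta = w1 * L^y + w2 * L^z with the assumed tensor shapes:
#   Y        (batch, N+1, 1)     phi^y(t_n, X_n)
#   Z        (batch, N+1, 1, d)  phi^z(t_n, X_n)
#   G        (batch, N,   d, d)  phi^gamma(t_n, X_n)
#   dW       (batch, N,   d)     Brownian increments
#   f_vals   (batch, N, 1), fD_vals (batch, N, 1, d)   driver f and its Malliavin differential
#   DnY_next (batch, N, 1, d)    D_n Y_{n+1}, formed from the network outputs
#   DnX      (batch, N, d, d)    D_n X_n
#   gT       (batch, 1), dgT_b (batch, 1, d)           g(X_N) and grad g(X_N) b(T, X_N)

def joint_loss(Y, Z, G, dW, f_vals, fD_vals, DnY_next, DnX, gT, dgT_b, dt, d):
    # Y-loss: discrete backward-SDE residual at every step plus terminal condition.
    z_dw = (Z[:, :-1] * dW.unsqueeze(2)).sum(-1)              # Z_n dW_n, shape (batch, N, 1)
    res_y = Y[:, 1:] - Y[:, :-1] + f_vals * dt - z_dw
    loss_y = (res_y ** 2).sum(1).mean() + ((Y[:, -1] - gT) ** 2).mean()

    # Z-loss: discrete Malliavin-BSDE residual plus terminal condition on Z.
    gamma_dnx = torch.matmul(G, DnX)                          # Gamma_n D_n X_n
    gamma_dnx_dw = (gamma_dnx * dW.unsqueeze(-1)).sum(2)      # contracted with dW_n -> (batch, N, d)
    res_z = DnY_next - Z[:, :-1] + fD_vals * dt - gamma_dnx_dw.unsqueeze(2)
    loss_z = (res_z ** 2).sum((1, 2, 3)).mean() + ((Z[:, -1] - dgT_b) ** 2).sum((1, 2)).mean()

    # convex weights from the text: w1 = 1/(d+1), w2 = d/(d+1)
    w1, w2 = 1.0 / (d + 1), d / (d + 1.0)
    return w1 * loss_y + w2 * loss_z
```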

5. Training Procedure and Implementation

Training is performed by global stochastic optimization (typically Adam) on the joint loss. The input is normalized per time slice; parameter initialization uses Xavier or He schemes compatible with modern frameworks. Training proceeds over $K = 60\,000$ steps with batch size $128$.
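Combining the previous sketches, a training loop consistent with this description might look as follows (PyTorch assumed; the toy problem with driver $f \equiv 0$, terminal $g(x) = \sum_i x_i$, the learning rate, and the grid size $N$ are illustrative assumptions, whereas Adam, $K = 60\,000$ steps, and batch size $128$ come from the text).

```python
import torch

# End-to-end training sketch reusing DifferentialNets, simulate_forward and joint_loss
# from the sketches above. Toy problem: GBM forward dynamics, f = 0, g(x) = sum_i x_i,
# and D_n Y_{n+1} formed as Z_{n+1} b^{-1}(t_{n+1}, X_{n+1}) D_n X_{n+1} (b invertible here).

d, N, T, mu, sigma = 5, 20, 1.0, 0.05, 0.2
dt = T / N
x0 = torch.ones(d)
t_grid = torch.linspace(0.0, T, N + 1)
model = DifferentialNets(d)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)   # learning rate is an assumption

for _ in range(60_000):                               # K = 60 000 optimization steps
    batch = 128
    X, dW, DnX, DnX_next = simulate_forward(x0, mu, sigma, T, N, batch)
    t_in = t_grid.view(1, N + 1, 1).expand(batch, N + 1, 1)
    # evaluate all three networks on every time slice (t_n, X_n) at once
    y, z, g = model(t_in.reshape(-1, 1), X.reshape(-1, d))
    Y = y.view(batch, N + 1, 1)
    Z = z.view(batch, N + 1, 1, d)
    G = g.view(batch, N + 1, d, d)
    # D_n Y_{n+1} = Z_{n+1} b^{-1}(t_{n+1}, X_{n+1}) D_n X_{n+1}, with b = sigma * diag(x)
    Zb_inv = Z[:, 1:] / (sigma * X[:, 1:]).unsqueeze(2)
    DnY_next = torch.matmul(Zb_inv, DnX_next)
    f_vals = torch.zeros(batch, N, 1)                 # driver f = 0 in this toy problem
    fD_vals = torch.zeros(batch, N, 1, d)             # hence its Malliavin differential is 0
    gT = X[:, -1].sum(-1, keepdim=True)               # g(X_N) = sum_i X_N^i
    dgT_b = (sigma * X[:, -1]).unsqueeze(1)           # grad g(X_N) b(T, X_N) = sigma * X_N
    loss = joint_loss(Y, Z, G[:, :-1], dW, f_vals, fD_vals, DnY_next, DnX, gT, dgT_b, dt, d)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Since a fresh Monte Carlo batch is simulated at every iteration, the optimization is a plain stochastic-gradient scheme over the joint loss, with all three networks updated simultaneously.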

Gradient computation is handled by automatic differentiation through the (deterministic) Euler–Maruyama simulation and the forward network evaluation, requiring no manual intervention even as the Malliavin-system dynamics are included in the loss.

The only non-trivial implementation aspect is the correct tracking of the discrete Malliavin derivatives $D_n X^\Delta$ for the $Z$-loss; otherwise the code mirrors standard Deep BSDE approaches. All computations remain Monte Carlo simulation, feed-forward evaluation, and backpropagation, with complexity linear in the spatial dimension $d$, the grid size $N$, and the network size.

6. Numerical Performance and Comparison with Classical Deep BSDE

On benchmark problems up to $d = 50$ (toy nonlinear drivers, high-dimensional Black–Scholes basket options, and HJB control equations), the forward differential deep-learning scheme achieves:

  • $Y$ and $Z$ errors $1$–$2$ orders of magnitude lower than deep BSDE methods lacking derivative supervision,
  • $\Gamma$ error reduced from $\mathcal{O}(1)$ to $10^{-2}$–$10^{-3}$,
  • Wall-clock runtime $2$–$5\times$ smaller than classical methods when $\Gamma$ is approximated via autograd,
  • Empirical convergence rates $\beta \approx 1.0$–$1.7$ for the $Y$ and $Z$ errors as a function of refinement, compared to nearly zero or negative rates for classical methods (Kapllani et al., 2024).

This improvement is a consequence of imposing the joint dynamics of $(Y, Z, \Gamma)$ throughout the time grid, in contrast to classical Deep BSDE, which parameterizes $Z$ only, matches primarily at the terminal condition, and typically yields poor intermediate-time derivative estimates.

7. Broader Context and Extensions

The forward differential Deep BSDE method represents a significant refinement of the original Deep BSDE paradigm (Han et al., 7 May 2025). It adopts the Malliavin-lifting principle to ensure that every relevant pathwise sensitivity is explicitly parameterized and learned. This not only yields more accurate solution fields and derivatives, but is also robust and scalable in high dimension.

Unlike other extensions (control-variate schemes, locally additive losses, XNet/Cauchy architectures, or pathwise/rough-signature enrichments), the forward differential approach ensures direct, simultaneous training of $Y$, $Z$, and $\Gamma$. This is especially pertinent for financial applications (where second-order Greeks are critical) and for HJB-type semilinear and fully nonlinear PDEs.

The approach is compatible with further architectural improvements (e.g., Cauchy bases, transformer-style attention, hybrid PINN losses), but its principal distinguishing feature is the explicit and joint regression of the Malliavin-lifted triple. The method can be regarded as a natural foundation for future algorithmic development in high-dimensional PDE/BSDE solvers.

References:

  • Kapllani et al., 2024
  • Han et al., 7 May 2025
