
Deep BSDE Solvers for High-Dimensional PDEs

Updated 19 January 2026
  • Deep BSDE solvers are numerical algorithms that combine deep learning, the Feynman–Kac representation, and time discretization to solve high-dimensional backward stochastic differential equations and nonlinear PDEs.
  • They employ neural network approximators and advanced schemes like Runge–Kutta and operator learning to parameterize solution operators and overcome the curse of dimensionality.
  • Recent advancements extend these methods to bounded domains, jump processes, and Volterra equations, supported by rigorous convergence theory and efficient implementation strategies.

Deep BSDE solvers are a class of numerical algorithms that leverage deep learning to approximate solutions to backward stochastic differential equations (BSDEs) and their associated high-dimensional partial differential equations (PDEs). BSDEs arise in nonlinear pricing, stochastic control, risk measurement, and dynamic programming, among other areas. Classical numerical schemes for BSDEs are often limited by the curse of dimensionality. Deep BSDE solvers fuse the probabilistic Feynman–Kac representation, time discretization, nonlinear least-squares objectives, and high-capacity neural network approximators to overcome these challenges. Recent research has generalized the methodology to bounded domains, jump processes, Volterra structures, operator learning, and hedging-sensitive architectures.

1. Formulation of BSDEs and Solution Operators

A standard BSDE on a probability space supporting a $d$-dimensional Brownian motion $B$ and filtration $\{\mathcal{F}_t\}$ takes the form:

$$Y_t = \xi + \int_t^T g(s, Y_s, Z_s)\,ds - \int_t^T Z_s\cdot dB_s,\quad t\in[0,T],$$

where $\xi\in L^2(\mathcal{F}_T)$ is the terminal condition and $g:[0,T]\times\mathbb{R}\times\mathbb{R}^d\to\mathbb{R}$ is a Lipschitz generator. The solution $(Y, Z)\in \mathcal{S}^2 \times \mathcal{H}^2$ is adapted and unique under standard conditions. The solution operator $\mathcal{Y}_t$ maps terminal data to adapted values at time $t$:

$$\mathcal{Y}_t(\xi) = Y_t,\quad \mathcal{Y}_t:L^2(\mathcal{F}_T)\to L^2(\mathcal{F}_t).$$

The initial operator $\mathcal{S}:\xi\mapsto Y_0$ is central to risk measurement and dynamic conditional expectations (Nunno et al., 2024).
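
For orientation, in the Markovian case $\xi = \varphi(X_T)$ for a forward diffusion $X$, and the BSDE is linked to a semilinear PDE by the standard nonlinear Feynman–Kac relation; the coefficients $b$, $\sigma$ and terminal function $\varphi$ below are generic placeholders, not taken from a specific cited paper:

$$dX_t = b(t, X_t)\,dt + \sigma(t, X_t)\,dB_t,\qquad Y_t = u(t, X_t),\qquad Z_t = \sigma(t, X_t)^{\top}\nabla_x u(t, X_t),$$

where $u$ solves

$$\partial_t u + \tfrac{1}{2}\operatorname{Tr}\!\big[\sigma\sigma^{\top}\nabla_x^2 u\big] + b\cdot\nabla_x u + g\big(t, u, \sigma^{\top}\nabla_x u\big) = 0,\qquad u(T,\cdot)=\varphi.$$

This is the sense in which solving the BSDE delivers the value $u(0,x) = Y_0$ of a high-dimensional semilinear PDE.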

2. Neural Parametrization and Discretization Schemes

Deep BSDE solvers employ a time discretization, typically implicit Euler or high-order Runge–Kutta (Chassagneux et al., 2022), over a grid $0=t_0<\cdots<t_n=T$ with mesh size $|\pi|$. The forward SDE evolution delivers $X_{t_i}$, while the backward recursion seeks

$$\begin{cases} Y_n^{\pi} = \xi,\\ Z_i^{\pi} = \dfrac{1}{\Delta t_i}\,\mathbb{E}_{t_i}\!\left[Y_{i+1}^{\pi}(B_{t_{i+1}} - B_{t_i})\right],\\ Y_i^{\pi} = \mathbb{E}_{t_i}[Y_{i+1}^{\pi}] + \Delta t_i\, g(t_i, Y_i^{\pi}, Z_i^{\pi}), \end{cases}$$

with $\mathbb{E}_{t_i}$ denoting conditional expectation given $\mathcal{F}_{t_i}$.
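
As a concrete (non-neural) reference point, this recursion can be run for a one-dimensional toy problem by approximating the conditional expectations $\mathbb{E}_{t_i}[\cdot]$ with polynomial least-squares regression on simulated paths. The sketch below is a minimal illustration under those assumptions (the generator, diffusion, and terminal payoff are invented for the example); the deep solvers described next replace the regressions with neural networks.

```python
import numpy as np

# Minimal sketch: discretized BSDE recursion in d = 1 with conditional expectations
# approximated by polynomial least-squares regression on X_{t_i} (a classical baseline;
# all problem data below is illustrative, not from a cited paper).
rng = np.random.default_rng(0)
M, n, T = 100_000, 50, 1.0                        # paths, time steps, horizon
dt = T / n
g = lambda t, y, z: -0.05 * y + 0.1 * np.abs(z)   # example Lipschitz generator
phi = lambda x: np.maximum(x - 1.0, 0.0)          # terminal condition xi = phi(X_T)

# Forward Euler-Maruyama for dX = 0.2 X dB (illustrative diffusion), X_0 = 1.
X = np.empty((n + 1, M)); X[0] = 1.0
dB = rng.normal(0.0, np.sqrt(dt), size=(n, M))
for i in range(n):
    X[i + 1] = X[i] + 0.2 * X[i] * dB[i]

def cond_exp(x, target, deg=4):
    """Regress target on polynomials of x to approximate E[target | X_{t_i} = x]."""
    coeffs = np.polyfit(x, target, deg)
    return np.polyval(coeffs, x)

# Backward recursion: Y_n = phi(X_T), then Z_i and Y_i per the scheme above
# (one explicit step is used in place of the implicit fixed point in Y_i).
Y = phi(X[n])
for i in reversed(range(n)):
    Z = cond_exp(X[i], Y * dB[i]) / dt
    E_Y = cond_exp(X[i], Y)
    Y = E_Y + dt * g(i * dt, E_Y, Z)

print("Y_0 estimate:", Y.mean())
```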

Rather than recalculating $(Y_i^{\pi}, Z_i^{\pi})$ for every terminal condition $\xi$, the method seeks to learn parameterized maps

$$Y_i^{\pi} = \mathcal{Y}_i^{\pi}(X_{t_i}, \mathrm{enc}(\xi)),\quad Z_i^{\pi} = \mathcal{Z}_i^{\pi}(X_{t_i}, \mathrm{enc}(\xi)),$$

where $\mathrm{enc}(\xi)$ encodes the terminal data, e.g., via Wiener chaos coefficients (Nunno et al., 2024) or polynomial bases, depending on the solver architecture. Feed-forward neural networks (one or two hidden layers, with ReLU or other activations) are used to approximate these maps. In operator learning configurations, the input dimension scales with the chaos truncation size $N_{p,M}$.

Loss functions are constructed by stacking the discrete BSDE identities:

$$\mathcal{L}(\{\theta_i\}) = \mathbb{E}_{(\omega, \xi)} \sum_{i=0}^{n-1} \left|\mathcal{Y}_i^{\pi}(X_{t_i}, \mathrm{enc}(\xi)) - \mathcal{Y}_{i+1}^{\pi}(X_{t_{i+1}}, \mathrm{enc}(\xi)) - \Delta t_i\,g(\cdots) + \mathcal{Z}_i^{\pi}(\cdots)\cdot\Delta B_i\right|^2,$$

with terminal consistency enforced by $|\mathcal{Y}_n^{\pi}(X_T, \mathrm{enc}(\xi)) - \mathrm{enc}(\xi)|^2$. Training is performed via Adam or SGD with large mini-batches (e.g., $B=5\cdot 10^4$) (Nunno et al., 2024).
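
A minimal PyTorch-style sketch of this stacked objective is given below for the Markovian special case without the chaos encoding (so the networks take only $X_{t_i}$ as input). Network sizes, the driver, the forward dynamics, and the optimizer settings are illustrative assumptions, not the configuration of any cited paper.

```python
import torch
import torch.nn as nn

# Sketch of the stacked deep BSDE loss (Markovian case, no enc(xi) input).
# All sizes and coefficients are illustrative assumptions.
d, n, T, batch = 10, 20, 1.0, 1024
dt = T / n

def mlp(din, dout, width=32):
    return nn.Sequential(nn.Linear(din, width), nn.ReLU(),
                         nn.Linear(width, width), nn.ReLU(),
                         nn.Linear(width, dout))

Y_nets = nn.ModuleList([mlp(d, 1) for _ in range(n + 1)])   # Y_i^pi(X_{t_i})
Z_nets = nn.ModuleList([mlp(d, d) for _ in range(n)])       # Z_i^pi(X_{t_i})
g = lambda t, y, z: -0.05 * y + 0.1 * z.norm(dim=1, keepdim=True)  # example driver
phi = lambda x: x.norm(dim=1, keepdim=True) ** 2                   # terminal xi = phi(X_T)

opt = torch.optim.Adam(list(Y_nets.parameters()) + list(Z_nets.parameters()), lr=1e-3)

for step in range(2000):
    # Forward paths: for illustration X is simply a Brownian motion started at 0.
    X = torch.zeros(batch, d)
    dB = torch.randn(n, batch, d) * dt ** 0.5
    loss = torch.zeros(())
    for i in range(n):
        Y_i, Z_i = Y_nets[i](X), Z_nets[i](X)
        X_next = X + dB[i]
        Y_next = Y_nets[i + 1](X_next)
        # Residual of the discrete BSDE identity: Y_i - Y_{i+1} - dt*g + Z.dB.
        residual = (Y_i - Y_next - dt * g(i * dt, Y_i, Z_i)
                    + (Z_i * dB[i]).sum(dim=1, keepdim=True))
        loss = loss + residual.pow(2).mean()
        X = X_next
    loss = loss + (Y_nets[n](X) - phi(X)).pow(2).mean()   # terminal consistency term
    opt.zero_grad(); loss.backward(); opt.step()
```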

3. Advanced Architectures and Methodological Extensions

Recent work introduces the following advanced variants:

  • Operator Learning via Wiener Chaos Encoding: The Deep-Operator-BSDE method employs Wiener chaos decomposition to represent arbitrary terminal conditions, enabling the approximation of BSDE solution operators $\mathcal{Y}_t$ on $L^2(\mathcal{F}_T)$ (Nunno et al., 2024).
  • Runge–Kutta and Crank–Nicolson Schemes: Multi-stage deep learning-based schemes are shown to improve discrete-time error rates, with Crank–Nicolson mediating the best trade-off between accuracy and cost (Chassagneux et al., 2022).
  • Multi-step Local Quadratic Losses: Global optimization with locally additive losses that recursively reference the terminal condition improves both accuracy and landscape exploration for SGD in high-dimensional cases (LaDBSDE) (Kapllani et al., 2020, Bussell et al., 2023).
  • Barrier Options via Brownian Bridge Weights: Encoding boundary conditions into modified terminal payoffs through Brownian bridge theory allows standard deep BSDE architectures to address boundary-value problems (Yu et al., 2019); a sketch of the bridge-weighting idea follows this list.
  • Volterra (BSVIE) Extensions: The DeepBSDE framework generalizes to backward stochastic Volterra integral equations by joint parametrization over two time indices and nested neural networks (Agram et al., 2 Jul 2025).
  • Jump Dynamics and PIDEs: BSDE solvers for FBSDEs with Lévy jumps utilize neural networks to approximate both diffusion and jump compensator terms, incorporating error decomposition for finite and infinite activity cases (Gnoatto et al., 16 Jan 2025, Andersson et al., 2022).
  • Genetic Initialization and Control-Variate Approaches: Genetic algorithms for initial parameter 'shooting' yield faster convergence than naive random search, and linear asymptotic expansions as control variates dramatically reduce both statistical and discretization errors, especially in high-dimension (Putri et al., 2023, Takahashi et al., 2021).
  • Signature-RDE, XNet, and Kolmogorov–Arnold Networks: Advanced architectures, including log-signature sequence representation and neural rough differential equations (Alzahrani, 12 Oct 2025), rational activation functions (Zheng et al., 10 Feb 2025), and learnable B-spline activations (Handal et al., 16 Jan 2026), improve approximation and tail risk estimation in hedging contexts.
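
To make the Brownian-bridge weighting concrete, the sketch below computes the standard one-step probability that a Brownian path with known endpoints stays below an upper barrier, and multiplies the product of these factors into the terminal payoff. The barrier level, volatility, and payoff are illustrative assumptions, not the setup of Yu et al. (2019).

```python
import numpy as np

# Sketch of Brownian-bridge weighting for an up-and-out barrier (illustrative parameters).
# For endpoints x0, x1 < barrier over a step of length dt with diffusion coefficient sigma,
# the probability that the bridge stays below the barrier is
#   1 - exp(-2 (barrier - x0)(barrier - x1) / (sigma^2 dt)),
# and the product of these factors over all steps weights the terminal payoff.
def survival_weight(path, barrier, sigma, dt):
    x0, x1 = path[:-1], path[1:]
    p = 1.0 - np.exp(-2.0 * (barrier - x0) * (barrier - x1) / (sigma ** 2 * dt))
    p = np.where((x0 < barrier) & (x1 < barrier), p, 0.0)  # crossing at a grid point kills the path
    return np.prod(p)

# Usage: weighted payoff for one simulated path (here dX = sigma dB with X_0 = 1).
rng = np.random.default_rng(0)
n, T, sigma, barrier = 50, 1.0, 0.2, 1.5
dt = T / n
path = 1.0 + np.cumsum(np.concatenate([[0.0], sigma * rng.normal(0.0, np.sqrt(dt), n)]))
payoff = max(path[-1] - 1.0, 0.0) * survival_weight(path, barrier, sigma, dt)
```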

4. Convergence Theory and Error Analysis

Rigorous convergence guarantees support the practical use of deep BSDE solvers in nonlinear, high-dimensional domains:

  • Posterior Error Estimates: Under Lipschitz and regularity conditions, controlling the terminal loss yields full pathwise control of $|Y_t - \hat Y_t|^2 + |Z_t - \hat Z_t|^2$, with constants independent of the dimension (Han et al., 2018, Jiang et al., 2021).
  • Discrete-Time and Network Approximation Rates: With mesh size $|\pi|$ and per-step network best-approximation errors $\epsilon_i$, composite error bounds take the form

$$\max_{i} \mathbb{E}|Y_{t_i} - Y_i^{\pi}|^2 + \sum_i \Delta t_i\,\mathbb{E}|Z_{t_i} - Z_i^{\pi}|^2 \leq C\Big(|\pi| + \sum_{i}\epsilon_i\Big),$$

with a rate of $O(|\pi|^{1/2})$ possible under Malliavin differentiability (Nunno et al., 2024, Chassagneux et al., 2022).

  • Universal Approximation and Multi-step Losses: Multi-step losses and rational-activation architectures (XNet) realize faster decay of approximation error with respect to network width, mitigating the $O(L^2)$ scaling of standard feed-forward nets (Zheng et al., 10 Feb 2025). For operator learning, universal approximation in $C^2$ under sufficient depth controls the error on bounded domains (Würschmidt, 19 Aug 2025).
  • Volterra and Reflected Structures: Nested convergence and measurability arguments support two-index BSVIE extensions. Reflected BSDEs are handled by explicit projection onto feasible regions (Agram et al., 2 Jul 2025).

5. Practical Implementation, Hyperparameters, and Efficiency

Implementation details are strongly architecture-dependent:

| Method Class | Batch Size | Activation | Epochs | Notable Parameters |
|---|---|---|---|---|
| Operator BSDE (Nunno et al., 2024) | $5\cdot10^4$ | ReLU | 100–200 | Chaos order $p$, grid points |
| LaDBSDE (Kapllani et al., 2020) | $4\cdot10^3$ | tanh | adaptive | Shared net, AD for $Z$ |
| DADM (Bussell et al., 2023) | $10^3$ | ReLU/smooth | 5,000+ | Weight constraints |
| XNet (Zheng et al., 10 Feb 2025) | $10^3$ | Cauchy/rational | 10,000 | Basis width $L=100$–$200$ |
| Signature-RDE (Alzahrani, 12 Oct 2025) | $1\cdot10^3$ | log-signature, RDE | variable | Signature depth $m$, RDE width |

Algorithms generally initialize each time-layer neural network with one hidden layer whose width scales with the input dimension (e.g., $H=3 N_{p,M}$ for Operator-BSDE), use Adam with decaying learning rates, and leverage automatic differentiation for $Z$ and higher-order derivatives. Network widths, chaos truncation, and time-step mesh are chosen to balance computational cost against discretization error. Large batch sizes and robust learning-rate schedules are essential for stability, especially in high dimension.
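
As an illustration of the "AD for $Z$" entries above, the control process can be recovered by differentiating the $Y$-network with respect to the state and scaling by the diffusion. The sketch below assumes a constant (symmetric) diffusion matrix and a generic network; it is not tied to a specific cited implementation.

```python
import torch
import torch.nn as nn

# Sketch: recover Z_i = sigma^T grad_x Y_i(x) from the Y-network via automatic differentiation.
# The network, diffusion matrix, and sizes are illustrative assumptions.
d = 10
y_net = nn.Sequential(nn.Linear(d, 3 * d), nn.ReLU(), nn.Linear(3 * d, 1))
sigma = 0.2 * torch.eye(d)           # constant diffusion for illustration

def z_from_grad(x):
    x = x.clone().requires_grad_(True)
    y = y_net(x).sum()               # summing over the batch gives all per-sample gradients at once
    (grad_x,) = torch.autograd.grad(y, x, create_graph=True)
    return grad_x @ sigma            # Z = sigma^T grad_x u (sigma is symmetric here)

x = torch.randn(64, d)
Z = z_from_grad(x)                   # shape (64, d)
```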

6. Applications and Numerical Results

Deep BSDE solvers are employed in:

  • Nonlinear pricing of derivatives, risk adjustment, CVA, XVA: Standard and control-variated schemes deliver sub-$1\%$ errors in dimension $d=100$, with substantial speed-ups for hybrid/control-variated methods (Takahashi et al., 2021).
  • Exotic options (barrier, American, Bermudan): Barrier conditions enforced via Brownian bridge or explicit reflection are tractable up to moderate $d$; implementation times do not grow exponentially with dimension (Yu et al., 2019, Wang et al., 2018).
  • Portfolio and Utility Optimization: Deep signature and neural RDE solvers enable tail-sensitive control in fully nonlinear settings. Empirical results show improved conditional value-at-risk (CVaR) for risk management (Alzahrani, 12 Oct 2025).
  • Path-dependent and Volterra problems: Deep BSVIE parameterizations generalize to time-inconsistent or recursive-memory control (Agram et al., 2 Jul 2025).
  • Jump process and PIDE frameworks: Decoupled and coupled jump systems are efficiently learned, with error rates consistent with diffusion-only theory given appropriate truncation (Gnoatto et al., 16 Jan 2025, Andersson et al., 2022).
  • Bounded domain problems: Loss-modification and weighted penalty analyses yield convergence results for random-horizon and boundary-value applications (Würschmidt, 19 Aug 2025).

7. Limitations, Outlook, and Theoretical Insights

Several limitations and areas for future development are noted:

  • Network and Optimization Error Trade-off: Classical feed-forward architectures exhibit $O(L^2)$ parameter scaling for a desired approximation error. Rational/Cauchy-kernel (Zheng et al., 10 Feb 2025), B-spline KAN (Handal et al., 16 Jan 2026), and signature-based networks (Alzahrani, 12 Oct 2025) achieve linear or near-linear parameter growth for a given error, facilitating scalability.
  • Multistep and Local Losses: Multi-step schemes (DADM) and locally additive objectives mitigate poor minima and favor global consistency, especially for long maturities and non-smooth drivers (Kapllani et al., 2020, Bussell et al., 2023).
  • Boundary, Jump, and Volterra Generalizations: Direct neural approximation is feasible on random or path-dependent domains, provided measurability and universal approximation theorems hold.
  • Convergence Theory: Posterior estimates guarantee accuracy contingent on terminal loss minimization, network expressivity, and mesh refinement (Han et al., 2018, Jiang et al., 2021, Nunno et al., 2024), but theoretical rates may degrade if drivers are strongly nonlinear or the diffusion is ill-behaved.
  • Operator Learning: Recent advances demonstrate efficient solution operator approximation for classes of terminal conditions. This extends deep BSDE methodology to conditional expectation and dynamic risk measurement in abstract spaces (Nunno et al., 2024).

Further progress may emerge from adaptive mesh strategies, cross-validation for network complexity, and hybrid algorithmic blends—control variates, genetic initialization, and hierarchical sequence encoding—all of which improve empirical convergence rates, lower error, and reduce computational resources within high-dimensional PDE/BSDE applications.
