
Deep BSDE Solvers for High-Dimensional PDEs

Updated 19 January 2026
  • Deep BSDE solvers are numerical algorithms that combine deep learning, the Feynman–Kac representation, and time discretization to solve high-dimensional backward stochastic differential equations and nonlinear PDEs.
  • They employ neural network approximators and advanced schemes like Runge–Kutta and operator learning to parameterize solution operators and overcome the curse of dimensionality.
  • Recent advancements extend these methods to bounded domains, jump processes, and Volterra equations, supported by rigorous convergence theory and efficient implementation strategies.

Deep BSDE solvers are a class of numerical algorithms that leverage deep learning to approximate solutions to backward stochastic differential equations (BSDEs) and their associated high-dimensional partial differential equations (PDEs). BSDEs arise in nonlinear pricing, stochastic control, risk measurement, and dynamic programming, among other areas. Classical numerical schemes for BSDEs are often limited by the curse of dimensionality. Deep BSDE solvers fuse the probabilistic Feynman–Kac representation, time discretization, nonlinear least-squares objectives, and high-capacity neural network approximators to overcome these challenges. Recent research has generalized the methodology to bounded domains, jump processes, Volterra structures, operator learning, and hedging-sensitive architectures.

1. Formulation of BSDEs and Solution Operators

A standard BSDE on a probability space supporting a $d$-dimensional Brownian motion $B$ and filtration $\{\mathcal{F}_t\}$ takes the form:

$$Y_t = \xi + \int_t^T g(s, Y_s, Z_s)\,ds - \int_t^T Z_s\cdot dB_s,\quad t\in[0,T],$$

where $\xi\in L^2(\mathcal{F}_T)$ is the terminal condition and $g:[0,T]\times\mathbb{R}\times\mathbb{R}^d\to\mathbb{R}$ is a Lipschitz generator. The solution $(Y, Z)\in \mathcal{S}^2 \times \mathcal{H}^2$ is adapted and unique under standard conditions. The solution operator $\mathcal{Y}_t$ maps terminal data to adapted values at time $t$:

$$\mathcal{Y}_t(\xi) = Y_t,\quad \mathcal{Y}_t:L^2(\mathcal{F}_T)\to L^2(\mathcal{F}_t).$$

The initial operator $\mathcal{S}:\xi\mapsto Y_0$ is central to risk measurement and dynamic conditional expectations (Nunno et al., 2024).
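
For orientation, in the Markovian case $\xi = \varphi(X_T)$ for a forward diffusion $X$, and the BSDE is linked to a semilinear PDE by the standard nonlinear Feynman–Kac relation; the coefficients $b$, $\sigma$ and terminal function $\varphi$ below are generic placeholders, not taken from a specific cited paper:

$$dX_t = b(t, X_t)\,dt + \sigma(t, X_t)\,dB_t,\qquad Y_t = u(t, X_t),\qquad Z_t = \sigma(t, X_t)^{\top}\nabla_x u(t, X_t),$$

where $u$ solves

$$\partial_t u + \tfrac{1}{2}\operatorname{Tr}\!\big[\sigma\sigma^{\top}\nabla_x^2 u\big] + b\cdot\nabla_x u + g\big(t, u, \sigma^{\top}\nabla_x u\big) = 0,\qquad u(T,\cdot)=\varphi.$$

This is the sense in which solving the BSDE delivers the value $u(0,x) = Y_0$ of a high-dimensional semilinear PDE.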

2. Neural Parametrization and Discretization Schemes

Deep BSDE solvers employ a time discretization, typically implicit Euler or high-order Runge–Kutta (Chassagneux et al., 2022), over a grid $0=t_0<\cdots<t_n=T$ with mesh size $|\pi|$. The forward SDE evolution delivers $X_{t_i}$, while the backward recursion seeks

$$\begin{cases} Y_n^{\pi} = \xi,\\ Z_i^{\pi} = \dfrac{1}{\Delta t_i}\,\mathbb{E}_{t_i}\!\left[Y_{i+1}^{\pi}(B_{t_{i+1}} - B_{t_i})\right],\\ Y_i^{\pi} = \mathbb{E}_{t_i}[Y_{i+1}^{\pi}] + \Delta t_i\, g(t_i, Y_i^{\pi}, Z_i^{\pi}), \end{cases}$$

with $\mathbb{E}_{t_i}$ denoting conditional expectation given $\mathcal{F}_{t_i}$.
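
As a concrete (non-neural) reference point, this recursion can be run for a one-dimensional toy problem by approximating the conditional expectations $\mathbb{E}_{t_i}[\cdot]$ with polynomial least-squares regression on simulated paths. The sketch below is a minimal illustration under those assumptions (the generator, diffusion, and terminal payoff are invented for the example); the deep solvers described next replace the regressions with neural networks.

```python
import numpy as np

# Minimal sketch: discretized BSDE recursion in d = 1 with conditional expectations
# approximated by polynomial least-squares regression on X_{t_i} (a classical baseline;
# all problem data below is illustrative, not from a cited paper).
rng = np.random.default_rng(0)
M, n, T = 100_000, 50, 1.0                        # paths, time steps, horizon
dt = T / n
g = lambda t, y, z: -0.05 * y + 0.1 * np.abs(z)   # example Lipschitz generator
phi = lambda x: np.maximum(x - 1.0, 0.0)          # terminal condition xi = phi(X_T)

# Forward Euler-Maruyama for dX = 0.2 X dB (illustrative diffusion), X_0 = 1.
X = np.empty((n + 1, M)); X[0] = 1.0
dB = rng.normal(0.0, np.sqrt(dt), size=(n, M))
for i in range(n):
    X[i + 1] = X[i] + 0.2 * X[i] * dB[i]

def cond_exp(x, target, deg=4):
    """Regress target on polynomials of x to approximate E[target | X_{t_i} = x]."""
    coeffs = np.polyfit(x, target, deg)
    return np.polyval(coeffs, x)

# Backward recursion: Y_n = phi(X_T), then Z_i and Y_i per the scheme above
# (one explicit step is used in place of the implicit fixed point in Y_i).
Y = phi(X[n])
for i in reversed(range(n)):
    Z = cond_exp(X[i], Y * dB[i]) / dt
    E_Y = cond_exp(X[i], Y)
    Y = E_Y + dt * g(i * dt, E_Y, Z)

print("Y_0 estimate:", Y.mean())
```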

Rather than recalculating $(Y_i^{\pi}, Z_i^{\pi})$ for every terminal condition $\xi$, the method seeks to learn parameterized maps

$$Y_i^{\pi} = \mathcal{Y}_i^{\pi}(X_{t_i}, \mathrm{enc}(\xi)),\quad Z_i^{\pi} = \mathcal{Z}_i^{\pi}(X_{t_i}, \mathrm{enc}(\xi)),$$

where $\mathrm{enc}(\xi)$ encodes the terminal data, e.g., via Wiener chaos coefficients (Nunno et al., 2024) or polynomial bases, depending on the solver architecture. Feed-forward neural networks (one or two hidden layers, with ReLU or other activations) are used to approximate these maps. In operator learning configurations, the input dimension scales with the chaos truncation size $N_{p,M}$.

Loss functions are constructed by stacking the discrete BSDE identities:

$$\mathcal{L}(\{\theta_i\}) = \mathbb{E}_{(\omega, \xi)} \sum_{i=0}^{n-1} \left|\mathcal{Y}_i^{\pi}(X_{t_i}, \mathrm{enc}(\xi)) - \mathcal{Y}_{i+1}^{\pi}(X_{t_{i+1}}, \mathrm{enc}(\xi)) - \Delta t_i\,g(\cdots) + \mathcal{Z}_i^{\pi}(\cdots)\cdot\Delta B_i\right|^2,$$

with terminal consistency enforced by $|\mathcal{Y}_n^{\pi}(X_T, \mathrm{enc}(\xi)) - \mathrm{enc}(\xi)|^2$. Training is performed via Adam or SGD with large mini-batches (e.g., $B=5\cdot 10^4$) (Nunno et al., 2024).
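
A minimal PyTorch-style sketch of this stacked objective is given below for the Markovian special case without the chaos encoding (so the networks take only $X_{t_i}$ as input). Network sizes, the driver, the forward dynamics, and the optimizer settings are illustrative assumptions, not the configuration of any cited paper.

```python
import torch
import torch.nn as nn

# Sketch of the stacked deep BSDE loss (Markovian case, no enc(xi) input).
# All sizes and coefficients are illustrative assumptions.
d, n, T, batch = 10, 20, 1.0, 1024
dt = T / n

def mlp(din, dout, width=32):
    return nn.Sequential(nn.Linear(din, width), nn.ReLU(),
                         nn.Linear(width, width), nn.ReLU(),
                         nn.Linear(width, dout))

Y_nets = nn.ModuleList([mlp(d, 1) for _ in range(n + 1)])   # Y_i^pi(X_{t_i})
Z_nets = nn.ModuleList([mlp(d, d) for _ in range(n)])       # Z_i^pi(X_{t_i})
g = lambda t, y, z: -0.05 * y + 0.1 * z.norm(dim=1, keepdim=True)  # example driver
phi = lambda x: x.norm(dim=1, keepdim=True) ** 2                   # terminal xi = phi(X_T)

opt = torch.optim.Adam(list(Y_nets.parameters()) + list(Z_nets.parameters()), lr=1e-3)

for step in range(2000):
    # Forward paths: for illustration X is simply a Brownian motion started at 0.
    X = torch.zeros(batch, d)
    dB = torch.randn(n, batch, d) * dt ** 0.5
    loss = torch.zeros(())
    for i in range(n):
        Y_i, Z_i = Y_nets[i](X), Z_nets[i](X)
        X_next = X + dB[i]
        Y_next = Y_nets[i + 1](X_next)
        # Residual of the discrete BSDE identity: Y_i - Y_{i+1} - dt*g + Z.dB.
        residual = (Y_i - Y_next - dt * g(i * dt, Y_i, Z_i)
                    + (Z_i * dB[i]).sum(dim=1, keepdim=True))
        loss = loss + residual.pow(2).mean()
        X = X_next
    loss = loss + (Y_nets[n](X) - phi(X)).pow(2).mean()   # terminal consistency term
    opt.zero_grad(); loss.backward(); opt.step()
```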

3. Advanced Architectures and Methodological Extensions

Recent work introduces the following advanced variants:

  • Operator Learning via Wiener Chaos Encoding: The Deep-Operator-BSDE method employs Wiener chaos decomposition to represent arbitrary terminal conditions, enabling the approximation of BSDE solution operators $\mathcal{Y}_t$ on $L^2(\mathcal{F}_T)$ (Nunno et al., 2024).
  • Runge–Kutta and Crank–Nicolson Schemes: Multi-stage deep learning-based schemes are shown to improve discrete-time error rates, with Crank–Nicolson mediating the best trade-off between accuracy and cost (Chassagneux et al., 2022).
  • Multi-step Local Quadratic Losses: Global optimization with locally additive losses that recursively reference the terminal condition improves both accuracy and landscape exploration for SGD in high-dimensional cases (LaDBSDE) (Kapllani et al., 2020, Bussell et al., 2023).
  • Barrier Options via Brownian Bridge Weights: Encoding boundary conditions into modified terminal payoffs through Brownian bridge theory allows standard deep BSDE architectures to address boundary-value problems (Yu et al., 2019); a sketch of the bridge-weighting idea follows this list.
  • Volterra (BSVIE) Extensions: The DeepBSDE framework generalizes to backward stochastic Volterra integral equations by joint parametrization over two time indices and nested neural networks (Agram et al., 2 Jul 2025).
  • Jump Dynamics and PIDEs: BSDE solvers for FBSDEs with Lévy jumps utilize neural networks to approximate both diffusion and jump compensator terms, incorporating error decomposition for finite and infinite activity cases (Gnoatto et al., 16 Jan 2025, Andersson et al., 2022).
  • Genetic Initialization and Control-Variate Approaches: Genetic algorithms for initial parameter 'shooting' yield faster convergence than naive random search, and linear asymptotic expansions as control variates dramatically reduce both statistical and discretization errors, especially in high-dimension (Putri et al., 2023, Takahashi et al., 2021).
  • Signature-RDE, XNet, and Kolmogorov–Arnold Networks: Advanced architectures, including log-signature sequence representation and neural rough differential equations (Alzahrani, 12 Oct 2025), rational activation functions (Zheng et al., 10 Feb 2025), and learnable B-spline activations (Handal et al., 16 Jan 2026), improve approximation and tail risk estimation in hedging contexts.
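
To make the Brownian-bridge weighting concrete, the sketch below computes the standard one-step probability that a Brownian path with known endpoints stays below an upper barrier, and multiplies the product of these factors into the terminal payoff. The barrier level, volatility, and payoff are illustrative assumptions, not the setup of Yu et al. (2019).

```python
import numpy as np

# Sketch of Brownian-bridge weighting for an up-and-out barrier (illustrative parameters).
# For endpoints x0, x1 < barrier over a step of length dt with diffusion coefficient sigma,
# the probability that the bridge stays below the barrier is
#   1 - exp(-2 (barrier - x0)(barrier - x1) / (sigma^2 dt)),
# and the product of these factors over all steps weights the terminal payoff.
def survival_weight(path, barrier, sigma, dt):
    x0, x1 = path[:-1], path[1:]
    p = 1.0 - np.exp(-2.0 * (barrier - x0) * (barrier - x1) / (sigma ** 2 * dt))
    p = np.where((x0 < barrier) & (x1 < barrier), p, 0.0)  # crossing at a grid point kills the path
    return np.prod(p)

# Usage: weighted payoff for one simulated path (here dX = sigma dB with X_0 = 1).
rng = np.random.default_rng(0)
n, T, sigma, barrier = 50, 1.0, 0.2, 1.5
dt = T / n
path = 1.0 + np.cumsum(np.concatenate([[0.0], sigma * rng.normal(0.0, np.sqrt(dt), n)]))
payoff = max(path[-1] - 1.0, 0.0) * survival_weight(path, barrier, sigma, dt)
```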

4. Convergence Theory and Error Analysis

Rigorous convergence guarantees support the practical use of deep BSDE solvers in nonlinear, high-dimensional domains:

  • Posterior Error Estimates: Under Lipschitz and regularity conditions, controlling the terminal loss yields full pathwise control of $|Y_t - \hat Y_t|^2 + |Z_t - \hat Z_t|^2$, with constants independent of the dimension (Han et al., 2018, Jiang et al., 2021).
  • Discrete-Time and Network Approximation Rates: With mesh size $|\pi|$ and per-step network best-approximation errors $\epsilon_i$, composite error bounds take the form

$$\max_{i} \mathbb{E}|Y_{t_i} - Y_i^{\pi}|^2 + \sum_i \Delta t_i\,\mathbb{E}|Z_{t_i} - Z_i^{\pi}|^2 \leq C\Big(|\pi| + \sum_{i}\epsilon_i\Big),$$

with a rate of $O(|\pi|^{1/2})$ possible under Malliavin differentiability (Nunno et al., 2024, Chassagneux et al., 2022).

  • Universal Approximation and Multi-step Losses: Multi-step losses and rational-activation architectures (XNet) realize faster decay of approximation error with respect to network width, mitigating the $O(L^2)$ scaling of standard feed-forward nets (Zheng et al., 10 Feb 2025). For operator learning, universal approximation in $C^2$ under sufficient depth controls the error on bounded domains (Würschmidt, 19 Aug 2025).
  • Volterra and Reflected Structures: Nested convergence and measurability arguments support two-index BSVIE extensions. Reflected BSDEs are handled by explicit projection onto feasible regions (Agram et al., 2 Jul 2025).

5. Practical Implementation, Hyperparameters, and Efficiency

Implementation details are strongly architecture-dependent:

| Method Class | Batch Size | Activation | Epochs | Notable Parameters |
|---|---|---|---|---|
| Operator BSDE (Nunno et al., 2024) | $5\cdot10^4$ | ReLU | 100–200 | Chaos order $p$, grid points |
| LaDBSDE (Kapllani et al., 2020) | $4\cdot10^3$ | tanh | adaptive | Shared net, AD for $Z$ |
| DADM (Bussell et al., 2023) | $10^3$ | ReLU/smooth | 5,000+ | Weight constraints |
| XNet (Zheng et al., 10 Feb 2025) | $10^3$ | Cauchy/rational | 10,000 | Basis width $L=100$–$200$ |
| Signature-RDE (Alzahrani, 12 Oct 2025) | $1\cdot10^3$ | log-signature, RDE | variable | Signature depth $m$, RDE width |

Algorithms generally initialize each time-layer neural network with one hidden layer whose width scales with the input dimension (e.g., $H=3 N_{p,M}$ for Operator-BSDE), use Adam with decaying learning rates, and leverage automatic differentiation for $Z$ and higher-order derivatives. Network widths, chaos truncation, and time-step mesh are chosen to balance computational cost against discretization error. Large batch sizes and robust learning-rate schedules are essential for stability, especially in high dimension.
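
As an illustration of the "AD for $Z$" entries above, the control process can be recovered by differentiating the $Y$-network with respect to the state and scaling by the diffusion. The sketch below assumes a constant (symmetric) diffusion matrix and a generic network; it is not tied to a specific cited implementation.

```python
import torch
import torch.nn as nn

# Sketch: recover Z_i = sigma^T grad_x Y_i(x) from the Y-network via automatic differentiation.
# The network, diffusion matrix, and sizes are illustrative assumptions.
d = 10
y_net = nn.Sequential(nn.Linear(d, 3 * d), nn.ReLU(), nn.Linear(3 * d, 1))
sigma = 0.2 * torch.eye(d)           # constant diffusion for illustration

def z_from_grad(x):
    x = x.clone().requires_grad_(True)
    y = y_net(x).sum()               # summing over the batch gives all per-sample gradients at once
    (grad_x,) = torch.autograd.grad(y, x, create_graph=True)
    return grad_x @ sigma            # Z = sigma^T grad_x u (sigma is symmetric here)

x = torch.randn(64, d)
Z = z_from_grad(x)                   # shape (64, d)
```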

6. Applications and Numerical Results

Deep BSDE solvers are employed in:

  • Nonlinear pricing of derivatives, risk adjustment, CVA, XVA: Standard and control-variated schemes deliver sub-$1\%$ errors in dimension $d=100$, with substantial speed-ups for hybrid/control-variated methods (Takahashi et al., 2021).
  • Exotic options (barrier, American, Bermudan): Barrier conditions enforced via Brownian bridge or explicit reflection are tractable up to moderate $d$; implementation times do not grow exponentially with dimension (Yu et al., 2019, Wang et al., 2018).
  • Portfolio and Utility Optimization: Deep signature and neural RDE solvers enable tail-sensitive control in fully nonlinear settings. Empirical results show improved conditional value-at-risk (CVaR) for risk management (Alzahrani, 12 Oct 2025).
  • Path-dependent and Volterra problems: Deep BSVIE parameterizations generalize to time-inconsistent or recursive-memory control (Agram et al., 2 Jul 2025).
  • Jump process and PIDE frameworks: Decoupled and coupled jump systems are efficiently learned, with error rates consistent with diffusion-only theory given appropriate truncation (Gnoatto et al., 16 Jan 2025, Andersson et al., 2022).
  • Bounded domain problems: Loss-modification and weighted penalty analyses yield convergence results for random-horizon and boundary-value applications (Würschmidt, 19 Aug 2025).

7. Limitations, Outlook, and Theoretical Insights

Several limitations and areas for future development are noted:

  • Network and Optimization Error Trade-off: Classical feed-forward architectures exhibit $O(L^2)$ parameter scaling for a desired approximation error. Rational/Cauchy-kernel (Zheng et al., 10 Feb 2025), B-spline KAN (Handal et al., 16 Jan 2026), and signature-based networks (Alzahrani, 12 Oct 2025) achieve linear or near-linear parameter growth for a given error, facilitating scalability.
  • Multistep and Local Losses: Multi-step schemes (DADM) and locally additive objectives mitigate poor minima and favor global consistency, especially for long maturities and non-smooth drivers (Kapllani et al., 2020, Bussell et al., 2023).
  • Boundary, Jump, and Volterra Generalizations: Direct neural approximation is feasible on random or path-dependent domains, provided measurability and universal approximation theorems hold.
  • Convergence Theory: Posterior estimates guarantee accuracy contingent on terminal loss minimization, network expressivity, and mesh refinement (Han et al., 2018, Jiang et al., 2021, Nunno et al., 2024), but theoretical rates may degrade if drivers are strongly nonlinear or the diffusion is ill-behaved.
  • Operator Learning: Recent advances demonstrate efficient solution operator approximation for classes of terminal conditions. This extends deep BSDE methodology to conditional expectation and dynamic risk measurement in abstract spaces (Nunno et al., 2024).

Further progress may emerge from adaptive mesh strategies, cross-validation for network complexity, and hybrid algorithmic blends—control variates, genetic initialization, and hierarchical sequence encoding—all of which improve empirical convergence rates, lower error, and reduce computational resources within high-dimensional PDE/BSDE applications.
