
Approx. Bayesian Filter via BSDEs

Updated 16 August 2025
  • The paper introduces an approximate Bayesian filtering paradigm that reformulates the Fokker–Planck PDE as a nonlinear Feynman–Kac representation using backward stochastic differential equations.
  • It leverages deep BSDE solvers where neural networks approximate the backward process and its gradient, enabling scalable and efficient filtering in high-dimensional nonlinear settings.
  • Empirical results on Ornstein–Uhlenbeck and bistable processes validate the mixed error bounds, demonstrating convergence rates consistent with the Euler–Maruyama discretization rate.

An approximate Bayesian filter based on backward stochastic differential equations (BSDEs) is a class of nonlinear filtering algorithms in which the evolution of the conditional (typically unnormalized) filtering density is represented via a nonlinear Feynman–Kac formula, leading to a forward–backward SDE system. By leveraging deep learning for the numerical approximation of the BSDE solutions, this framework achieves efficient and scalable nonlinear filtering, particularly in high-dimensional and nonlinear settings (Bågmark et al., 14 Aug 2025).

1. Nonlinear Feynman–Kac Representation of the Filtering Density

In the classical continuous–discrete filtering context, the state density between observations satisfies the Fokker–Planck (Kolmogorov forward) equation. The core methodological advance is to re-express this PDE as a backward stochastic differential equation, yielding a nonlinear probabilistic representation for the filtering density. For the time interval $[t_k, t_{k+1}]$, the prediction step is reformulated as

$$p_k(t_{k+1}, x) = \mathbb{E}\left[ g_k\big(X_{t_{k+1}}^{k,x}\big) + \int_{t_k}^{t_{k+1}} f\big(X_s^{k,x}, Y_s^{k,x}, Z_s^{k,x}\big)\, ds \right]$$

where $X^{k,x}$ is the forward process started at $x$ at time $t_k$, $g_k$ encodes the updated density at the last observation, and $f$ collects terms derived from the drift, diffusion, and adjoint operator structure of the Fokker–Planck PDE. Crucially, the evolution of the density is recast as $Y$ in a coupled FBSDE system:

  • Forward SDE: $X_t = X_{t_k} + \int_{t_k}^t \mu(X_s)\, ds + \int_{t_k}^t \sigma(X_s)\, dW_s$,
  • Backward SDE:

$$Y_t = g_k(X_{t_{k+1}}) + \int_t^{t_{k+1}} f(X_s, Y_s, Z_s)\, ds - \int_t^{t_{k+1}} Z_s\, dW_s$$

This probabilistic reformulation (nonlinear Feynman–Kac) is the foundation for subsequent numerical and machine learning approximations of the filter.
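For concreteness, the following is a minimal sketch (not the paper's code) of the Euler–Maruyama simulation of the forward process $X^{k,x}$ on $[t_k, t_{k+1}]$, which drives both the backward recursion and the deep solver discussed next; the drift and diffusion shown are illustrative assumptions.

```python
import numpy as np

def simulate_forward(x0, mu, sigma, t0, t1, n_steps, n_paths, rng):
    """Euler-Maruyama paths of dX = mu(X) dt + sigma(X) dW, started at x0 at time t0."""
    dt = (t1 - t0) / n_steps
    x = np.full((n_paths, x0.shape[-1]), x0, dtype=float)
    paths, increments = [x.copy()], []
    for _ in range(n_steps):
        dw = rng.normal(scale=np.sqrt(dt), size=x.shape)  # Brownian increments
        x = x + mu(x) * dt + sigma(x) * dw                # Euler-Maruyama step
        paths.append(x.copy())
        increments.append(dw)
    return np.stack(paths), np.stack(increments)

# Example: Ornstein-Uhlenbeck drift mu(x) = -x with unit diffusion (cf. Section 5).
rng = np.random.default_rng(0)
X, dW = simulate_forward(np.array([0.5]), lambda x: -x, lambda x: np.ones_like(x),
                         t0=0.0, t1=1.0, n_steps=50, n_paths=1000, rng=rng)
```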

2. Deep BSDE Solver: Network-Based Approximation

The deep BSDE method applies neural networks to the numerical solution of the BSDE arising in the prediction step for the filtering density. The backward component $Y_t = u(t, X_t)$ and its gradient in $x$ (corresponding to $Z_t$) are approximated by neural networks $(w^\theta, \overline{v}_n^\theta)$, with $\theta$ denoting the network weights.

The numerical implementation discretizes the SDE using Euler–Maruyama for the forward path and projects the backward recursion onto neural network bases. The optimization minimizes the expected squared difference between the simulated terminal value of the backward process and the known terminal value (given by the density at the new measurement):

$$\min_{w^\theta,\, \{\overline{v}_n^\theta\}} \mathbb{E}\, \big| \mathcal{Y}_N^{O_{1:k}} - \overline{g}_k\big(\mathcal{X}_N^k, O_{1:k}\big) \big|^2$$

subject to the discrete-time dynamics
$$\mathcal{Y}_{n+1}^{O_{1:k}} = w^\theta\big(\mathcal{X}_0^k, O_{1:k}\big) - \sum_{\ell=0}^{n} \Big[ f\big(\mathcal{X}_\ell^k,\, \mathcal{Y}_\ell^{O_{1:k}},\, \sigma(\mathcal{X}_\ell^k)^\top \overline{v}_\ell^\theta(\mathcal{X}_\ell^k, O_{1:k})\big)\, \Delta t - \overline{v}_\ell^\theta(\mathcal{X}_\ell^k, O_{1:k})^\top \sigma(\mathcal{X}_\ell^k)\, \Delta W_\ell \Big]$$

The hierarchical structure allows the networks to take as input both the state and the observation sequence up to $t_k$, effectively learning an adaptable density propagator; a simplified training sketch follows.
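The following PyTorch sketch illustrates this training objective in a simplified, unconditional form (the observation history $O_{1:k}$ is omitted from the network inputs); the architecture, driver, terminal condition, and all hyperparameters are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn

d, N, dt = 1, 20, 0.05                        # state dimension, time steps, step size
mu = lambda x: -x                             # illustrative OU drift
sigma = lambda x: torch.ones_like(x)          # illustrative unit diffusion
f = lambda x, y, z: torch.zeros_like(y)       # driver; problem-specific in general
g = lambda x: torch.exp(-0.5 * (x ** 2).sum(dim=1))  # stand-in terminal density g_k

def mlp(out_dim):
    return nn.Sequential(nn.Linear(d, 32), nn.Tanh(), nn.Linear(32, out_dim))

w_net = mlp(1)                                    # plays the role of w^theta (Y at t_k)
z_nets = nn.ModuleList(mlp(d) for _ in range(N))  # play the role of v_n^theta (Z at step n)
opt = torch.optim.Adam(list(w_net.parameters()) + list(z_nets.parameters()), lr=1e-3)

for it in range(200):
    x = 0.5 * torch.randn(512, d)             # sampled starting points X_0
    y = w_net(x)                               # Y_0 = w^theta(X_0)
    for n in range(N):
        dw = torch.randn_like(x) * dt ** 0.5   # Brownian increment
        z = z_nets[n](x)                       # approximation of Z at step n
        y = y - f(x, y, z) * dt + (z * sigma(x) * dw).sum(dim=1, keepdim=True)
        x = x + mu(x) * dt + sigma(x) * dw     # Euler-Maruyama forward step
    loss = ((y.squeeze(1) - g(x)) ** 2).mean() # terminal mismatch, as in the objective above
    opt.zero_grad(); loss.backward(); opt.step()
```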

3. Offline Training and Online Sequential Application

The methodology is divided between an offline training phase and online application.

Offline phase: Simulate many forward trajectories and corresponding observation histories. For each interval $[t_k, t_{k+1}]$, train the networks to minimize the terminal error over a range of starting points and observation sequences. All intensive computation occurs offline; the network parameters representing the mappings $w^\theta$ and $\overline{v}_n^\theta$ are fixed after training.

Online phase: When presented with real data, use the new observation $o_k$ to update the current density by multiplying the predicted density by the observation likelihood, then proceed to the next time interval using the pre-trained deep BSDE network. The update at time $t_k$ is

$$\widehat{p}_k(x, o_{1:k}) = w_{k-1}^*(x, o_{1:k-1})\, L(o_k, x)$$

where $L(o_k, x)$ is the observation likelihood and $w_{k-1}^*$ is the trained neural network output.

This structure enables rapid, real-time filtering with the computational effort front-loaded in the offline stage.
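As a concrete illustration of the online update, the sketch below multiplies the predicted density (the role played by the pre-trained network output $w_{k-1}^*$) pointwise by an observation likelihood on a state grid; the Gaussian likelihood, grid, and parameter values are assumptions.

```python
import numpy as np

def measurement_update(predicted_density, x_grid, o_k, H=1.0, obs_std=0.1):
    """Unnormalized updated density: predicted density times the likelihood L(o_k, x)."""
    likelihood = np.exp(-0.5 * ((o_k - H * x_grid) / obs_std) ** 2)
    return predicted_density(x_grid) * likelihood

# Usage: 'predicted_density' stands in for the trained network w_{k-1}^* evaluated on the
# grid (with the observation history o_{1:k-1} fixed); a Gaussian is used as a placeholder.
x_grid = np.linspace(-3.0, 3.0, 401)
p_hat_k = measurement_update(lambda x: np.exp(-0.5 * x ** 2), x_grid, o_k=0.7)
```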

4. Error Analysis: Mixed A Priori–A Posteriori Bounds

A distinguishing feature of this BSDE-based filtering paradigm is the derived mixed error bound, which quantifies both the time-discretization and the network approximation error. For each time step $k$, the error in the density satisfies

$$\| p_k(t_k) - \widehat{p}_k \|_{L^\infty(\mathbb{O};\, L^\infty(\mathbb{R}^d;\mathbb{R}))} \leq C \left( \tau^{1/2} + \sum_{j=0}^{K-1} \sup_{o \in \mathbb{O}} \sup_{x \in \mathbb{R}^d} \mathbb{E}\, \big| \overline{g}_j\big(\mathcal{X}_N^{j,x}, o_{1:j}\big) - \mathcal{Y}_N^{j,x} \big|^2 \right)^{1/2}$$

where $\tau$ is the time-discretization step and the expectation quantifies the residual network error over simulated terminal states. The bound is "mixed" because it combines an a priori part (from discretization) with an a posteriori part (from the empirically realized neural network fitting accuracy). Under standard smoothness and ellipticity conditions, the theoretical convergence rate in the time discretization is $O(\tau^{1/2})$, matching known BSDE and Euler–Maruyama rates.
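A quick way to check such a rate empirically is to fit the slope of $\log(\text{error})$ against $\log N$ over several discretizations, as in the sketch below; the error values are placeholders, not data from the paper.

```python
import numpy as np

Ns = np.array([10, 20, 40, 80, 160])                  # numbers of time steps
errors = np.array([0.30, 0.21, 0.15, 0.105, 0.075])   # illustrative placeholder errors
slope, _ = np.polyfit(np.log(Ns), np.log(errors), 1)  # least-squares slope in log-log space
print(f"empirical rate ~ N^{slope:.2f}")              # close to -1/2 when tau ~ 1/N
```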

5. Numerical Illustration: Empirical Validation

Two example systems underscore the practical convergence and accuracy:

Ornstein–Uhlenbeck process: For the drift $\mu(x) = -x$ with Gaussian observations, the benchmark is the Kalman filter. The pointwise and residual errors of the deep BSDE filter, evaluated both in time and over the state space, decrease as $N^{-1/2}$ as the discretization is refined ($N$ time intervals), matching the theoretical predictions.

Bistable process: For the drift $\mu(x) = (2/5)(5x - x^3)$ (a double-well potential), the system is fundamentally nonlinear, the density is bimodal, and no closed-form filter exists. The deep BSDE filter is compared to a high-resolution particle-KDE reference. The empirical convergence rate approaches $N^{-1/2}$ until reaching the limit of neural network training or sampling error.
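A minimal sketch of the kind of particle-KDE reference used for such a comparison is given below: a bootstrap particle filter with Euler–Maruyama propagation, Gaussian observation weights, and a Gaussian kernel density estimate. The observation model, particle count, and bandwidth are assumptions, and a high-resolution reference would use far more particles.

```python
import numpy as np

rng = np.random.default_rng(1)
mu = lambda x: 0.4 * (5.0 * x - x ** 3)     # bistable drift (2/5)(5x - x^3)
n_p, dt, steps_per_obs, obs_std = 5_000, 0.01, 10, 0.3

def particle_kde_reference(observations, x_grid, bandwidth=0.05):
    """Bootstrap particle filter with a Gaussian-kernel density estimate on x_grid."""
    particles = rng.normal(0.0, 1.0, n_p)   # sample from an assumed standard normal prior
    densities = []
    for o_k in observations:
        for _ in range(steps_per_obs):      # prediction: Euler-Maruyama propagation
            particles = particles + mu(particles) * dt + np.sqrt(dt) * rng.normal(size=n_p)
        w = np.exp(-0.5 * ((o_k - particles) / obs_std) ** 2)        # observation weights
        particles = rng.choice(particles, size=n_p, p=w / w.sum())   # multinomial resampling
        diffs = (x_grid[:, None] - particles[None, :]) / bandwidth
        kde = np.exp(-0.5 * diffs ** 2).mean(axis=1) / (bandwidth * np.sqrt(2.0 * np.pi))
        densities.append(kde)               # KDE of the filtering density after the update
    return np.array(densities)
```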

These experiments confirm the mixed error bound and demonstrate that, with adequate offline training, the approach robustly propagates complex, highly non-Gaussian densities through the filtering recursion.

6. Context and Implications

This filtering framework situates itself in a rapidly developing direction at the intersection of stochastic analysis, numerical methods for SDEs, and machine learning. Use of the nonlinear Feynman–Kac representation and deep BSDE solvers permits expressive, nonparametric approximations to filtering densities, scalable to high dimension and accommodating strong nonlinearities. The offline–online split provides efficiency when real-time filtering is required.

The method is closely related to advances in deep BSDE solvers for PDEs and FBSDEs, extending them to the recursive Bayesian filtering context by leveraging probabilistic representations of Kolmogorov and Fokker–Planck equations and integrating neural network-based regression for unnormalized density approximation. The mixed a priori–a posteriori error analysis mirrors contemporary deep learning theory, combining classical numerical rates with terms contingent on the realized function approximation error.

Empirical performance, as illustrated in both linear and nonlinear canonical filtering problems (Bågmark et al., 14 Aug 2025), supports the theoretical rates. The approach generalizes the propagation step in nonlinear filters, unifying PDE- and SDE-based perspectives and offering a competitive alternative to particle filters and kernel-based density estimation, especially where computational constraints favor offline–online architectures.
