
Deep BSDE Filter

Updated 17 November 2025
  • Deep BSDE Filter is an approximate Bayesian nonlinear filtering method that reformulates the filtering problem using backward stochastic differential equations and deep neural networks.
  • It employs a nonlinear Feynman–Kac representation to derive rigorous error bounds and achieves O(Δt^(1/2)) convergence through controlled time discretization.
  • Practical implementations on test cases like the Ornstein–Uhlenbeck process and bistable drift models demonstrate mesh-free performance and rapid online inference.

The Deep BSDE Filter is an approximate Bayesian nonlinear filtering method based on backward stochastic differential equations (BSDEs). It reframes the evolution of conditional filtering densities in terms of a nonlinear Feynman–Kac representation and leverages deep learning—specifically, neural networks trained with deep BSDE approaches—for approximating these densities. The core advantages include the use of offline training for rapid online inference, preservation of a rigorous error bound, and the potential to remain mesh-free in higher dimensions.

1. Nonlinear Filtering and the Zakai Equation

Nonlinear filtering concerns estimating the conditional probability density $p(t, x \mid O_{1:k})$ of a hidden signal $S_t$ that evolves according to a stochastic differential equation (SDE)
$$dS_t = \mu(S_t)\,dt + \sigma(S_t)\,dB_t, \qquad S_0 \sim \pi_0(x),$$
where $B_t$ is a Brownian motion, $\mu$ and $\sigma$ are the drift and diffusion coefficients, and $\pi_0$ is the initial law. Observations are received at discrete times $t_k$,
$$O_k = h(S_{t_k}) + V_k,$$
with $V_k$ independent Gaussian noise. Between observation updates the unnormalized conditional density evolves by the Fokker–Planck (prediction) equation
$$\partial_t p(t, x) = A^* p(t, x), \qquad t \in (t_k, t_{k+1}],$$
with an instantaneous Bayes update at each observation arrival,
$$p(t_{k+1}^+, x) = p(t_{k+1}^-, x)\, L(O_{k+1} \mid x),$$
where $A^*$ is the adjoint of the generator $A\varphi = \tfrac{1}{2}\operatorname{Tr}[a(x) D^2\varphi] + \mu(x)\cdot\nabla\varphi$ with $a = \sigma\sigma^\top$, and $L(o \mid x)$ is the observation likelihood. In continuous-observation settings, the Zakai equation can be written using Itô calculus as

$$dp_t(x) = A^* p_t(x)\,dt + p_t(x)\,h(x)\cdot dY_t.$$
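
As a concrete illustration of this state-space setup (not part of the filter itself), the sketch below simulates the hidden signal by Euler–Maruyama and generates discrete noisy observations; the model functions, step counts, and noise level are illustrative placeholders.

```python
import numpy as np

def simulate_signal_and_observations(mu, sigma, h, x0, T=1.0, K=10, n_sub=50,
                                     obs_std=1.0, rng=None):
    """Euler-Maruyama simulation of dS_t = mu(S_t) dt + sigma(S_t) dB_t with
    discrete observations O_k = h(S_{t_k}) + V_k at times t_k = k*T/K."""
    rng = np.random.default_rng() if rng is None else rng
    dt = T / (K * n_sub)                      # fine step between observation times
    s = x0
    observations = []
    for k in range(1, K + 1):
        for _ in range(n_sub):                # propagate the signal to t_k
            dB = rng.normal(scale=np.sqrt(dt))
            s = s + mu(s) * dt + sigma(s) * dB
        observations.append(h(s) + obs_std * rng.normal())   # O_k = h(S_{t_k}) + V_k
    return np.array(observations)

# Example: Ornstein-Uhlenbeck signal observed through the identity function.
obs = simulate_signal_and_observations(mu=lambda x: -x, sigma=lambda x: 1.0,
                                       h=lambda x: x, x0=0.0)
```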

2. Nonlinear Feynman–Kac and BSDE Representation

To exploit probabilistic representations, the filtering problem is recast via the (nonlinear) Feynman–Kac formula over each prediction interval $[t_k, t_{k+1}]$. An auxiliary forward process, independent of the observation path, is introduced:
$$dX_s = \mu(X_s)\,ds + \sigma(X_s)\,dW_s, \qquad X_{t_k} = x,$$
where $W_s$ is an independent Brownian motion. The terminal condition for the backward pass is defined recursively by
$$g_k(x, O_{1:k}) = p_{k-1}(t_k, x, O_{1:k-1})\, L(O_k \mid x), \qquad g_0(x) = \pi_0(x).$$
The unnormalized density at $t_{k+1}$ is obtained as

$$p_k(t_{k+1}, x, O_{1:k}) = \mathbb{E}\left[ g_k(X_{t_{k+1}}) + \int_{t_k}^{t_{k+1}} f(X_s, Y_s, Z_s)\, ds \right],$$

where $(X_s, Y_s, Z_s)$ solve the uncoupled forward–backward SDE system for $s \in [t_k, t_{k+1}]$:
$$\begin{gathered} dX_s = \mu(X_s)\,ds + \sigma(X_s)\,dW_s, \qquad X_{t_k} = x, \\ Y_s = g_k(X_{t_{k+1}}) + \int_s^{t_{k+1}} f(X_r, Y_r, Z_r)\, dr - \int_s^{t_{k+1}} Z_r\, dW_r. \end{gathered}$$
To produce the unnormalized density at any $t \in [t_k, t_{k+1}]$, $Y$ is evaluated at the corresponding (reversed) time.
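
The deep BSDE step described next consumes batches of sample paths of this auxiliary process. A minimal path sampler, assuming PyTorch tensors and illustrative callables `mu` and `sigma`, might look as follows; it also records the Brownian increments, which the backward rollout reuses.

```python
import torch

def sample_forward_paths(mu, sigma, x, dt, n_steps):
    """Euler-Maruyama paths of dX_s = mu(X_s) ds + sigma(X_s) dW_s on [t_k, t_{k+1}].

    x: tensor of shape (batch, d) of starting points X_{t_k}.
    Returns the discrete path (list of states) and the Brownian increments.
    """
    paths, increments = [x], []
    for _ in range(n_steps):
        dW = torch.randn_like(x) * dt ** 0.5   # independent Brownian increment
        x = x + mu(x) * dt + sigma(x) * dW
        paths.append(x)
        increments.append(dW)
    return paths, increments
```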

3. Deep BSDE Approximation and Neural Architecture

The backward SDE is discretized in time using a controlled process:
$$\mathcal{Y}_{n+1} = \mathcal{Y}_n - f(\mathcal{X}_n, \mathcal{Y}_n, \mathcal{Z}_n)\,\Delta t + \mathcal{Z}_n\,\Delta W_n, \qquad \mathcal{Y}_0 = w(\mathcal{X}_{t_k}, O_{1:k}),$$
where $\mathcal{X}_n$ is an Euler–Maruyama path and the $\Delta W_n$ are independent Brownian increments. The solution is parameterized by neural networks:

  • $w^\theta(x, O_{1:k})$ approximates $Y_{t_k}$
  • $v_n^\theta(x, O_{1:k})$ approximates $Z$ at time step $t_{k,n}$

Training occurs via minimization of the empirical terminal loss over $M$ simulated trajectories:
$$\ell(\theta) = \frac{1}{M} \sum_{m=1}^M \left| \mathcal{Y}_N^{(m)} - \overline g_k\big(\mathcal{X}_N^{(m)}, O_{1:k}^{(m)}\big)\right|^2,$$
where $\overline g_k$ may be normalized or unnormalized at the terminal point.
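
A minimal sketch of one loss evaluation for a single observation step, assuming PyTorch, the path sampler sketched in Section 2, and user-supplied networks `w_net` and `v_nets` (one per time step); the driver `f`, the terminal function `g_bar`, and all names are illustrative placeholders rather than the authors' reference implementation.

```python
import torch

def bsde_rollout_loss(w_net, v_nets, f, g_bar, x0, obs, mu, sigma, dt):
    """One empirical terminal-loss evaluation for a single observation step k.

    x0:  (batch, d) samples of X_{t_k};  obs: (batch, obs_dim) encoded O_{1:k}.
    w_net(x, obs) approximates Y_{t_k}; v_nets[n](x, obs) approximates Z at step n.
    """
    paths, increments = sample_forward_paths(mu, sigma, x0, dt, len(v_nets))
    y = w_net(x0, obs)                                  # Y_0 = w(X_{t_k}, O_{1:k})
    for n, dW in enumerate(increments):
        z = v_nets[n](paths[n], obs)                    # Z_n at time t_{k,n}
        y = y - f(paths[n], y, z) * dt + (z * dW).sum(dim=1, keepdim=True)
    target = g_bar(paths[-1], obs)                      # terminal condition g_k
    return ((y - target) ** 2).mean()                   # empirical terminal loss

# One optimizer step (Adam, as in the reported setup):
# loss = bsde_rollout_loss(w_net, v_nets, f, g_bar, x0, obs, mu, sigma, dt)
# loss.backward(); optimizer.step(); optimizer.zero_grad()
```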

The network design includes:

  • $w$-network: fully connected, ReLU activations, 3 hidden layers of size 128, exponential output activation, input dimension $d + (d' \times (k-1))$
  • $v$-networks: one per time step, 3 hidden layers of size 32, linear output, same input size (a sketch of both architectures follows this list)
  • Training: Adam optimizer, learning rate $10^{-4}$, batch size 512, up to 100 epochs with early stopping (patience 5 epochs), and parameter sharing across observation steps via zero-padding of unused observations
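
A possible PyTorch realization of the $w$- and $v$-networks matching the description above; the exponential output on the $w$-network keeps the density estimate positive. Class and argument names are assumptions for illustration.

```python
import torch
import torch.nn as nn

def mlp(in_dim, hidden, out_dim, n_hidden=3):
    """Fully connected network with n_hidden ReLU layers and a linear output."""
    layers, dim = [], in_dim
    for _ in range(n_hidden):
        layers += [nn.Linear(dim, hidden), nn.ReLU()]
        dim = hidden
    layers.append(nn.Linear(dim, out_dim))
    return nn.Sequential(*layers)

class WNet(nn.Module):
    """w^theta(x, O_{1:k}): 3 hidden layers of 128, exponential output (positive density)."""
    def __init__(self, d, obs_dim):
        super().__init__()
        self.body = mlp(d + obs_dim, 128, 1)
    def forward(self, x, obs):
        return torch.exp(self.body(torch.cat([x, obs], dim=1)))

class VNet(nn.Module):
    """v_n^theta(x, O_{1:k}): 3 hidden layers of 32, linear output of dimension d."""
    def __init__(self, d, obs_dim):
        super().__init__()
        self.body = mlp(d + obs_dim, 32, d)
    def forward(self, x, obs):
        return self.body(torch.cat([x, obs], dim=1))
```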

Normalization of densities is performed using quadrature (for $d = 1$, with $J = 10^3$ evaluation points on $[-5, 5]$). The dominant training cost scales as $K \times N \times M_\text{batches}$.
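
For $d = 1$ the normalization step reduces to a one-dimensional quadrature; below is a minimal sketch using the trapezoidal rule on $[-5, 5]$ with $J$ evaluation points (the function name and interface are illustrative).

```python
import numpy as np

def normalize_density_1d(p_unnorm, a=-5.0, b=5.0, J=1000):
    """Normalize an unnormalized 1-d density given as a callable on [a, b]
    using trapezoidal quadrature with J evaluation points."""
    grid = np.linspace(a, b, J)
    values = p_unnorm(grid)
    dx = (b - a) / (J - 1)
    mass = dx * (values.sum() - 0.5 * (values[0] + values[-1]))   # trapezoidal rule
    return grid, values / mass
```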

4. Error Analysis and Theoretical Bounds

Under smoothness and uniform ellipticity conditions, a mixed a priori–a posteriori error bound is established for the maximum deviation of the learned density:
$$\max_{1 \leq k \leq K} \|p_k(t_k) - \widehat p_k\|_{\infty} \leq C\left(\tau^{1/2} + \sum_{j=0}^{K-1} \sup_{x,O}\,\mathbb{E}\Big[\big|\overline g_j(\mathcal{X}_N^{j,x}, O_{1:j}) - \mathcal{Y}_N^{j,x}\big|^2\Big]^{1/2}\right),$$
where $\tau = T/(KN)$. The error consists of an explicit time-discretization term $O(\tau^{1/2})$ and a residual a posteriori (learning) term reflecting empirical convergence.

5. Representative Numerical Experiments

Two test cases provide numerical validation of the approach:

  • Ornstein–Uhlenbeck process (linear): $\mu(x) = -x$, $\sigma(x) = 1$, $h(x) = x$, $R = 1$; the reference solution is given analytically by the Kalman–Bucy filter (a sketch of such a reference computation follows the list). With $K = 10$, $T = 1$, and $N = 2^j$ for $j = 0, \ldots, 6$, the observed final-time error $e_K$ and accumulated residual $E$ exhibit $N^{-1/2}$ convergence, with uniform accuracy over observation steps.
  • Bistable drift: $\mu(x) = (2/5)(5x - x^3)$, $\sigma = 1$, $h(x) = x$; the reference solution is a $10^5$-particle bootstrap filter with kernel density estimation. Using the same $T$, $K$, and $N$, $e_K$ and $E$ again show $N^{-1/2}$ decay up to $N = 16$, beyond which a plateau signals that the learning residual becomes dominant.
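
For the linear test case, the exact posterior is Gaussian and can be reproduced with a discrete-time Kalman recursion built on the exact Ornstein–Uhlenbeck transition; the sketch below is one way to compute such a reference and is not taken from the paper.

```python
import numpy as np

def ou_kalman_reference(observations, dt_obs, m0=0.0, p0=1.0, obs_var=1.0):
    """Exact Gaussian filter for dS_t = -S_t dt + dB_t with O_k = S_{t_k} + V_k.

    Uses the exact OU transition: the mean decays by exp(-dt) and the variance
    relaxes toward the stationary value 1/2. Returns posterior means/variances.
    """
    a = np.exp(-dt_obs)                       # transition coefficient
    q = 0.5 * (1.0 - np.exp(-2.0 * dt_obs))   # exact transition variance
    m, p, means, variances = m0, p0, [], []
    for o in observations:
        m, p = a * m, a * a * p + q           # predict to the observation time
        gain = p / (p + obs_var)              # Kalman gain for h(x) = x, R = obs_var
        m, p = m + gain * (o - m), (1.0 - gain) * p
        means.append(m); variances.append(p)
    return np.array(means), np.array(variances)
```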

6. Practical Implementation Guidance

Adaptation to higher dimensions and different model classes follows several best practices:

  • Employ richer neural network architectures (e.g., time embeddings, U-Nets) for spatially high-dimensional $x$.
  • Combine with multilevel Monte Carlo (MLMC) strategies: begin with coarse time steps ($N$), then fine-tune on finer grids without reinitializing weights.
  • Randomize the sampled time steps during training to ensure robust performance for all $t \in [t_k, t_{k+1}]$.
  • Use a sufficiently large number of Monte Carlo samples $M$ to reduce the a posteriori residual below the discretization error.
  • Normalize densities in higher dimensions with robust quadrature or importance sampling (see the sketch after this list).
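
As an illustration of the last point, the normalizing constant in higher dimensions can be estimated by self-normalized importance sampling with a simple proposal; the sketch below assumes a zero-mean Gaussian proposal, which is only one possible choice.

```python
import numpy as np

def normalizing_constant_is(p_unnorm, d, n_samples=100_000, proposal_std=3.0, rng=None):
    """Estimate Z = integral of an unnormalized density p_unnorm over R^d by
    importance sampling with a N(0, proposal_std^2 I) proposal."""
    rng = np.random.default_rng() if rng is None else rng
    x = rng.normal(scale=proposal_std, size=(n_samples, d))
    # Log-density of the Gaussian proposal evaluated at the samples.
    log_q = -0.5 * np.sum((x / proposal_std) ** 2, axis=1) \
            - 0.5 * d * np.log(2.0 * np.pi * proposal_std ** 2)
    weights = p_unnorm(x) / np.exp(log_q)     # importance weights p / q
    return weights.mean()                     # Monte Carlo estimate of Z
```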

The passage from the Zakai formulation to the BSDE representation, and then to neural approximation with sequential observation updates, yields a Deep BSDE Filter that achieves a mesh-free $O(\Delta t^{1/2})$ convergence rate in time and empirical consistency across multiple nonlinear filtering scenarios.
