Deep BSDE Filter
- Deep BSDE Filter is an approximate Bayesian nonlinear filtering method that reformulates the filtering problem using backward stochastic differential equations and deep neural networks.
- It employs a nonlinear Feynman–Kac representation to derive rigorous error bounds and achieves O(Δt^(1/2)) convergence through controlled time discretization.
- Practical implementations on test cases like the Ornstein–Uhlenbeck process and bistable drift models demonstrate mesh-free performance and rapid online inference.
The Deep BSDE Filter is an approximate Bayesian nonlinear filtering method based on backward stochastic differential equations (BSDEs). It reframes the evolution of conditional filtering densities in terms of a nonlinear Feynman–Kac representation and leverages deep learning—specifically, neural networks trained with deep BSDE approaches—for approximating these densities. The core advantages include the use of offline training for rapid online inference, preservation of a rigorous error bound, and the potential to remain mesh-free in higher dimensions.
1. Nonlinear Filtering and the Zakai Equation
Nonlinear filtering concerns estimating the conditional probability density of a hidden signal $X_t \in \mathbb{R}^d$ that evolves according to a stochastic differential equation (SDE)
$$dX_t = b(X_t)\,dt + \sigma(X_t)\,dW_t, \qquad X_0 \sim \pi_0,$$
where $W$ is a Brownian motion, $b$ and $\sigma$ are the drift and diffusion coefficients, and $\pi_0$ is the initial law. Observations are received at discrete times $t_1 < t_2 < \cdots$:
$$Y_k = h(X_{t_k}) + \eta_k, \qquad \eta_k \sim \mathcal{N}(0, R)\ \text{i.i.d.},$$
with independent Gaussian noise. The unnormalized conditional density $p(t,\cdot)$ satisfies the Zakai equation between observation updates,
$$\partial_t p(t,x) = \mathcal{A}^* p(t,x), \qquad t \in (t_{k-1}, t_k),$$
with an instantaneous update at each arrival of an observation,
$$p(t_k, x) = \ell_k(x)\, p(t_k^-, x),$$
where $\mathcal{A}^*$ is the adjoint of the generator $\mathcal{A}$ of the signal, with
$$\mathcal{A}\varphi = b \cdot \nabla \varphi + \tfrac{1}{2}\operatorname{Tr}\!\big(\sigma\sigma^\top \nabla^2 \varphi\big),$$
and $\ell_k(x) \propto \exp\!\big(-\tfrac{1}{2}\,(Y_k - h(x))^\top R^{-1} (Y_k - h(x))\big)$ is the observation likelihood. In continuous-observation settings, the Zakai equation can be written using Itô calculus as
$$dp(t,x) = \mathcal{A}^* p(t,x)\,dt + p(t,x)\, h(x)^\top\, dY_t.$$
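To make the setup concrete, the following minimal NumPy sketch simulates a signal path with the Euler–Maruyama scheme and collects noisy observations at discrete times; all function names and parameter values here are illustrative placeholders, not taken from the source.

```python
import numpy as np

def simulate_signal_and_observations(b, sigma, h, x0, T, dt, obs_every, obs_std, rng):
    """Euler-Maruyama path of dX = b(X) dt + sigma(X) dW with noisy observations
    Y_k = h(X_{t_k}) + eta_k collected every `obs_every` steps."""
    n_steps = int(round(T / dt))
    x = np.asarray(x0, dtype=float)
    path, obs_times, obs = [x.copy()], [], []
    for n in range(1, n_steps + 1):
        dW = rng.normal(scale=np.sqrt(dt), size=x.shape)
        x = x + b(x) * dt + sigma(x) * dW           # Euler-Maruyama step
        path.append(x.copy())
        if n % obs_every == 0:                      # discrete-time observation
            obs_times.append(n * dt)
            obs.append(h(x) + rng.normal(scale=obs_std, size=x.shape))
    return np.array(path), np.array(obs_times), np.array(obs)

# Example: scalar Ornstein-Uhlenbeck signal observed directly.
rng = np.random.default_rng(0)
path, t_obs, y_obs = simulate_signal_and_observations(
    b=lambda x: -x, sigma=lambda x: 0.5, h=lambda x: x,
    x0=np.array([1.0]), T=1.0, dt=1e-3, obs_every=100, obs_std=0.1, rng=rng)
```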
2. Nonlinear Feynman–Kac and BSDE Representation
To exploit probabilistic representations, the filtering problem is recast via the (nonlinear) Feynman–Kac formula over each prediction interval $[t_{k-1}, t_k]$, run in reversed time $s = t_k - t$. An auxiliary forward process is considered, independent of the observation path,
$$d\bar{X}_s = \mu(\bar{X}_s)\,ds + \sigma(\bar{X}_s)\,dB_s, \qquad s \in [0, \Delta t_k],$$
where $B$ is an independent Brownian motion and $\mu$ collects the first-order terms that appear when $\mathcal{A}^*$ is written in non-divergence form. The terminal condition for the backward pass is defined recursively as the updated density from the previous observation time,
$$g_k(x) = \ell_{k-1}(x)\, p(t_{k-1}^-, x), \qquad \text{with } g_1 = \pi_0.$$
The unnormalized density at $t_k$ is obtained as
$$p(t_k^-, x) = Y_0 \big|_{\bar{X}_0 = x},$$
where $(Y, Z)$ solve the uncoupled forward–backward SDE system for $s \in [0, \Delta t_k]$:
$$dY_s = -f(\bar{X}_s, Y_s, Z_s)\,ds + Z_s^\top dB_s, \qquad Y_{\Delta t_k} = g_k(\bar{X}_{\Delta t_k}),$$
with the driver $f$ determined by the zeroth-order part of $\mathcal{A}^*$. To produce the unnormalized density at any $t$ in $(t_{k-1}, t_k)$, $Y$ is evaluated at the corresponding (reversed) time $s = t_k - t$.
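Viewed at the level of the filter recursion, each observation step alternates a BSDE prediction with a multiplicative Bayes update. The minimal Python sketch below shows only that bookkeeping; `predict` stands in for a trained deep BSDE solver, and names such as `gaussian_likelihood` and `obs_std` are illustrative assumptions, not from the source.

```python
import numpy as np

def gaussian_likelihood(y, h, obs_std):
    """Observation likelihood x -> l(x) for y = h(x) + N(0, obs_std^2) noise."""
    return lambda x: np.exp(-0.5 * ((y - h(x)) / obs_std) ** 2)

def deep_bsde_filter(prior, observations, h, obs_std, predict):
    """Alternate BSDE prediction steps and multiplicative Bayes updates.

    `predict(density)` should return the unnormalized predicted density at the
    next observation time (e.g. evaluated with an offline-trained BSDE solver).
    """
    density = prior                                           # g_1 = pi_0
    filtered = []
    for y in observations:
        predicted = predict(density)                          # solve the BSDE on (t_{k-1}, t_k]
        lk = gaussian_likelihood(y, h, obs_std)
        density = lambda x, p=predicted, l=lk: l(x) * p(x)    # p(t_k, x) = l_k(x) p(t_k^-, x)
        filtered.append(density)
    return filtered
```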
3. Deep BSDE Approximation and Neural Architecture
The backward SDE is discretized in time using a controlled process,
$$Y_{n+1} = Y_n - f(\bar{X}_n, Y_n, Z_n)\,\Delta t + Z_n^\top \Delta B_n, \qquad n = 0, \dots, N-1,$$
where $(\bar{X}_n)$ is an Euler–Maruyama path of the auxiliary forward SDE and $\Delta B_n \sim \mathcal{N}(0, \Delta t\, I)$ are independent Brownian increments. The solution is parameterized by neural networks:
- a $u$-network approximates the initial value $Y_0 = u(0,\cdot)$, i.e., the unnormalized density at the end of the prediction interval
- one $Z$-network per time step $n$ approximates the control $Z_n$
Training occurs via minimization of the empirical terminal loss over $M$ simulated trajectories,
$$L(\theta) = \frac{1}{M}\sum_{m=1}^{M} \big| Y_N^{(m)} - g_k(\bar{X}_N^{(m)}) \big|^2,$$
where the terminal condition $g_k$ may be normalized or unnormalized at the terminal point.
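As a concrete illustration of this discretization, here is a minimal PyTorch sketch of the rollout and terminal loss. The driver `f`, terminal condition `g`, drift `mu`, diffusion `sigma`, and the network containers are placeholders; diagonal noise is assumed and conditioning on the observation history is omitted for brevity, so this is a sketch rather than the reference implementation.

```python
import torch

def bsde_rollout_loss(u_net, z_nets, g, f, mu, sigma, x0, dt, n_steps):
    """Roll out the controlled discretization Y_{n+1} = Y_n - f dt + Z^T dB along an
    Euler-Maruyama path of the auxiliary forward SDE and return the terminal loss."""
    x = x0                                          # (batch, d) samples of X_0
    y = u_net(x)                                    # (batch, 1) candidate Y_0 = u(0, X_0)
    for n in range(n_steps):
        z = z_nets[n](x)                            # (batch, d) control at step n
        db = torch.randn_like(x) * dt ** 0.5        # Brownian increments
        y = y - f(x, y, z) * dt + (z * db).sum(dim=1, keepdim=True)
        x = x + mu(x) * dt + sigma(x) * db          # Euler-Maruyama step (diagonal noise)
    return ((y - g(x)) ** 2).mean()                 # empirical terminal loss
```

A training step would then compute `loss = bsde_rollout_loss(...)`, call `loss.backward()`, and take an optimizer step, backpropagating through the entire rollout.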
The network design includes:
- $u$-network: fully connected, ReLU activations, 3 hidden layers of width 128, exponential output activation (so the predicted density is positive), input given by the spatial point together with the zero-padded observation history
- $Z$-networks: one per time step, 3 hidden layers of width 32, linear output, same input size
- Training: Adam optimizer with a fixed learning rate, batch size 512, up to 100 epochs with early stopping (patience 5 epochs), and parameter sharing across observation steps via zero-padding of unused observations
Normalization of densities is performed using quadrature (in the one-dimensional experiments, over a fixed grid of evaluation points on a bounded interval). The dominant training cost scales with the number of time steps and the number of simulated trajectories, since one $Z$-network is evaluated per step and per sample.
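A sketch of network definitions matching the widths quoted above, together with trapezoidal normalization on a one-dimensional grid, is given below; the input dimension, number of time steps, and padding length are illustrative placeholders, not values from the source.

```python
import torch
import torch.nn as nn

class Exp(nn.Module):
    """Exponential output activation, ensuring a strictly positive density."""
    def forward(self, x):
        return torch.exp(x)

def mlp(in_dim, width, out_dim, out_activation=None):
    """Fully connected network with 3 hidden ReLU layers."""
    layers = [nn.Linear(in_dim, width), nn.ReLU(),
              nn.Linear(width, width), nn.ReLU(),
              nn.Linear(width, width), nn.ReLU(),
              nn.Linear(width, out_dim)]
    if out_activation is not None:
        layers.append(out_activation)
    return nn.Sequential(*layers)

d, n_steps, obs_pad = 1, 20, 10            # illustrative sizes, not from the source
in_dim = d + obs_pad                        # state plus zero-padded observation history

u_net = mlp(in_dim, 128, 1, out_activation=Exp())                     # u-network: 3 x 128
z_nets = nn.ModuleList([mlp(in_dim, 32, d) for _ in range(n_steps)])  # Z-networks: 3 x 32

def normalize_on_grid(values, grid):
    """Normalize density values on a 1D quadrature grid with the trapezoid rule."""
    return values / torch.trapezoid(values, grid)
```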
4. Error Analysis and Theoretical Bounds
Under smoothness and uniform ellipticity conditions, a mixed a priori–a posteriori error bound is established. Specifically, for the maximum deviation of the learned density over the observation times,
$$\max_{k} \big\| \hat{p}(t_k, \cdot) - p(t_k, \cdot) \big\| \;\le\; C\big( \Delta t^{1/2} + \varepsilon_{\mathrm{learn}} \big),$$
where $\varepsilon_{\mathrm{learn}}$ is a computable residual of the trained networks (controlled by the terminal loss) and $C$ does not depend on $\Delta t$. The error consists of an explicit time-discretization term of order $\mathcal{O}(\Delta t^{1/2})$ and a residual a posteriori (learning) term reflecting empirical convergence.
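In practice the bound suggests a simple heuristic: refine $\Delta t$ only while the a posteriori residual stays below the discretization term. A tiny illustrative check under assumed names follows; the constant `c_disc` is not known in general and would have to be calibrated.

```python
import math

def worth_refining(terminal_loss, dt, c_disc=1.0):
    """Heuristic: continue refining dt only while the a posteriori (learning)
    residual is smaller than the O(dt^{1/2}) discretization term."""
    learning_residual = math.sqrt(terminal_loss)   # a posteriori term
    discretization = c_disc * math.sqrt(dt)        # a priori term (unknown constant)
    return learning_residual < discretization
```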
5. Representative Numerical Experiments
Two test cases provide numerical validation of the approach:
- Ornstein–Uhlenbeck process (linear drift): the reference solution is available analytically via the Kalman–Bucy filter. Over a range of time steps $\Delta t$, the observed final-time error and accumulated residual exhibit the predicted $\mathcal{O}(\Delta t^{1/2})$ convergence, with uniform accuracy over observation steps.
- Bistable drift: the reference solution is computed with a bootstrap particle filter combined with kernel density estimation (KDE). With the same observation setup and the same range of $\Delta t$, the errors again show $\mathcal{O}(\Delta t^{1/2})$ decay up to a point beyond which a plateau signals that the learning residual becomes dominant.
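For the bistable-drift case the reference filter is a bootstrap particle filter with a kernel density estimate of the particle cloud. The sketch below is a generic such reference under assumed names (`drift`, `obs_std`, `bandwidth`, particle count, and so on are illustrative), not the exact configuration used in the experiments.

```python
import numpy as np

def bootstrap_particle_filter(prior_sample, drift, diff, obs, h, obs_std,
                              dt, steps_per_obs, n_particles, rng):
    """Propagate particles with Euler-Maruyama, reweight by the Gaussian
    observation likelihood, resample, and return the particle clouds."""
    particles = prior_sample(n_particles, rng)
    clouds = []
    for y in obs:
        for _ in range(steps_per_obs):               # prediction between observations
            dW = rng.normal(scale=np.sqrt(dt), size=particles.shape)
            particles = particles + drift(particles) * dt + diff * dW
        w = np.exp(-0.5 * ((y - h(particles)) / obs_std) ** 2)   # Bayes update
        w /= w.sum()
        idx = rng.choice(n_particles, size=n_particles, p=w)     # multinomial resampling
        particles = particles[idx]
        clouds.append(particles.copy())
    return clouds

def kde(points, grid, bandwidth):
    """Gaussian kernel density estimate of a 1D particle cloud on a grid."""
    diffs = (grid[:, None] - points[None, :]) / bandwidth
    return np.exp(-0.5 * diffs ** 2).mean(axis=1) / (bandwidth * np.sqrt(2 * np.pi))
```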
6. Practical Implementation Guidance
Adaptation to higher dimensions and different model classes follows several best practices:
- Employ richer neural network architectures (e.g., time embeddings, UNets) for spatially high-dimensional signals.
- Combine multilevel Monte Carlo (MLMC) strategies: begin with coarse time steps (large $\Delta t$), then fine-tune on finer grids without reinitializing weights; see the sketch after this list.
- Randomize sampled time steps during training to ensure robust performance for all step sizes $\Delta t$.
- Use a sufficiently large number of Monte Carlo samples to reduce the a posteriori residual below the discretization error.
- Normalize densities in higher dimensions with robust quadrature or importance sampling.
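A minimal sketch of the coarse-to-fine schedule from the MLMC-style recommendation above, assuming `rollout_loss` wraps the terminal-loss rollout sketched earlier with the model ingredients fixed; the schedule, warm-start rule, and optimizer settings are illustrative choices rather than prescriptions from the source.

```python
import torch

def coarse_to_fine_training(u_net, z_nets_factory, rollout_loss, dt_schedule,
                            epochs_per_level, lr=1e-3):
    """Train with a coarse time step first, then continue on finer grids
    without reinitializing the already-trained weights."""
    z_nets = None
    for dt, n_steps in dt_schedule:                    # e.g. [(0.1, 10), (0.05, 20)]
        new_z = z_nets_factory(n_steps)                # fresh Z-networks for this level
        if z_nets is not None:                         # warm start: copy the coarse
            for i, net in enumerate(new_z):            # step-(i//2) weights into step i
                net.load_state_dict(z_nets[min(i // 2, len(z_nets) - 1)].state_dict())
        z_nets = new_z
        opt = torch.optim.Adam([*u_net.parameters(), *z_nets.parameters()], lr=lr)
        for _ in range(epochs_per_level):
            opt.zero_grad()
            loss = rollout_loss(u_net, z_nets, dt, n_steps)
            loss.backward()
            opt.step()
    return u_net, z_nets
```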
The pipeline from the Zakai equation to the BSDE formulation, and then to neural approximation with sequential observation updates, yields a Deep BSDE Filter that is mesh-free in space, attains the $\mathcal{O}(\Delta t^{1/2})$ convergence rate in time, and is empirically consistent across multiple nonlinear filtering scenarios.