Deep Splitting Filter: Neural SPDE Solver

Updated 17 November 2025
  • Deep Splitting Filters are advanced numerical algorithms that combine operator splitting with neural network parameterization to solve SPDE-based nonlinear filtering problems.
  • They decompose the filtering evolution into manageable prediction–update steps, enabling fast, online inference even in high-dimensional systems.
  • The method mitigates the curse of dimensionality by approximating conditional densities via Monte Carlo sampling with controlled error rates.

A Deep Splitting Filter refers to a class of numerical and data-driven algorithms for nonlinear state estimation (Bayesian filtering) in continuous-time dynamical systems, which leverage operator splitting and deep neural network parameterizations to approximate the evolution of conditional probability densities governed by stochastic partial differential equations (SPDEs), primarily the Fokker–Planck or Zakai equation. Deep splitting filters—sometimes described as “mesh-free” neural SPDE solvers—are designed to mitigate the curse of dimensionality in classical particle and grid-based filtering schemes, enable fast online inference after offline training, and provide theoretically controlled error rates under broad regularity and Hörmander-type conditions.

1. Mathematical Foundations: SPDE-Based Filtering

At the core of nonlinear filtering is the evolution of the signal process $S_t \in \mathbb{R}^d$ governed by the SDE

$$dS_t = \mu(S_t)\,dt + \sigma(S_t)\,dB_t, \qquad S_0 \sim q_0(x)\,dx,$$

and discrete or continuous noisy observations $Y_k$ satisfying

$$Y_k \mid S_{t_k} = x \;\sim\; N\bigl(h(x), R\bigr).$$

The conditional density $p(t,x \mid Y_{0:k})$ evolves according to the forward Fokker–Planck (Kolmogorov) equation between measurements,

$$\partial_t p_k = A^* p_k,$$

where

$$A^*\phi(x) = \frac{1}{2} \sum_{i,j} \partial_{ij}\bigl(a_{ij}(x)\,\phi(x)\bigr) - \sum_i \partial_i\bigl(\mu_i(x)\,\phi(x)\bigr), \qquad a = \sigma\sigma^\top,$$

and is updated at observation times by Bayes' rule:

$$p_k(t_k,x) \leftarrow p_{k-1}(t_k,x)\,L(Y_k,x), \qquad L(y,x) = \exp\left(-\tfrac{1}{2}\bigl\|R^{-1/2}(y-h(x))\bigr\|^2\right).$$

For continuous-observation models, the Zakai SPDE arises:

$$dp_t(x) = A^* p_t(x)\,dt + p_t(x)\,h(x)^\top dY_t.$$

This setting includes classical benchmarks such as the Benes filter and covers general nonlinear filtering problems (Bågmark et al., 22 Sep 2024, Lobbe, 2022, Bågmark et al., 2022).
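To make the setup concrete, the following minimal Python sketch (not taken from the cited papers) simulates the signal SDE with an Euler–Maruyama scheme and generates discrete Gaussian observations; the drift `mu`, diffusion `sigma`, and observation map `h` are illustrative placeholder choices.

```python
# Minimal sketch: simulate dS_t = mu(S_t) dt + sigma(S_t) dB_t with Euler-Maruyama
# and draw discrete observations Y_k ~ N(h(S_{t_k}), R). All model choices below
# (OU drift, identity diffusion, linear observation map) are illustrative.
import numpy as np

rng = np.random.default_rng(0)

d, dt, n_steps, obs_every = 1, 1e-2, 200, 10
R = 0.1 * np.eye(d)                        # observation noise covariance (assumed)

mu = lambda x: -x                          # e.g. Ornstein-Uhlenbeck drift
sigma = lambda x: np.eye(d)                # constant diffusion
h = lambda x: x                            # linear observation map

S = rng.normal(size=d)                     # S_0 ~ q_0
times, signal, obs = [], [], []
for n in range(1, n_steps + 1):
    dB = rng.normal(scale=np.sqrt(dt), size=d)
    S = S + mu(S) * dt + sigma(S) @ dB     # Euler-Maruyama step for the signal SDE
    if n % obs_every == 0:                 # observation times t_k
        Y = rng.multivariate_normal(h(S), R)
        times.append(n * dt); signal.append(S.copy()); obs.append(Y)
```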

2. Deep Splitting Scheme: Operator Splitting and Neural Parameterization

The deep splitting methodology decomposes the SPDE generator into tractable sub-operators. One canonical approach is to split $A^* = A + F$, with $A$ the drift–diffusion part and $F$ the first-order remainder encompassing variable coefficients and nonlinearity. The discrete-time prediction step leverages the Feynman–Kac formula over a timestep $\tau$:

$$p_k(t_{n+1},x) = \mathbb{E}\left[\, p_k\bigl(t_n, X_{t_{n+1}}^{t_n,x}\bigr) + \tau\, F p_k\bigl(t_n, X_{t_{n+1}}^{t_n,x}\bigr) \,\Big|\, Y_{0:k} \right],$$

where $X_{t_{n+1}}^{t_n,x}$ denotes the solution of the SDE started at $x$ at time $t_n$. This expectation is approximated by Monte Carlo samples and parameterized via a deep neural network $\mathrm{NN}_\theta$:

$$\theta^*_{n+1} = \arg\min_{\theta} \frac{1}{M} \sum_{m=1}^M \left| \mathrm{NN}_\theta\bigl(Z_{N-n-1}^m, Y_{0:k}^m\bigr) - G p_{k,n}\bigl(Z_{N-n}^m, Y_{0:k}^m\bigr) \right|^2,$$

with $G\phi = \phi + \tau F\phi$. Energy-based outputs $e^{-f_\theta(x, y_{0:k})}$ enforce positivity and facilitate normalization. This paradigm generalizes naturally to Zakai-type SPDEs and permits recursive, sample-based training without spatial grids.
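A minimal sketch of one such regression step is given below, assuming the previous density approximation `p_prev`, its image under the remainder operator `F_p_prev`, and the SDE coefficients `mu` and `sigma` are supplied as callables with diagonal diffusion; it follows the displayed least-squares objective but simplifies the recursive sample chain $Z_n$ of the papers to a single Euler–Maruyama substep and omits the observation conditioning for brevity.

```python
# Schematic single splitting/regression step (assumptions: p_prev is the previous
# density approximation and F_p_prev evaluates F applied to it, both callables;
# diagonal diffusion; conditioning on Y_{0:k} omitted for brevity).
import torch

def splitting_step(net, p_prev, F_p_prev, mu, sigma, tau, M=4096, d=1, epochs=200):
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    for _ in range(epochs):
        x = 4.0 * torch.randn(M, d)                      # sampling points (heuristic)
        with torch.no_grad():
            dB = torch.sqrt(torch.tensor(tau)) * torch.randn(M, d)
            X = x + mu(x) * tau + sigma(x) * dB          # one Euler-Maruyama substep
            target = p_prev(X) + tau * F_p_prev(X)       # G p = p + tau * F p
        loss = torch.mean((net(x).squeeze(-1) - target.squeeze(-1)) ** 2)
        opt.zero_grad(); loss.backward(); opt.step()
    return net
```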

3. Prediction–Update Structure and Online Recursion

Deep splitting filters implement a two-stage prediction–correction architecture. The offline-trained neural networks advance the density in time via the split propagator (prediction step), simulating stochastic paths, and exact Bayes multiplication realizes the update when data arrive. The normalization can be performed recursively:

$$p^n(x) = \frac{\exp\!\left\{ h(x)^\top \Delta Y_n - \tfrac{1}{2}\|h(x)\|^2 \Delta t \right\}\, \tilde{p}^n(x)}{C_n},$$

with the normalization constant $C_n$ evaluated by Monte Carlo or quadrature. Notably, once trained, the filter computes instantaneous conditional densities and moments for arbitrary fresh observation paths, with no retraining required for new data (Bågmark et al., 22 Sep 2024, Lobbe, 2022, Bågmark et al., 2022).
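The sketch below illustrates this online recursion on a fixed 1D grid, assuming a list `nets` of trained per-step propagators conditioned on the observation history, scalar observations, and trapezoidal quadrature for the normalization constant; these names and interfaces are ours, not from the cited implementations.

```python
# Minimal online prediction-update recursion (assumptions: nets[k] maps
# (x_grid, observation_history) -> unnormalized predicted density on the grid;
# scalar observations; Gaussian likelihood with precision factor R_inv_sqrt).
import numpy as np

def run_filter(nets, observations, h, R_inv_sqrt, x_grid):
    y_hist, densities = [], []
    for net, Y in zip(nets, observations):
        y_hist.append(Y)
        # Prediction: evaluate the trained split propagator for this step.
        p_pred = net(x_grid, np.asarray(y_hist))
        # Update: exact Bayes multiplication with the likelihood L(Y_k, x).
        resid = R_inv_sqrt * (Y - h(x_grid))
        p_post = p_pred * np.exp(-0.5 * resid ** 2)
        # Normalize by quadrature (trapezoid rule on the grid).
        p_post /= np.trapz(p_post, x_grid)
        densities.append(p_post)
    return densities
```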

4. Model Architectures, Training Procedures, and Domain Adaptation

Typical implementations use fully connected ReLU or tanh neural networks of moderate depth (e.g., 3–4 hidden layers, 64–128 units per layer) for density parameterization. Loss functions minimize SPDE residuals or mean-square errors between network outputs and splitting targets, using large batches of MC-sampled paths. In nonlinear/multimodal regimes, domain adaptation is crucial: at each prediction-update cycle, the spatial support of the network is recentered/rescaled according to means and variances of the posterior, ensuring mass coverage and mitigating drift—this is especially relevant for Benes-like models with highly nonstationary filtering densities. Monte Carlo sampling, automatic differentiation, and optimizers such as Adam are routinely employed (Lobbe, 2022).
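The following illustrative PyTorch module combines an energy-based output $e^{-f_\theta}$ with a simple recenter/rescale domain-adaptation step; the layer sizes mirror the ranges quoted above, but the class and method names are our own.

```python
# Illustrative energy-based density network with a recentering/rescaling
# domain-adaptation layer driven by the current posterior mean and std.
import torch
import torch.nn as nn

class EnergyDensityNet(nn.Module):
    def __init__(self, d, y_dim, width=128, depth=3):
        super().__init__()
        layers, in_dim = [], d + y_dim
        for _ in range(depth):
            layers += [nn.Linear(in_dim, width), nn.Tanh()]
            in_dim = width
        layers += [nn.Linear(in_dim, 1)]
        self.f = nn.Sequential(*layers)
        # Posterior statistics used to recenter/rescale the spatial input.
        self.register_buffer("center", torch.zeros(d))
        self.register_buffer("scale", torch.ones(d))

    def adapt(self, posterior_mean, posterior_std):
        """Recenter/rescale the network's spatial support after each update."""
        self.center.copy_(posterior_mean)
        self.scale.copy_(posterior_std.clamp_min(1e-3))

    def forward(self, x, y_hist):
        z = (x - self.center) / self.scale
        energy = self.f(torch.cat([z, y_hist], dim=-1))
        return torch.exp(-energy)          # e^{-f_theta}: positive by construction
```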

5. Convergence Theory and Error Analysis

Under strong regularity and the parabolic Hörmander condition on the vector fields $V_0, V_j$, deep splitting filters satisfy strong $O(\tau^{1/2})$ global convergence in $L^2(\Omega; L^\infty)$ for the density approximation, with local errors $O(\tau)$, proved using stochastic integration by parts and Malliavin calculus to control sample pathwise error propagation. The central limit theorem and unbiased MC estimators ensure variance scaling as $O(1/M)$ per substep. Empirical results confirm that the error decays as $N^{-1/2}$ with increasing temporal refinement in Ornstein–Uhlenbeck and bistable drift examples (Bågmark et al., 22 Sep 2024).
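A simple way to check the predicted rate empirically is to fit the slope of $\log(\text{error})$ against $\log(N)$; the sketch below uses illustrative placeholder error values, not results from the cited papers.

```python
# Empirical order check: least-squares slope of log(error) vs log(N) should be
# close to -0.5 under the theory above. The error values are placeholders.
import numpy as np

N_values = np.array([10, 20, 40, 80, 160])
errors = np.array([0.20, 0.145, 0.102, 0.071, 0.050])   # illustrative only

slope, _ = np.polyfit(np.log(N_values), np.log(errors), 1)
print(f"observed rate: N^{slope:.2f}  (theory: N^-0.5)")
```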

6. Computational Performance and Numerical Results

Benchmarks demonstrate the efficacy of deep splitting filters in low and moderate dimensions. For 1D Ornstein–Uhlenbeck and nonlinear bistable SDEs over typical time horizons with $K = 20$ updates, networks trained offline (roughly 2 hours per example on an RTX 3080 GPU) deliver online inference with $O(KNLJ)$ cost, about 10 ms per trajectory. Evaluated error metrics include posterior mean error, $L^2$ density error against Monte Carlo references, probability mass retention, and normalization acceptance rates. Adaptivity improves posterior tracking and mitigates boundary loss; increasing network width/depth reduces error but raises sample requirements and runtime (Bågmark et al., 22 Sep 2024, Lobbe, 2022, Bågmark et al., 2022).
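The sketch below computes the metrics named above on a 1D grid against a reference density; the function and argument names are ours, not from the cited implementations.

```python
# Evaluation metrics for a 1D filtering density on a grid, compared against a
# Monte Carlo / reference density p_ref. Names and interface are illustrative.
import numpy as np

def filter_metrics(p_est, p_ref, x_grid, true_state):
    mean_est = np.trapz(x_grid * p_est, x_grid)                # posterior mean
    mae = np.abs(mean_est - true_state)                        # error vs true state
    l2_err = np.sqrt(np.trapz((p_est - p_ref) ** 2, x_grid))   # L2 density error
    mass = np.trapz(p_est, x_grid)                             # probability mass retained
    return {"mae": mae, "l2_density_error": l2_err, "mass_retention": mass}
```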

Performance table for a prototypical setting:

| Model | MAE vs True | $L^2$-Density Error | Training Time/Step |
| --- | --- | --- | --- |
| OU (1D) | < 0.02 | < 0.01 | ~2 hrs |
| Bistable (1D) | < 0.05 | < 0.05 | ~2 hrs |
| Linear (20D) | Comparable to PF (1000 particles); per-step inference $O(10\,\mathrm{ms})$ | Amortized cost independent of $d$ | 16 hrs |

A plausible implication is that deep splitting filters can offer accuracy competitive with bootstrap particle filters while being computationally efficient, especially for high-dimensional problems where PF scales poorly.

7. Extensions, Generalizations, and Limitations

The deep splitting approach admits various generalizations: it extends to any filtering problem where the Zakai (or Fokker–Planck) equation applies and can accelerate mesh-free filtering in high-dimensional nonlinear systems such as atmospheric models. Further developments in energy-based parameterizations, higher-order splitting, adaptive time grids, and alternative training objectives (e.g., reverse KLD, noise-contrastive estimation) are feasible. Limitations include potential error accumulation in highly multimodal/posterior drift regimes, the need for tailor-made tail layers to ensure integrability, and the absence of rigorous convergence proofs for all variants outside core splitting steps. Training complexity increases for longer time horizons and larger state spaces, necessitating advances in scalable neural architectures and splitting schemes (Lobbe, 2022, Bågmark et al., 2022).

In summary, deep splitting filters constitute a rigorous, operator-theoretic framework for data-driven Bayesian filtering, combining SPDE splitting, neural approximation, recursive normalization, and theoretically controlled error—enabling scalable online inference in nonlinear, high-dimensional dynamical systems.
