Differentiable Bayesian Filters

Updated 26 April 2026

Differentiable Bayesian Filters are state estimation frameworks that combine recursive Bayesian filtering with neural network-based learnable components for end-to-end training.
They use differentiable models such as normalizing flows and diffusion processes to handle nonlinearities and multimodal posterior distributions in high-dimensional spaces.
Applications in robotic perception, visual odometry, and sensor fusion report better efficiency, adaptability, and uncertainty modeling than traditional filters.

A differentiable Bayesian filter is a state estimation framework that combines the recursive structure of traditional Bayesian filtering with the function approximation and learning capabilities of deep neural networks. By parameterizing components such as the transition model, measurement model, and sometimes even the filter update rule as differentiable operators, one obtains a fully trainable module where gradients can flow through all inference steps. This paradigm has enabled robust, data-driven state estimation for highly nonlinear, high-dimensional, and partially observed systems, especially in robotic perception, visual odometry, and sensor fusion tasks. Differentiable Bayesian filters include neuralizations and generalizations of Kalman filters, particle filters, and, more recently, state-update rules based on modern deep generative models such as normalizing flows and diffusion models.

1. Bayesian Filtering: Classical Structure and Differentiability

Classical Bayesian filtering over hidden state $x_t$ and observation $o_t$ is defined by two recursive equations:

Prediction: $p(x_t | o_{1:t-1}) = \int p(x_t | x_{t-1})\, p(x_{t-1} | o_{1:t-1}) dx_{t-1}$
Update: $p(x_t | o_{1:t}) \propto p(o_t | x_t)\, p(x_t | o_{1:t-1})$
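
For a finite state space the integral in the prediction step becomes a sum, and both equations can be written out directly. The following is a minimal sketch of one predict/update cycle; the three-state transition matrix, the uniform prior, and the likelihood values are hypothetical numbers chosen only for illustration.

```python
def predict(belief, transition):
    """Prediction: sum over x_{t-1} of p(x_t | x_{t-1}) * p(x_{t-1} | o_{1:t-1})."""
    n = len(belief)
    return [sum(transition[j][i] * belief[j] for j in range(n)) for i in range(n)]


def update(predicted, likelihood):
    """Update: multiply by p(o_t | x_t), then renormalize so the result sums to 1."""
    unnormalized = [l * p for l, p in zip(likelihood, predicted)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Hypothetical model: transition[j][i] = p(x_t = i | x_{t-1} = j).
transition = [
    [0.8, 0.2, 0.0],
    [0.1, 0.8, 0.1],
    [0.0, 0.2, 0.8],
]
belief = [1 / 3, 1 / 3, 1 / 3]   # p(x_{t-1} | o_{1:t-1})
likelihood = [0.1, 0.7, 0.2]     # p(o_t | x_t = i) for the observed o_t

predicted = predict(belief, transition)   # approximately [0.3, 0.4, 0.3]
posterior = update(predicted, likelihood) # approximately [0.081, 0.757, 0.162]
```

Both steps consist only of sums, products, and one division, so they are differentiable with respect to the entries of the transition matrix and the likelihood. That property is what the learned variants below rely on.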

Traditional filters (e.g., Kalman, particle filter) require explicit model structures and hand-crafted noise models. Differentiable Bayesian filters replace some or all components with learnable, differentiable functions—typically neural networks—allowing the end-to-end optimization of process and measurement models, uncertainty representations, and even the update/fusion logic (Kloss et al., 2020, Lee et al., 2020).

For high-dimensional or nonlinear systems, where the prediction integral and the normalizing constant of the update have no closed form, neural networks and modern deep generative models offer powerful alternatives for belief representation and update.

2. Algorithmic Instances: Neuralized and Flow-/Diffusion-Based Filters

Within the differentiable Bayesian filter paradigm, several algorithmic classes stand out:

Filter Class	Core Update Mechanism	Differentiable Elements
Differentiable Kalman	Analytic Gaussian belief + NN models	Dynamics/measurement/uncertainty networks
Differentiable Particle	Monte Carlo, weighted/resampled particles	Dynamics, proposal, and likelihood networks; soft resampling
Diffusion-based (DnD)	Denoising diffusion process for state update	All steps: process, measurement, denoising network
Flow-based (FBF, CNF-DPF)	Normalizing flows for belief representation	Flow, proposal, measurement, resampling
Neural Bayesian (NBF)	Embedding of belief, decoded by conditional flow	All steps: embedding, flow, transition
MAP optimization (IMAP)	Unrolled optimizer on negative log posterior	Optimizer, process/loss computation

Kalman and UKF variants: Extended, unscented, and Monte Carlo Kalman filter implementations use differentiable neural models for the process and sensor models. Learnable components include the transition and measurement functions, as well as heteroscedastic uncertainty representations (Kloss et al., 2020, Lee et al., 2020).
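
To make the role of heteroscedastic uncertainty concrete, here is a minimal scalar sketch of one Kalman step in which the measurement variance depends on the input. The function `measurement_variance` and its `quality` argument are hypothetical stand-ins for a learned noise head; in an actual differentiable Kalman filter that quantity would come from a network, and the cited papers use richer multivariate models.

```python
def kalman_step(mean, var, obs, f, q, r):
    """One scalar Kalman step for x_t = f * x_{t-1} + w, o_t = x_t + v.

    q and r are the process and measurement noise variances. They are
    plain numbers here; a differentiable filter would predict them.
    """
    # Prediction: propagate mean and variance through the linear dynamics.
    mean_pred = f * mean
    var_pred = f * f * var + q
    # Update: the gain weighs the observation against the prediction.
    gain = var_pred / (var_pred + r)
    mean_post = mean_pred + gain * (obs - mean_pred)
    var_post = (1.0 - gain) * var_pred
    return mean_post, var_post, gain


def measurement_variance(quality):
    """Hypothetical noise head: low-quality observations get a large variance."""
    return 0.1 + 10.0 * (1.0 - quality)


# Same prior and observation, different (assumed) observation quality.
m_good, v_good, k_good = kalman_step(0.0, 1.0, 2.0, f=1.0, q=0.5,
                                     r=measurement_variance(1.0))
m_bad, v_bad, k_bad = kalman_step(0.0, 1.0, 2.0, f=1.0, q=0.5,
                                  r=measurement_variance(0.0))
```

With the high-quality observation the gain is close to 1 and the posterior mean moves most of the way to the observation; with the low-quality one the filter stays near its prediction. Every operation is smooth in `q` and `r`, so a loss on the posterior can be backpropagated into whatever produces them.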

Particle Filters: DPFs maintain a set of weighted samples (particles) and backpropagate gradients through differentiable proposal, weighting, and update steps; the resampling operation is typically relaxed via soft resampling, optimal transport, or neural barycentric resamplers to allow gradient flow (Chen et al., 2023, Jonschkowski et al., 2018).
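
One common relaxation, soft resampling, can be sketched as follows: indices are drawn from a mixture of the particle weights and a uniform distribution, and the new weights are importance-corrected so that they remain a function of the original weights. The code below is a minimal illustration in plain Python, not the implementation of any cited paper; `alpha`, the particle values, and the seed are assumptions chosen for the example.

```python
import random


def soft_resample(particles, weights, alpha, rng):
    """Resample from q(k) = alpha * w_k + (1 - alpha) / n, then reweight by w_k / q(k).

    With alpha = 1 this reduces to ordinary multinomial resampling with
    uniform output weights. With alpha < 1 the new weights still depend
    on the old ones, which is the path gradients can follow in an
    autodiff framework. Assumes the selected particles do not all have
    zero weight.
    """
    n = len(particles)
    proposal = [alpha * w + (1.0 - alpha) / n for w in weights]
    indices = rng.choices(range(n), weights=proposal, k=n)
    new_particles = [particles[i] for i in indices]
    new_weights = [weights[i] / proposal[i] for i in indices]
    total = sum(new_weights)
    new_weights = [w / total for w in new_weights]
    return new_particles, new_weights


particles = [0.0, 1.0, 2.0, 3.0]
weights = [0.7, 0.1, 0.1, 0.1]
rng = random.Random(0)  # fixed seed so the example is repeatable
new_particles, new_weights = soft_resample(particles, weights, alpha=0.5, rng=rng)
```

The choice of `alpha` trades variance reduction against gradient signal: values near 1 behave like standard resampling, while smaller values keep more low-weight particles alive and leave the output weights non-uniform.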

Flow-based Filters: These filters replace explicit parametric state distributions with normalizing flows (e.g., RealNVP) to achieve arbitrary posterior expressivity. The entire filtering recursion is implemented in a learned latent space (e.g., FBF: latent linear-Gaussian updates, invertibly mapped to state-space via learned flows) (Wang et al., 22 Feb 2025). Conditional flow proposals, as in CNF-DPF, substantially improve sample efficiency by adapting the proposal to the most recent observations (Chen et al., 2021).
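
The density of a flow is obtained from the change-of-variables formula: evaluate the base density at the inverse-mapped point and add the log-determinant of the inverse Jacobian. The sketch below uses a single one-dimensional affine map, which is far simpler than RealNVP or the flows in the cited work, but it exercises the same formula; `shift` and `log_scale` are hypothetical parameters that would be learned in practice.

```python
import math


def standard_normal_logpdf(z):
    """Log-density of the base distribution N(0, 1)."""
    return -0.5 * (z * z + math.log(2.0 * math.pi))


def affine_flow_logpdf(x, shift, log_scale):
    """log p(x) for x = shift + exp(log_scale) * z with z ~ N(0, 1)."""
    z = (x - shift) * math.exp(-log_scale)  # inverse map from state to latent
    log_det_inverse = -log_scale            # log |dz/dx|
    return standard_normal_logpdf(z) + log_det_inverse


def gaussian_logpdf(x, mean, std):
    """Closed-form Gaussian log-density, used only as a cross-check."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std) - 0.5 * math.log(2.0 * math.pi)


flow_value = affine_flow_logpdf(1.3, shift=0.5, log_scale=math.log(2.0))
reference = gaussian_logpdf(1.3, mean=0.5, std=2.0)
```

For an affine map the result coincides with the Gaussian log-density of mean `shift` and standard deviation `exp(log_scale)`. Stacking nonlinear invertible layers keeps the same two-term structure while allowing non-Gaussian shapes.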

Diffusion Models (DnD Filter): The DnD Filter employs a denoising diffusion model in place of the analytic Bayesian update, allowing highly nonlinear and multimodal posteriors without explicit Gaussian assumptions. The filter update is carried out by K-step learned denoising, conditioned jointly on the predicted state and current observation features (Wan et al., 3 Mar 2025).

Neural Bayesian Filtering: The NBF framework represents the belief as a learned fixed-length embedding $z_t$ obtained from a set of weighted particles. State samples are drawn from a conditional normalizing flow conditioned on $z_t$ (Solinas et al., 4 Oct 2025).

Implicit MAP Filtering: By recasting the Bayes update as K steps of (possibly adaptive) gradient descent on a time-varying loss, IMAP filtering sidesteps matrix algebra and is tractable for very high-dimensional cases (e.g., neural network adaptation) (Bencomo et al., 2023).
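
The idea of replacing the Bayes update with a few optimizer steps can be illustrated on a scalar toy problem. The sketch below is an illustration under assumptions, not the procedure of Bencomo et al.: the loss is taken to be a quadratic measurement loss, the optimizer is plain gradient descent, and `steps` and `lr` are hypothetical hyperparameters.

```python
def implicit_map_update(x_pred, obs, steps, lr):
    """Run K gradient-descent steps on loss(x) = 0.5 * (obs - x)**2 from x_pred.

    Stopping after a finite number of steps keeps the estimate between
    the prediction and the observation, so no explicit prior covariance
    or gain matrix is needed.
    """
    x = x_pred
    for _ in range(steps):
        grad = x - obs      # derivative of the quadratic loss
        x = x - lr * grad
    return x


x_new = implicit_map_update(x_pred=0.0, obs=1.0, steps=3, lr=0.5)  # 0.875
```

In this toy case the result has the closed form `obs + (1 - lr)**steps * (x_pred - obs)`, so the step count and step size together act like a gain of `1 - (1 - lr)**steps`. The update only requires gradients of the loss, which is why the approach remains usable when the state is very high-dimensional.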

3. Training Methodologies and Loss Functions

Training of differentiable Bayesian filters is performed end-to-end, with supervision from ground-truth states (if available), or via unsupervised surrogate loss functions:

Supervised Losses: Common objectives include mean squared error (MSE) and negative log-likelihood (NLL) between the inferred and true states, accumulated over the trajectory (Kloss et al., 2020, Lee et al., 2020, Jonschkowski et al., 2018).
Unsupervised/Pseudo-likelihood: When no ground-truth is available, filtering evidence lower bounds (e.g., SMC-ELBO) and pseudo-likelihood terms are used to update model and proposal parameters online (Li et al., 2023, Chen et al., 2023).
Stepwise and Sequence Losses: Loss functions are applied per time-step and sometimes over sliding windows to encourage temporal consistency and effective error propagation through unrolling (Kloss et al., 2020, Lee et al., 2020, Wan et al., 3 Mar 2025).
Specialized Losses: Diffusion-based filters employ noise prediction losses at each denoising step, while flow-based approaches maximize change-of-variable likelihoods under the model (Wan et al., 3 Mar 2025, Wang et al., 22 Feb 2025, Solinas et al., 4 Oct 2025).
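
The two supervised objectives named above can be written down for a scalar state as a minimal sketch. It assumes the filter outputs a Gaussian belief (mean and variance) at every time step and averages the per-step terms over the trajectory; the numeric trajectories are hypothetical.

```python
import math


def trajectory_mse(pred_means, true_states):
    """Mean squared error between filtered means and ground-truth states."""
    return sum((m - x) ** 2 for m, x in zip(pred_means, true_states)) / len(true_states)


def trajectory_gaussian_nll(pred_means, pred_vars, true_states):
    """Average negative log-likelihood of the true states under N(mean, var)."""
    total = 0.0
    for m, v, x in zip(pred_means, pred_vars, true_states):
        total += 0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
    return total / len(true_states)


pred_means = [0.0, 1.0, 2.0]
true_states = [0.1, 0.9, 2.3]

mse = trajectory_mse(pred_means, true_states)
nll_reasonable = trajectory_gaussian_nll(pred_means, [0.04, 0.04, 0.04], true_states)
nll_overconfident = trajectory_gaussian_nll(pred_means, [0.001, 0.001, 0.001], true_states)
```

The MSE is identical for both variance settings because it ignores the predicted uncertainty, while the NLL is much larger for the overconfident variances. This is the usual reason to prefer a likelihood-based loss when calibrated uncertainty matters.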

Gradient flow through the entire filter trajectory is achieved by careful parameterization and, where necessary, differentiable relaxations of non-smooth operations (e.g., resampling).

4. Empirical Performance and Applications

Differentiable Bayesian filters have demonstrated state-of-the-art performance in a variety of robotic and perception tasks:

DnD Filter achieves a 25% improvement in pixel-level odometry error on the real-world KITTI odometry dataset versus the best differentiable filters, and even outperforms differentiable smoothers using future measurements (Wan et al., 3 Mar 2025).
Flow-based Bayesian Filters scale accurately to state dimensions up to 100, outperforming PF, RKN, and CNF-DPF in training time, online filtering efficiency, and uncertainty modeling (Wang et al., 22 Feb 2025).
Differentiable Particle and Kalman Filters outperform LSTM and transformer baselines on position tracking, especially in cases of partial observability or multimodal posterior structure (Kloss et al., 2020, Lee et al., 2020, Jonschkowski et al., 2018).
Online learning DPFs adapt in real time to distribution shifts, converging in a few hundred steps, and reduce error by up to 30% relative to static pre-trained filters (Li et al., 2023).
Multimodal and attention-based filters, such as α-MDF, offer substantial gains in state estimation for soft robotics, with error reductions of up to 45% over standard differentiable filters (Liu et al., 2023).

These methods excel in high-dimensional, nonlinear, and heterogeneous sensory environments due to their ability to encode rich uncertainty models, fuse diverse modalities, and adaptively recalibrate internal models.

5. Strengths, Limitations, and Design Trade-offs

Strengths:

Arbitrary posterior expressivity, including multimodal and heavy-tailed distributions (especially with flows or diffusion models).
Complete end-to-end differentiability supports joint learning of transition, observation, proposal/fusion, and uncertainty representation.
Modular structure supports insertion of algorithmic priors (e.g., recursive Bayes, ensemble, or attention-based fusion) for increased interpretability and regularization (Jonschkowski et al., 2018).
Robustness under partial observation, sensor failure, and distribution shift—especially with online update algorithms (Li et al., 2023).

Limitations:

Computational cost can be significant, especially for diffusion-based updates (which scale with the number of denoising steps) and for evaluating normalizing-flow Jacobian determinants (Wan et al., 3 Mar 2025, Wang et al., 22 Feb 2025).
Soft/differentiable resampling introduces bias or additional complexity, and exact resampling remains non-differentiable (Chen et al., 2023, Kloss et al., 2020).
Some approaches retain only point estimates or lack calibrated uncertainty (e.g., gradient-descent/MAP-based methods) (Bencomo et al., 2023).
Training instability may occur, motivating stagewise, curriculum, or hybrid training strategies.

Design trade-offs include selecting between expressive generative models (flows, diffusion) and computational tractability; balancing interpretability/algorithmic priors with pure data-driven learning; and choosing loss functions that promote calibration as well as prediction accuracy.

6. Recent Advances and Future Directions

Several recent trends define the evolving landscape of differentiable Bayesian filters:

Expressive posterior models: Adoption of diffusion models (DnD Filter) enables highly nonlinear state updates, dispensing with Gaussian assumptions (Wan et al., 3 Mar 2025).
Normalizing flow–based embeddings: Flow-based filters and NBF extend recursive Bayesian estimation to settings where the filtering distribution has no tractable explicit form but can be represented compactly via invertible mappings (Wang et al., 22 Feb 2025, Solinas et al., 4 Oct 2025).
Online and continual learning: Recent work on unsupervised, on-the-fly parameter learning without ground-truth states addresses distributional shift and model deployment constraints (Li et al., 2023).
Attention and multimodal fusion: Filters with learned, attention-based gain mechanisms (e.g., α-MDF) enable context-aware sensor fusion beyond analytic Kalman gains, especially in soft-robot and cross-sensor tasks (Liu et al., 2023).
Integration of optimizer-based update rules: Implicit MAP filtering frames the update as inner-loop optimization, yielding scalable estimators for very high-dimensional parameter spaces (Bencomo et al., 2023).

Future directions include more efficient denoising/sampling for diffusion-based filters, robust continual adaptation strategies, tighter integration of probabilistic smoothing in differentiable frameworks, and richer multimodal fusion via transformer-style architectures. Incorporating loop closure, external constraints, or additional modalities (e.g., radar, LiDAR) into such models is an open technical challenge with significant impact for autonomous robotics and embodied perception.

References:

(Wan et al., 3 Mar 2025, Solinas et al., 4 Oct 2025, Wang et al., 22 Feb 2025, Kloss et al., 2020, Lee et al., 2020, Jonschkowski et al., 2018, Chen et al., 2021, Li et al., 2023, Chen et al., 2023, Bencomo et al., 2023, Liu et al., 2023)