Neural Network Aided Kalman Filtering
- Neural network aided Kalman filtering is a hybrid method that combines the classic prediction-update framework with neural network-based gain computation to manage partial model knowledge and nonlinearity.
- It uses recurrent architectures like GRUs and LSTMs to replace analytic covariance recursions with learned mappings, enabling end-to-end adaptation and robust performance.
- Empirical studies show improved state estimation in linear, nonlinear, and quantized systems, while challenges remain in uncertainty calibration and interpretability.
Neural network aided Kalman filtering refers to a family of state estimation algorithms that integrate neural network components—most commonly recurrent architectures such as GRUs or LSTMs—into the recursive structure of the classical (extended) Kalman filter. This approach is motivated by the proliferation of applications where system models are partially known, subject to complex nonlinearities, or where noise statistics are unknown, heavy-tailed, nonstationary, or otherwise poorly modeled. By replacing analytic operations such as gain or covariance computation with learned mappings trained from data, neural network aided filters aim to combine the interpretability, temporal structure, and data efficiency of model-based filtering with the robustness and flexibility of modern deep learning.
1. Core Principles of Neural Network Aided Kalman Filtering
The foundational principle underlying neural network aided Kalman filters is the preservation of the two-step prediction and update structure of the standard filter:
- Prediction Step: Use the (partially known) system dynamics $f(\cdot)$ and observation model $h(\cdot)$ for state and observation prediction.
- Update Step: Fuse each new observation via a learned correction, with the Kalman gain supplied by a small neural network module instead of analytic covariance recursions.
The KalmanNet architecture exemplifies this approach, outputting the gain $\mathcal{K}_t = G_\theta(\cdot)$ at each step from features derived from innovations and state differences, where $\theta$ denotes the neural network parameters (Revach et al., 2021). The gain network is updated recursively and may subsume the role traditionally played by the covariance matrices in the classical Kalman filter, allowing the entire model to be trained end-to-end by minimizing, for example, the MSE of the state estimate.
This hybrid structure makes it distinct both from fully model-based filters, which require precise models and noise statistics, and from fully data-driven RNNs, which lack structural constraints and interpretability and typically require more data.
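As a concrete illustration, the hybrid prediction-update step can be sketched in a few lines. Here `gain_net` is a stand-in for the learned gain module (a trained KalmanNet would use a recurrent network carrying a hidden state across steps); all names and numerical values are illustrative:

```python
import numpy as np

def nn_aided_kf_step(x_prev, y, f, h, gain_net):
    """One hybrid step: model-based prediction, learned-gain correction."""
    x_pred = f(x_prev)              # predict state from (partially known) dynamics
    y_pred = h(x_pred)              # predict the observation
    innovation = y - y_pred         # innovation / prediction error
    K = gain_net(innovation)        # learned module replaces covariance recursion
    return x_pred + K @ innovation  # corrected state estimate

# Toy linear system; the "network" is a stub returning a fixed gain.
F = np.array([[1.0, 0.1], [0.0, 1.0]])
f = lambda x: F @ x
h = lambda x: x[:1]                          # observe position only
gain_net = lambda innov: np.array([[0.5], [0.1]])

x_est = nn_aided_kf_step(np.array([1.0, 0.0]), np.array([1.2]), f, h, gain_net)
```

Replacing the stub with a recurrent module trained on sequences is what turns this skeleton into a KalmanNet-style filter.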
2. Model Architectures and Computational Workflow
Most neural network aided Kalman filters instantiate neural models at key nonlinear or uncertain computation points:
- KalmanNet and Derivatives: The prototype pipeline, applicable in both linear and nonlinear settings, is as follows (Revach et al., 2021):
- Prediction: $\hat{x}_{t|t-1} = f(\hat{x}_{t-1|t-1})$; $\hat{y}_{t|t-1} = h(\hat{x}_{t|t-1})$.
- Innovation: $\Delta y_t = y_t - \hat{y}_{t|t-1}$.
- Gain inference: $\mathcal{K}_t = G_\theta(\cdot)$ (features often include $\Delta y_t$ and state differences).
- Update: $\hat{x}_{t|t} = \hat{x}_{t|t-1} + \mathcal{K}_t \, \Delta y_t$.
- Split-KalmanNet: Decomposes the gain computation into two parallel RNNs that separately estimate the state-prediction and innovation covariances, then recompose the gain as $\mathcal{K}_t = \hat{P}_{t|t-1} H_t^\top \hat{S}_t^{-1}$, increasing robustness to different sources of model mismatch (Choi et al., 2022).
- Adaptive KalmanNet: Uses a compact hypernetwork to modulate the parameters of the main gain-predicting network in response to real-time estimates of noise scale, achieving rapid adaptation without retraining (Ni et al., 2023).
- GSP-KalmanNet: For graph-structured state spaces, the gain is computed in the graph frequency domain as a diagonal (filter-like) operator, and a deep network learns the graph-frequency-wise gain coefficients (Buchnik et al., 2023).
- Recursive KalmanNet: Uses two dedicated RNNs: one for the gain and one for the error-covariance (Cholesky factor), recursively propagating uncertainty via the Joseph form and trained under a Gaussian negative log-likelihood loss to achieve calibrated uncertainties (Mortada et al., 13 Jun 2025).
- Bussgang-aided KalmanNet: In the presence of extreme quantization (e.g., 1-bit ADCs), neural components learn a surrogate gain for the analytically intractable Bussgang-augmented update (Jung et al., 23 Jul 2025).
The feature design, hidden-state size, and exact network composition are domain and problem dependent, but in practice, complexity is dominated by small GRU/LSTM/FC modules tailored to the target state and observation dimensions.
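The split decomposition used by Split-KalmanNet can be sketched as follows. The stub arrays stand in for the outputs of the two RNN branches (the paper's exact factorization and feature design may differ in detail):

```python
import numpy as np

def split_gain(P_hat, H, S_inv_hat):
    """Recompose the Kalman gain from separately learned factors,
    following the decomposition K = P H^T S^{-1}."""
    return P_hat @ H.T @ S_inv_hat

# Stubs standing in for the two RNN branches:
P_hat = np.diag([0.4, 0.2])        # learned state-prediction covariance
H = np.array([[1.0, 0.0]])         # known observation matrix
S_inv_hat = np.array([[2.0]])      # learned inverse innovation covariance

K = split_gain(P_hat, H, S_inv_hat)   # gain of shape (2, 1)
```

Keeping the two factors separate lets each branch specialize in one source of mismatch (process vs. measurement), which is the motivation cited for the split design.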
3. Training Objectives and Uncertainty Quantification
Training is typically performed in a supervised or unsupervised fashion:
- Supervised MSE Loss: Minimize the state-estimation error $\sum_t \|\hat{x}_{t|t} - x_t\|^2$, directly encouraging proximity to ground-truth states (Revach et al., 2021).
- Unsupervised Prediction Loss: Exploit the model's internal observation prediction to minimize $\sum_t \|y_t - \hat{y}_{t|t-1}\|^2$ without labeled states, enabling purely measurement-driven adaptation (Revach et al., 2021).
- Negative Log-Likelihood: For networks outputting both a state estimate $\hat{x}_{t|t}$ and a covariance $\hat{\Sigma}_{t|t}$, minimize the Gaussian negative log-likelihood $\sum_t \big[\log\det \hat{\Sigma}_{t|t} + (x_t - \hat{x}_{t|t})^\top \hat{\Sigma}_{t|t}^{-1} (x_t - \hat{x}_{t|t})\big]$ to force consistency between predicted and empirical uncertainties (Mortada et al., 13 Jun 2025, Dahan et al., 2023).
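A minimal sketch of the two supervised objectives (additive constants dropped from the NLL; names and shapes are illustrative):

```python
import numpy as np

def mse_loss(x_true, x_est):
    """Supervised objective: mean squared state-estimation error."""
    return np.mean(np.sum((x_est - x_true) ** 2, axis=-1))

def gaussian_nll(x_true, x_est, Sigma):
    """Per-step Gaussian negative log-likelihood (constants dropped):
    penalizes both the estimation error and miscalibrated covariances."""
    err = x_true - x_est
    _, logdet = np.linalg.slogdet(Sigma)
    return 0.5 * (logdet + err @ np.linalg.solve(Sigma, err))

x_true = np.array([1.0, 0.0])
x_est = np.array([0.9, 0.1])
mse = mse_loss(x_true, x_est)
```

Unlike the MSE, the NLL grows sharply when the network reports an overconfident (too small) covariance for a given error, which is what drives calibration.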
Uncertainty extraction is an essential objective:
- Direct Inference of Covariance: Some variants estimate error covariance from the RNN's internal features, leveraging the learned gain and, where possible, explicit measurement models (Klein et al., 2021).
- Bayesian Methods: Bayesian KalmanNet employs Monte Carlo dropout to approximate the posterior predictive distribution over state and covariance, yielding credible, well-calibrated uncertainties, especially under mismatch (Dahan et al., 2023).
- Ensemble and Joseph Forms: Recursive KalmanNet combines neural and analytic covariance propagation to ensure that error estimates remain positive-definite and unbiased (Mortada et al., 13 Jun 2025).
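The Joseph form is attractive here precisely because it returns a symmetric positive semidefinite covariance for any gain, learned or analytic; a minimal sketch:

```python
import numpy as np

def joseph_update(P_pred, K, H, R):
    """Joseph-form covariance update:
        P = (I - K H) P_pred (I - K H)^T + K R K^T.
    Symmetric PSD for ANY gain K, so it safely propagates uncertainty
    around a learned (possibly suboptimal) gain."""
    I = np.eye(P_pred.shape[0])
    A = I - K @ H
    return A @ P_pred @ A.T + K @ R @ K.T

P_post = joseph_update(P_pred=np.eye(2),
                       K=np.array([[0.6], [0.0]]),
                       H=np.array([[1.0, 0.0]]),
                       R=np.array([[0.5]]))
```

The simpler update $(I - KH)P$ guarantees this only for the optimal gain, which is why hybrid filters that learn $K$ prefer the Joseph form.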
4. Empirical Performance, Robustness, and Limitations
Empirical work has evaluated neural network aided Kalman filters on:
- Linear/nonlinear systems: KalmanNet matches or exceeds the performance of optimal analytic KFs when models are known, and exhibits strong gains under moderate model mismatch or nonlinearity (Revach et al., 2021).
- Chaotic/partially observed systems: KalmanNet and Latent-KalmanNet outperform KFs and particle filters on the Lorenz attractor and high-dimensional visual tracking by learning robust corrections and latent representations (Buchnik et al., 2023).
- Real-world radar and vision data: In automotive radar tracking, KalmanNet achieves acceptable position accuracy but underperforms compared to Interacting Multiple Model (IMM) filters in terms of velocity/acceleration and especially uncertainty calibration, raising safety concerns (Mehrfard et al., 2024).
- Non-Gaussian/quantized/heteroscedastic noise: Adaptive KalmanNet, RKN, BKNet, and Split-KalmanNet demonstrate strong robustness to time-varying noise, 1-bit quantization, and mixed distributions—scenarios where analytic filters degrade or fail (Ni et al., 2023, Mortada et al., 13 Jun 2025, Jung et al., 23 Jul 2025, Choi et al., 2022).
- SLAM and multi-object tracking: Split-KalmanNet and SIKNet show that structured neural decompositions can enhance accuracy and stability in state-of-the-art SLAM and MOT pipelines (Choi et al., 2022, Song et al., 14 Sep 2025).
The main limitations include:
- Lack of interpretability of learned gain matrices compared to analytic solutions, except when explicit invariants or constraints are enforced (Choi et al., 2022).
- Vulnerability to out-of-distribution motions (e.g., figure-eight trajectories in radar data) and poorly calibrated uncertainty in some scenarios (Mehrfard et al., 2024).
- Dependency on the representational and memory capacity of the employed neural modules, requiring careful feature and architecture selection.
- Need for labeled states during supervised training, although unsupervised and federated approaches can help when only measurement sequences are available (Revach et al., 2021, Piperigkos et al., 2024).
5. Extensions and Specialized Variants
A variety of extensions have been developed:
- Adaptive Gain Modulation: Hypernetwork-augmented variants adapt gain computation on-the-fly in response to ambient noise or external context, achieving “fine-tuning without retraining” (Ni et al., 2023).
- Split Gain Learning: Robust to heterogeneous process and measurement noise and better suited to severe model mismatch or nonstationary observation channels (Choi et al., 2022, Song et al., 14 Sep 2025).
- Graph-Structured Filtering: GSP-KalmanNet exploits graph spectral domains for scalable filtering in high-dimensional graph-structured dynamical systems, reducing per-step complexity relative to computing a full gain matrix (Buchnik et al., 2023).
- Distributed/Federated Training: FedKalmanNet employs federated averaging to train filters across multiple clients without sharing raw data, enabling privacy-preserving, collaborative state estimation in decentralized settings (Piperigkos et al., 2024).
- Bussgang-Aided Filtering: Under extreme quantization, the Bussgang-KalmanNet architecture jointly learns dithering and gain, robustifying state estimation with 1-bit ADCs (Jung et al., 23 Jul 2025).
- Latent-Domain Tracking: Latent-KalmanNet couples a learned encoder to the gain network, mapping high-dimensional visual or sensor measurements into a latent domain conducive to approximate linear filtering (Buchnik et al., 2023).
- Task-driven Filtering: KalmanNet-augmented pairs trading merges unsupervised state tracking with task-specific fine-tuning (e.g., for financial PNL), demonstrating the utility of two-stage training for application-driven objectives (Milstein et al., 2022).
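The federated training step underlying FedKalmanNet-style learning can be sketched as a size-weighted average of client parameter dictionaries (a generic FedAvg sketch under assumed names, not the paper's exact procedure):

```python
import numpy as np

def fed_avg(client_weights, client_sizes):
    """Federated averaging of gain-network parameters: each client trains
    on its own measurement sequences; only parameters are shared."""
    total = sum(client_sizes)
    return {
        k: sum(w[k] * (n / total) for w, n in zip(client_weights, client_sizes))
        for k in client_weights[0]
    }

# Two clients with scalar parameter 'W', holding 3 and 1 sequences:
avg = fed_avg([{'W': np.array([1.0])}, {'W': np.array([3.0])}], [3, 1])
```

Because only the small gain-network weights travel to the server, raw measurement sequences never leave the clients, which is the privacy argument made for this setting.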
6. Summary Table of Representative Neural Kalman Filter Variants
| Architecture | Key Neural Module | Domain/Target Problem | Uncertainty Output |
|---|---|---|---|
| KalmanNet (Revach et al., 2021) | RNN gain predictor | General state-space models | Possible from gain + features (Klein et al., 2021) |
| Recursive KalmanNet (Mortada et al., 13 Jun 2025) | Two RNNs (gain + covariance) | Consistent uncertainty | Explicit covariance |
| Adaptive KalmanNet (Ni et al., 2023) | Hypernetwork for modulation | Time-varying/process noise | Implicit |
| Split-KalmanNet (Choi et al., 2022) | Two RNNs ($\hat{P}$, $\hat{S}^{-1}$) | SLAM, complex noise | No explicit covariance |
| Bayesian KalmanNet (Dahan et al., 2023) | Bayesian dropout ensemble | Uncertainty quantification | Posterior predictive |
| GSP-KalmanNet (Buchnik et al., 2023) | GFT-diagonal RNN gain | Graph signals, high-dim | No explicit covariance |
| Bussgang-KalmanNet (Jung et al., 23 Jul 2025) | GRU gain, Bussgang surrogate | 1-bit quantized filtering | Explicit via network |
| Latent-KalmanNet (Buchnik et al., 2023) | Encoder + RNN gain | High-dim signals (vision) | No explicit covariance |
| SIKNet (Song et al., 14 Sep 2025) | SIE + split RNNs | Multi-object tracking (MOT) | No explicit covariance |
| FedKalmanNet (Piperigkos et al., 2024) | Federated RNN gain | Distributed localization | No explicit covariance |
7. Research Impact, Controversies, and Outlook
Neural network aided Kalman filtering presents a paradigm for harnessing partial domain knowledge and data-driven adaptability within recursive estimation frameworks. The principal strengths are:
- Ability to handle partial or mismatched dynamics and non-Gaussian noise.
- Superior empirical error (MSE, mAR, RMSE) under moderate to severe mismatch, nonlinearity, or quantization effects.
- End-to-end differentiable training suitable for integration into complex pipelines (SLAM, MOT, time series forecasting, trading).
However, consistent calibration of uncertainty, robustness to out-of-distribution sequences, and reliability in safety-critical settings remain open challenges, as evidenced by comparative studies in automotive radar (Mehrfard et al., 2024). While architectures such as Recursive KalmanNet and Bayesian KalmanNet have made strides in calibrated uncertainty quantification (Mortada et al., 13 Jun 2025, Dahan et al., 2023), further work is required to enable full interpretability and formal guarantees without reliance on labeled covariance data or explicit analytic expressions.
Potential future directions include incorporating richer Bayesian inference, combining with control or reinforcement learning objectives, further exploiting graph and manifold structures, expanding federated and privacy-preserving learning, and developing methods for joint adaptation in both state and noise/statistical parameter spaces.
References:
- "KalmanNet: Neural Network Aided Kalman Filtering for Partially Known Dynamics" (Revach et al., 2021)
- "Recursive KalmanNet: Deep Learning-Augmented Kalman Filtering for State Estimation with Consistent Uncertainty Quantification" (Mortada et al., 13 Jun 2025)
- "Adaptive KalmanNet: Data-Driven Kalman Filter with Fast Adaptation" (Ni et al., 2023)
- "Unsupervised Learned Kalman Filtering" (Revach et al., 2021)
- "Split-KalmanNet: A Robust Model-Based Deep Learning Approach for SLAM" (Choi et al., 2022)
- "Performance Evaluation of Deep Learning-Based State Estimation: A Comparative Study of KalmanNet" (Mehrfard et al., 2024)
- "GSP-KalmanNet: Tracking Graph Signals via Neural-Aided Kalman Filtering" (Buchnik et al., 2023)
- "Federated Data-Driven Kalman Filtering for State Estimation" (Piperigkos et al., 2024)
- "VAE: A Koopman-Kalman Enhanced Variational AutoEncoder for Probabilistic Time Series Forecasting" (2505.23017)
- "Motion Estimation for Multi-Object Tracking using KalmanNet with Semantic-Independent Encoding" (Song et al., 14 Sep 2025)
- "Latent-KalmanNet: Learned Kalman Filtering for Tracking from High-Dimensional Signals" (Buchnik et al., 2023)
- "Neural Augmented Kalman Filtering with Bollinger Bands for Pairs Trading" (Milstein et al., 2022)
- "Uncertainty in Data-Driven Kalman Filtering for Partially Known State-Space Models" (Klein et al., 2021)
- "State Estimation with 1-Bit Observations and Imperfect Models: Bussgang Meets Kalman in Neural Networks" (Jung et al., 23 Jul 2025)
- "Bayesian KalmanNet: Quantifying Uncertainty in Deep Learning Augmented Kalman Filter" (Dahan et al., 2023)