- The paper presents Kalman Evolve (KE), which jointly optimizes noise covariances and interpretable nonlinear update structures to close the structural gap between affine Kalman updates and the optimal estimator under nonlinear sensing.
- It reports RMSE reductions over OKF on Doppler radar (up to 12%), LiDAR localization (2–6%), and pedestrian tracking (8%) benchmarks, indicating improved robustness.
- KE retains the recursive, modular structure of the Kalman Filter while incorporating LLM-guided algorithm discovery, with negligible added cost at inference time.
Kalman Evolve: Interpretable Algorithm Discovery for Robust Kalman Filtering
Structural Suboptimality of Classical Kalman Filtering
The classical Kalman Filter is the minimum mean-squared-error (MMSE) state estimator for systems with linear dynamics, linear observations, and Gaussian noise. Its update step is affine in the measurement and is parameterized by the process and measurement noise covariances Q and R. In many real-world domains, however, such as Doppler radar and LiDAR-based localization, the observation model is inherently nonlinear. Theoretical analysis shows that, even under idealized settings with Gaussian priors, nonlinear sensing induces posteriors whose conditional means are non-affine in the measurement, so a persistent gap separates the optimal estimator from any affine Kalman-style update (see the proofs and corollaries in the appendix). Parameter optimization cannot eliminate this gap, because it reflects a structural deficiency rather than a poorly tuned Q or R. Nonlinear dependencies such as radial and angular measurements make the MMSE estimator intrinsically non-affine, which motivates update mechanisms beyond classical affine transformations.
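The structural gap can be illustrated numerically. The sketch below is not taken from the paper: it assumes a scalar state with prior N(1, 1) and a range-like measurement z = |x| + v, both chosen here purely for illustration. It fits the best affine estimator of x from z by least squares and compares its in-sample MSE with an estimator that may also use z² and z³. Because the affine model is nested inside the cubic one, any drop in MSE comes from the non-affine terms alone.

```python
import numpy as np

def affine_vs_nonaffine_mse(n=20000, seed=0):
    """Return (mse_affine, mse_cubic) for estimating x from z = |x| + v."""
    rng = np.random.default_rng(seed)
    x = rng.normal(1.0, 1.0, size=n)               # Gaussian prior (assumed)
    z = np.abs(x) + rng.normal(0.0, 0.1, size=n)   # nonlinear sensing (assumed)

    # Affine estimator: x_hat = a + b * z
    feats_affine = np.column_stack([np.ones(n), z])
    # Non-affine estimator: adds z**2 and z**3 features
    feats_cubic = np.column_stack([np.ones(n), z, z**2, z**3])

    mses = []
    for feats in (feats_affine, feats_cubic):
        coef, *_ = np.linalg.lstsq(feats, x, rcond=None)
        mses.append(float(np.mean((x - feats @ coef) ** 2)))
    return mses[0], mses[1]

if __name__ == "__main__":
    mse_affine, mse_cubic = affine_vs_nonaffine_mse()
    print(f"affine MSE: {mse_affine:.4f}  cubic MSE: {mse_cubic:.4f}")
```

Tuning the two affine coefficients further cannot recover the difference; only a change in the functional form of the estimator can.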

Figure 1: State estimation (left), next-state prediction (right), highlighting distinct operational paradigms and inference goals.
Joint Optimization of Parameters and Update Structure
The proposed framework, Kalman Evolve (KE), addresses the structural limitations of affine Kalman updates by integrating LLM-assisted program-space exploration with conventional noise parameter estimation. KE operates in two stages: first, it calibrates the noise covariances (Q, R) via least-squares or Optimized Kalman Filter (OKF); then, it employs an LLM-guided evolutionary search over interpretable update structures. LLMs serve as structured priors, generating syntactically valid and semantically plausible algorithm variants that maintain the recursive, modular organization of the Kalman Filter, while introducing non-affine corrections (e.g., gating, nonlinear residual transformations). This approach unifies data-driven parameter calibration and structural learning, enabling discovery and refinement of robust, interpretable filters tailored to empirical sensor models.
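As a rough illustration of the first stage, the sketch below estimates Q and R as sample covariances of the model residuals. It is an assumption-laden stand-in, not the paper's procedure: it presumes linear models x_{t+1} = F x_t + w_t and z_t = H x_t + v_t, and training trajectories with ground-truth states. The function name `estimate_noise_covariances` is hypothetical.

```python
import numpy as np

def estimate_noise_covariances(X, Z, F, H):
    """Residual-based estimates of Q and R (illustrative sketch).

    X: (T, n) ground-truth states, Z: (T, m) measurements,
    F: (n, n) assumed transition matrix, H: (m, n) assumed observation matrix.
    """
    w = X[1:] - X[:-1] @ F.T   # process residuals  x_{t+1} - F x_t
    v = Z - X @ H.T            # measurement residuals  z_t - H x_t
    Q = w.T @ w / len(w)       # second moment of process residuals
    R = v.T @ v / len(v)       # second moment of measurement residuals
    return Q, R
```

The resulting Q and R would then be held fixed, or used as a starting point, while the second stage searches over update structures.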
Figure 2: Overview of the Kalman Evolve framework, illustrating database initialization, LLM-driven mutation, multi-island search, and iterative fitness-based selection.
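The search loop in Figure 2 can be sketched generically. In the minimal version below, the `mutate` callable is a hypothetical stand-in for the LLM-driven mutation step, `fitness` returns a loss to minimize (for example, validation RMSE), and the island count, population cap, tournament size, and migration schedule are all illustrative assumptions rather than the paper's settings.

```python
import random

def island_search(seeds, mutate, fitness, n_islands=3, generations=20,
                  migrate_every=5, cap=8, seed=0):
    """Multi-island evolutionary search; returns (best_loss, best_candidate)."""
    rng = random.Random(seed)
    loss_of = lambda entry: entry[0]
    # Database initialization: every island starts from the seed programs.
    islands = [[(fitness(s), s) for s in seeds] for _ in range(n_islands)]

    for gen in range(1, generations + 1):
        for pop in islands:
            # Small tournament: pick the better of up to two random entries.
            contenders = rng.sample(pop, k=min(2, len(pop)))
            parent = min(contenders, key=loss_of)[1]
            child = mutate(parent, rng)          # LLM call in the real system
            pop.append((fitness(child), child))
            pop.sort(key=loss_of)                # fitness-based selection
            del pop[cap:]                        # keep the best `cap` entries
        if gen % migrate_every == 0:
            # Migration: copy the global best into every island.
            best = min((e for pop in islands for e in pop), key=loss_of)
            for pop in islands:
                if best not in pop:
                    pop.append(best)

    return min((e for pop in islands for e in pop), key=loss_of)
```

A toy call such as `island_search([0.0], lambda x, rng: x + rng.gauss(0.0, 0.5), lambda x: (x - 3.0) ** 2)` exercises the loop with numbers in place of programs. In KE the candidates would be filter update routines and the fitness would come from running them on training trajectories.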
Empirical Evaluation: Synthetic and Real-World Benchmarks
Across synthetic and real-world tracking tasks, including Doppler radar target tracking, LiDAR vehicle localization, and pedestrian trajectory prediction on MOT20, the discovered KE algorithms consistently outperform strong baselines such as OKF. On Doppler benchmarks, KE reduces RMSE by up to 12% in the most challenging (Free) regime. Gains grow with benchmark complexity, indicating greater robustness to model mismatch and nonlinear dynamics. LiDAR-based localization on both simulated and NCLT datasets shows consistent RMSE reductions (5–6% synthetic, 2–3% real-world), even with limited data. In the pedestrian next-state prediction (NSP) task, KE achieves an 8% reduction in test RMSE over OKF, with high statistical significance ($p < 10^{-6}$).
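For readers who want to reproduce this kind of comparison on their own data, the helpers below show one plausible way to compute the reported quantities: RMSE, relative RMSE reduction, and a one-sided paired z-test on per-sample squared errors. The paper's exact test protocol is not described in this summary, so the test construction here is an assumption.

```python
import math
import numpy as np

def rmse(err):
    """RMSE over samples; err has shape (N, d), one error vector per row."""
    return float(np.sqrt(np.mean(np.sum(np.asarray(err) ** 2, axis=-1))))

def relative_reduction(rmse_baseline, rmse_new):
    """Fractional RMSE reduction of the new method relative to the baseline."""
    return 1.0 - rmse_new / rmse_baseline

def paired_z_test(se_baseline, se_new):
    """One-sided z-test that the baseline's mean squared error is larger.

    Inputs are per-sample squared errors on the same test samples.
    Returns (z, p) with p = P(Z >= z) under the standard normal.
    """
    d = np.asarray(se_baseline) - np.asarray(se_new)
    z = float(d.mean() / (d.std(ddof=1) / math.sqrt(d.size)))
    p = 0.5 * math.erfc(z / math.sqrt(2.0))
    return z, p
```

For example, `relative_reduction(1.0, 0.92)` evaluates to approximately 0.08, which corresponds to an 8% reduction.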
Figure 3: Visualization of two test trajectories from MOT20, comparing KF, OKF, and KE predictions; KE exhibits consistent improvements.
Figure 4: Next-state prediction performance and z-test quantile plots; KE yields significant reduction in MSE, especially at high quantiles.
KE scales favorably with the number of training samples: it remains the best performer at every dataset size, indicating data efficiency and stability, whereas OKF plateaus and the classical KF gives the weakest results.
Figure 5: RMSE as a function of training samples for pedestrian tracking; KE outperforms across all dataset sizes.
Interpretability, Computational Complexity, and Structure
KE preserves the canonical structure of the Kalman Filter—predict-update recursion, use of innovation and gain matrices—while introducing interpretable nonlinear transformations. Typical KE updates employ scalar gates, elementwise nonlinearities, adaptive covariance rescaling, and robust innovation processing, all implemented in transparent code.
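To make the flavor of such updates concrete, the sketch below adds a single scalar gate to the standard Kalman update. It is a hand-written illustration, not one of the discovered KE algorithms: the gate form `1 / (1 + d2 / gate_scale)` and the default `gate_scale` are assumptions chosen for simplicity. The gate shrinks the correction when the innovation is large relative to its predicted covariance, which makes the state update non-affine in the measurement.

```python
import numpy as np

def gated_update(x_pred, P_pred, z, H, R, gate_scale=9.0):
    """Kalman update with a scalar innovation gate (illustrative sketch)."""
    y = z - H @ x_pred                        # innovation
    S = H @ P_pred @ H.T + R                  # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)       # classical Kalman gain

    d2 = float(y @ np.linalg.solve(S, y))     # squared Mahalanobis distance
    g = 1.0 / (1.0 + d2 / gate_scale)         # scalar gate in (0, 1]

    Kg = g * K                                # gated gain
    x_new = x_pred + Kg @ y
    # Joseph form: stays symmetric and PSD for any gain, not only the optimal one.
    A = np.eye(len(x_pred)) - Kg @ H
    P_new = A @ P_pred @ A.T + Kg @ R @ Kg.T
    return x_new, P_new, g
```

With a zero innovation the gate equals 1 and the step reduces to the classical update; with a large outlier the gate approaches 0 and the prediction is left almost unchanged. The extra cost over the classical update is one linear solve and a few scalar operations.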
Figure 6: Examples of real and synthetic LiDAR trajectories; KE demonstrates robust tracking across regimes.
Figure 7: Statistical tests for LiDAR benchmarks, showing KE's superior performance distribution and tail robustness.
Inference complexity is comparable to classical Kalman filtering, with additional operations that are negligible at deployment. The main computational overhead is incurred during discovery, which is isolated from runtime operations.
Generalization, Data Efficiency, and Robustness
KE demonstrates strong cross-domain generalization: update rules discovered in complex benchmarks transfer with minimal degradation to simpler regimes, and vice versa. Robustness is confirmed by out-of-distribution studies and quantitative scaling analyses.

Figure 8: Relative RMSE of KE/OKF across benchmarks; values below 1 indicate improved generalization.
Figure 9: RMSE as a function of training samples for Doppler radar benchmarks; KE achieves rapid convergence and low-data regime efficacy.
Limitations and Future Directions
Although KE consistently achieves significant improvements, the discovered algorithms are not guaranteed to be optimal; further gains may be attainable with stronger priors or more advanced evolutionary strategies. Interpretability, while preserved at the algorithmic level, depends on the LLM prompt and mutation policy. Open directions include prompt engineering, hybrid initialization strategies, and extending discovery to broader classes of signal processing algorithms (e.g., wavelet transforms, brain-machine interface decoding). The current implementation relies on multi-island search and prompt concatenation, both of which could be refined for better stability and exploration.
Conclusion
Kalman Evolve offers a principled, interpretable framework for closing the structural performance gap in Kalman Filtering under realistic, nonlinear sensing models. By jointly optimizing noise parameters and update structure via LLM-guided program search, KE delivers consistent improvements across challenging benchmarks without sacrificing computational efficiency or interpretability. The separation between discovery and deployment makes KE practical for real-time applications. Theoretical analyses and empirical results affirm the necessity of structural optimization beyond parameter tuning. Future work will extend KE to more complex domains, explore structured priors, and scale LLM-driven algorithmic discovery for broader classes of recursive estimators.
References
See (2605.26830) for detailed theoretical proofs, algorithm listings, and supplementary experiments.