Distributionally Robust Kalman Filter
- DRKF is a state estimation framework that enhances classical Kalman filters by accounting for model uncertainties via ambiguity sets defined using Wasserstein balls.
- It reformulates filtering as a minimax problem, leveraging convex semidefinite programming to derive robust gains under distributional mismatch.
- Empirical studies show that DRKF significantly reduces mean-square error and LQR cost in both Gaussian and non-Gaussian noise settings, while retaining a scalable, real-time implementation.
A Distributionally Robust Kalman Filter (DRKF) generalizes the classical Kalman filter by optimizing state estimation under model uncertainty defined by ambiguity sets, most frequently specified via Wasserstein balls around nominal noise distributions. The recent DRKF literature formalizes and analyzes this approach, demonstrating robust performance and providing tractable implementations anchored in convex optimization, primarily semidefinite programming (SDP). This robustification is motivated by the empirical and theoretical degradation of Kalman filter performance under distributional mismatch, non-Gaussianity, or adversarial perturbations.
1. Model Formulation and Ambiguity Sets
The canonical setting is a discrete-time, linear state-space model

$$x_{t+1} = A x_t + w_t, \qquad y_t = C x_t + v_t,$$

where $A \in \mathbb{R}^{n \times n}$, $C \in \mathbb{R}^{m \times n}$, and the process/measurement noise $w_t$, $v_t$ are unknown, assumed only to satisfy distributional constraints. Classical Kalman filtering assumes known, fixed Gaussian noise; the DRKF instead posits that the conditional laws of $w_t$ and $v_t$ are only approximately Gaussian, belonging to 2-Wasserstein balls around nominal Gaussians:

$$\mathcal{P}_w = \{\mathbb{P} : W_2(\mathbb{P}, \mathcal{N}(0, \hat{W})) \le \theta_w\}, \qquad \mathcal{P}_v = \{\mathbb{P} : W_2(\mathbb{P}, \mathcal{N}(0, \hat{V})) \le \theta_v\}.$$
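What makes these Wasserstein balls computationally convenient is that the 2-Wasserstein distance between Gaussians has a closed form (Gelbrich's formula). A minimal NumPy sketch; the function names and test matrices are illustrative, not from the paper:

```python
import numpy as np

def psd_sqrt(M):
    """Symmetric PSD square root via eigendecomposition."""
    vals, vecs = np.linalg.eigh(M)
    vals = np.clip(vals, 0.0, None)
    return vecs @ np.diag(np.sqrt(vals)) @ vecs.T

def gelbrich_w2(mu1, S1, mu2, S2):
    """2-Wasserstein distance between N(mu1, S1) and N(mu2, S2),
    via Gelbrich's closed-form expression."""
    r2 = psd_sqrt(S2)
    cross = psd_sqrt(r2 @ S1 @ r2)
    d2 = np.sum((mu1 - mu2) ** 2) + np.trace(S1 + S2 - 2.0 * cross)
    return float(np.sqrt(max(d2, 0.0)))

# Membership check: is a perturbed covariance inside the radius-theta ball?
theta = 0.5
S_nom = np.eye(2)
S_pert = np.diag([1.2, 0.9])
inside = gelbrich_w2(np.zeros(2), S_pert, np.zeros(2), S_nom) <= theta
print(inside)  # → True
```

For commuting (e.g., diagonal) covariances the formula reduces to the Euclidean distance between the vectors of standard deviations, which makes hand-checking easy.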
The entire estimation pipeline is then recast as a minimax problem

$$\hat{x}_t = \arg\min_{\hat{x}_t} \; \sup_{\mathbb{P}_w \in \mathcal{P}_w,\, \mathbb{P}_v \in \mathcal{P}_v} \mathbb{E}\!\left[\|x_t - \hat{x}_t\|^2\right],$$

where the supremum is over all noise laws within the specified ambiguity sets (Jang et al., 31 Mar 2025).
2. SDP-Based Characterization and Computation
A distinctive feature of DRKF theory is the reduction of the robust MMSE estimator to a single finite-dimensional convex SDP, enabled by exploiting Gaussianity (through Gelbrich's formula) to confine the worst case to mean/covariance perturbations. Solving this SDP yields least-favorable covariances $(W^\star, V^\star)$ and the robust steady-state Kalman gain

$$\bar{K} = \bar{P} C^\top (C \bar{P} C^\top + V^\star)^{-1},$$

where $\bar{P}$ is the associated steady-state prediction covariance. This analytic structure guarantees that the DRKF maintains identical online complexity per time step (matrix multiplications/inversions) as the standard steady-state Kalman filter, with the only additional cost being the offline SDP solve (Jang et al., 31 Mar 2025).
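Once the least-favorable covariances are available, the robust gain is computed exactly as in the classical filter. The sketch below obtains the steady-state prediction covariance by fixed-point Riccati iteration; the scalar system and the covariances standing in for $(W^\star, V^\star)$ are hypothetical placeholders, since the actual offline SDP solve is not reproduced here:

```python
import numpy as np

def steady_state_gain(A, C, W, V, iters=1000, tol=1e-12):
    """Fixed-point (Riccati) iteration for the steady-state prediction
    covariance P and the gain K = P C^T (C P C^T + V)^{-1}."""
    P = np.eye(A.shape[0])
    for _ in range(iters):
        S = C @ P @ C.T + V
        P_next = A @ P @ A.T + W - A @ P @ C.T @ np.linalg.solve(S, C @ P @ A.T)
        if np.max(np.abs(P_next - P)) < tol:
            P = P_next
            break
        P = P_next
    K = P @ C.T @ np.linalg.inv(C @ P @ C.T + V)
    return P, K

A = np.array([[0.9]]); C = np.array([[1.0]])
# Hypothetical least-favorable covariances (stand-ins for the SDP output)
W_star, V_star = np.array([[1.0]]), np.array([[1.0]])
P_bar, K_bar = steady_state_gain(A, C, W_star, V_star)
```

The online recursion then uses `K_bar` exactly like a standard steady-state Kalman gain, which is why the per-step cost is unchanged.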
3. Theoretical Guarantees and Convergence
Rigorous analysis provides explicit contraction and stability conditions for the DR Riccati iteration: the DR Riccati mapping is a contraction on the cone of positive-definite matrices provided the process-noise radius $\theta_w$ is zero (or small) and the measurement-noise radius $\theta_v$ is bounded by an explicit constant, with spectral parameters defined via Hankel–Gramian matrices of the system. For any admissible $\theta_v$ and sufficiently small $\theta_w$, the time-varying DR gain sequence converges as $t \to \infty$ to the computed steady-state gain, ensuring robustness and stability (Jang et al., 31 Mar 2025).
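The contraction property can be checked numerically: iterating the Riccati map from two very different initializations drives the iterates to the same fixed point. A small NumPy sketch under assumed system matrices (illustrative, not the paper's example; the covariances are hypothetical stand-ins for least-favorable ones):

```python
import numpy as np

def riccati_step(P, A, C, W, V):
    """One application of the Riccati map P -> A P A^T + W - correction."""
    S = C @ P @ C.T + V
    return A @ P @ A.T + W - A @ P @ C.T @ np.linalg.solve(S, C @ P @ A.T)

A = np.array([[0.9, 0.1], [0.0, 0.8]])
C = np.array([[1.0, 0.0]])
W = 0.2 * np.eye(2)    # hypothetical process-noise covariance
V = np.array([[0.5]])  # hypothetical measurement-noise covariance

# Start from two very different positive-definite initializations
P1, P2 = np.eye(2), 10.0 * np.eye(2)
for _ in range(300):
    P1, P2 = riccati_step(P1, A, C, W, V), riccati_step(P2, A, C, W, V)

# Contraction: both trajectories reach the same fixed point
print(np.max(np.abs(P1 - P2)))
```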
4. Comparison to Risk-Sensitive and Divergence-Based Robust Filtering
Traditional risk-sensitive filters (e.g., those minimizing worst-case cost under KL-divergence ambiguity) can be interpreted as special cases of the DRKF paradigm. The risk-sensitive parameter relates directly to the Lagrange multiplier in the divergence-constrained robustification (Zorzi, 2015). The DRKF's Wasserstein formulation generalizes prior approaches by allowing non-Gaussian uncertainty (including mean shifts and covariance perturbations) and is less conservative than approaches restricted to moment or KL-ambiguity sets, while maintaining efficient SDP-based computation.
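The geometric difference between KL- and Wasserstein-ambiguity can be seen directly on Gaussians: as one covariance direction degenerates, the KL divergence to the nominal blows up (so a fixed-radius KL ball excludes such distributions), while the 2-Wasserstein distance stays bounded. A small numerical check (all matrices illustrative):

```python
import numpy as np

def kl_gauss(S1, S2):
    """KL( N(0, S1) || N(0, S2) ) for full-rank covariances."""
    n = S1.shape[0]
    inv2 = np.linalg.inv(S2)
    return 0.5 * (np.trace(inv2 @ S1) - n
                  + np.log(np.linalg.det(S2)) - np.log(np.linalg.det(S1)))

def w2_gauss_diag(S1, S2):
    """2-Wasserstein distance between N(0, S1) and N(0, S2) for
    diagonal (commuting) covariances: distance between std-dev vectors."""
    return float(np.linalg.norm(np.sqrt(np.diag(S1)) - np.sqrt(np.diag(S2))))

for eps in (1e-2, 1e-6, 1e-10):
    S1 = np.diag([1.0, eps])
    S2 = np.eye(2)
    print(eps, kl_gauss(S1, S2), w2_gauss_diag(S1, S2))  # KL grows, W2 stays < 1
```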
The DRKF can be distinguished from methods based on bicausal optimal transport (Han, 2023), marginal moment constraints (Chen et al., 6 Jul 2024), and tau-divergence balls (Zorzi, 2015). Each ambiguity set implies different worst-case distributions and practical trade-offs between computational tractability, conservatism, and the types of model mismatch against which the filter is robust.
5. Empirical Performance and Implementation
Numerical benchmarks confirm that the steady-state DRKF yields consistently lower mean-square error (MSE) and better LQR cost relative to baseline filters, both under correctly specified Gaussian noise and under significant non-Gaussian (e.g., U-Quadratic) model misspecification. In a 2D tracking problem, the DRKF achieved a reduction in average MSE from ≈1.4 (standard KF) to ≈0.19 and reduced LQR cost by over 80% under Gaussian noise; in non-Gaussian settings, the DRKF outperformed both the baseline and recent distributionally robust variants, attaining MSE ≈0.11 for U-Quadratic disturbances, while competitors' errors remained between ≈0.6 and >1.0 (Jang et al., 31 Mar 2025).
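The qualitative effect, a more conservative gain degrading more gracefully under mismatch, can be reproduced in a toy Monte Carlo run. The sketch below is an illustrative stand-in, not the paper's benchmark: the "robust" gain is built from an inflated process-noise covariance rather than an actual SDP solve, and all matrices and inflation factors are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 2D tracking-style system (hypothetical)
A = np.array([[1.0, 0.1], [0.0, 1.0]])
C = np.array([[1.0, 0.0]])
W_nom, V_nom = 0.01 * np.eye(2), np.array([[0.1]])

def steady_gain(W, V, iters=500):
    """Steady-state Kalman gain for (A, C) under covariances (W, V)."""
    P = np.eye(2)
    for _ in range(iters):
        S = C @ P @ C.T + V
        P = A @ P @ A.T + W - A @ P @ C.T @ np.linalg.solve(S, C @ P @ A.T)
    return P @ C.T @ np.linalg.inv(C @ P @ C.T + V)

def run(K, W_true, V_true, T=2000):
    """Average squared estimation error of a fixed-gain filter."""
    x, xh, err = np.zeros(2), np.zeros(2), 0.0
    Lw = np.linalg.cholesky(W_true)
    for _ in range(T):
        x = A @ x + Lw @ rng.standard_normal(2)
        y = C @ x + np.sqrt(V_true[0, 0]) * rng.standard_normal(1)
        xp = A @ xh                     # predict
        xh = xp + K @ (y - C @ xp)      # update with fixed gain
        err += np.sum((x - xh) ** 2)
    return err / T

K_nom = steady_gain(W_nom, V_nom)
# Crude stand-in for the SDP's least-favorable covariances: inflate only
# the process noise (hypothetical factor, not from the paper)
K_rob = steady_gain(4.0 * W_nom, V_nom)

# Evaluate both gains under misspecified (heavier) process noise
mse_nom = run(K_nom, 5.0 * W_nom, V_nom)
mse_rob = run(K_rob, 5.0 * W_nom, V_nom)
print(mse_nom, mse_rob)
```

Note the mismatch here inflates only the process noise: the optimal gain is invariant to a proportional scaling of both covariances, so a purely proportional mismatch would leave the nominal gain optimal.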
Workflow and computational complexity are as follows:
| Stage | Operation | Complexity |
|---|---|---|
| Offline | Solve a single SDP | One-time cost, independent of the horizon length |
| Online | Linear-time update as in the standard KF | Identical per-step cost to the steady-state KF |
The DRKF exhibits negligible additional online cost. Tuning the Wasserstein radii $\theta_w$ and $\theta_v$ offers a systematic robustness/efficiency tradeoff.
6. Broader Context and Extensions
The DRKF admits generalizations to time-varying systems, degenerate or singular noise (as in Riccati-like formulations for degenerate densities (Yi et al., 2021)), distributed sensor networks (via consensus/diffusion variants (Zorzi, 2019)), and adversarial/contamination settings where a nonzero fraction of measurements may be arbitrarily perturbed (Chen et al., 2021, Ruckdeschel et al., 2012). The Wasserstein-ambiguity DRKF establishes a minimax-optimal saddle-point: the estimator minimizes the worst-case mean squared error, and the adversary selects the least favorable Gaussian noise within the ambiguity set (Shafieezadeh-Abadeh et al., 2018). Infinite-horizon and frequency-domain characterizations further quantify the gap to rational (finite-state) realizations (Kargin et al., 26 Jul 2024).
Related moment-constrained or marginal-ambiguity approaches (e.g., MC-MDRKF (Chen et al., 6 Jul 2024)) address specific uncertainties, such as unknown cross-sensor noise correlation, but under more restrictive ambiguity sets.
7. Significance and Practical Impact
The DRKF provides a scalable, theoretically justified, and practically tractable means of robustifying state estimation against distributional uncertainty in stochastic linear dynamical systems (Jang et al., 31 Mar 2025). It bridges rigorous minimax-optimality guarantees with the computational requirements of real-time control and signal processing, offering enhanced robustness over both classical and earlier robust Kalman filtering frameworks. Empirical studies indicate substantial MSE and cost reductions across broad regimes of mismatch, without loss of efficiency in the nominal scenario. The decoupling of robustness tuning (via the Wasserstein radii $\theta_w$, $\theta_v$) from filter structure and the reduction to a single offline SDP for steady state are critical for practical deployment in large-scale or safety-critical systems.