Confidence-Weighted Extended Kalman Filter
- Confidence-Weighted EKF is a nonlinear estimation technique that integrates adaptive state covariance updates to quantify uncertainty in the presence of process and observation noise.
- It leverages local linearization via Jacobians to propagate both mean estimates and confidence measures, making it efficient for applications such as uncertainty propagation in deep neural networks, online optimization, and sensor fusion.
- Enhanced through learned calibration maps, the filter corrects overconfident covariance estimates to ensure robust online optimization and accurate uncertainty intervals.
A Confidence-Weighted Extended Kalman Filter (EKF) integrates explicit uncertainty quantification into nonlinear estimation, consistently adjusting confidence estimates throughout inference. In canonical EKF settings—including uncertainty propagation in deep neural networks, online stochastic optimization, and sensor fusion—such filters maintain a state covariance that encodes the algorithm’s local confidence, adapting this quantity through analytic models, data-driven calibration, or a combination of both. Confidence-weighted EKFs thus systematically account for process noise, observation noise, and model misspecification, providing both point estimates and credible covariance intervals at every inference step (Titensky et al., 2018, Tsuei et al., 2021, Vilmarest et al., 2020).
1. Mathematical Foundations of the EKF with Confidence Weighting
The EKF generalizes the linear Kalman filter to nonlinear dynamical systems and observations by locally linearizing the nonlinear mappings at each recursion. The mean and covariance updates propagate not just the expected state but also a covariance (the “confidence weight”) that encodes uncertainty about the estimate. The canonical discrete-time model comprises:
- State propagation: $x_k = f(x_{k-1}) + w_k$, with process noise $w_k \sim \mathcal{N}(0, Q_k)$.
- Measurement update: $y_k = h(x_k) + v_k$, with measurement noise $v_k \sim \mathcal{N}(0, R_k)$.
The EKF maintains estimates $\hat{x}_k$ (mean) and $P_k$ (covariance). The confidence in $\hat{x}_k$ is reflected in the eigenstructure of $P_k$, which is recursively updated by projecting through the local Jacobians of $f$ and $h$ and by including the noise covariances $Q_k$, $R_k$. The covariance update also acts as a per-dimension adaptive learning rate: low-variance (high-confidence) dimensions admit smaller corrections (Vilmarest et al., 2020).
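As a concrete illustration, the following NumPy sketch implements one predict/update cycle under the model above; the function and argument names (`f`, `h`, `F_jac`, `H_jac`) are user-supplied placeholders, not an interface from the cited works.

```python
import numpy as np

def ekf_step(x, P, z, f, h, F_jac, H_jac, Q, R):
    """One confidence-weighted EKF recursion: propagate the mean and the
    covariance ("confidence weight"), then correct both with a measurement z."""
    # Prediction: push the mean through f and the covariance through its local Jacobian.
    F = F_jac(x)
    x_pred = f(x)
    P_pred = F @ P @ F.T + Q                 # uncertainty grows with process noise Q

    # Update: the gain is large along low-confidence (high-variance) directions,
    # so those dimensions receive larger corrections -- an adaptive learning rate.
    H = H_jac(x_pred)
    S = H @ P_pred @ H.T + R                 # innovation covariance
    K = np.linalg.solve(S, H @ P_pred).T     # Kalman gain  P_pred Hᵀ S⁻¹
    x_new = x_pred + K @ (z - h(x_pred))
    P_new = (np.eye(P.shape[0]) - K @ H) @ P_pred
    return x_new, P_new
```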
2. Confidence-Weighted EKF in Deep Neural Networks
The methodology of (Titensky et al., 2018) recasts a feed-forward deep neural network (DNN) as a discrete-time nonlinear dynamical system, with each layer corresponding to a “time step” and each activation vector the “state.” Input uncertainty—assumed Gaussian with mean $x_0$ and covariance $P_0$—is propagated through nonlinear layers via the following confidence-weighted EKF recursion:
- Initialization: $\hat{x}_0 = x_0$, with $P_0$ the input covariance (input uncertainty).
- Prediction (layer $\ell = 1, \dots, L$), with pre-activation $z_\ell = W_\ell \hat{x}_{\ell-1} + b_\ell$:
  - $\hat{x}_\ell = \sigma(z_\ell)$, with $\sigma$ the elementwise ReLU.
  - $F_\ell[i,j] = W_\ell[i,j]$ if $z_\ell[i] > 0$, else $0$ (the local Jacobian).
  - $P_\ell = F_\ell P_{\ell-1} F_\ell^\top + Q_\ell$.
- Process noise $Q_\ell$: estimated as the empirical sample covariance of held-out layer activations, capturing model error (weight/bias uncertainty); see the sketch below.
Only the input layer uses the measurement update; at all deeper layers the update step is omitted (no observations are available for $\ell \geq 1$), reducing the recursion to repeated prediction. The output gives an approximate Gaussian posterior (mean $\hat{x}_L$ and covariance $P_L$) over final DNN outputs (Titensky et al., 2018).
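A minimal sketch of the process-noise estimate described in the list above, assuming held-out activations for a given layer are stacked as rows of an array (the function name and data layout are illustrative, not prescribed by the source):

```python
import numpy as np

def estimate_process_noise(layer_activations):
    """Empirical sample covariance of held-out activations at one layer,
    used as Q_ell to capture weight/bias (model) error.
    `layer_activations` has shape (N, d): N held-out samples, d units."""
    centered = layer_activations - layer_activations.mean(axis=0, keepdims=True)
    return centered.T @ centered / (layer_activations.shape[0] - 1)
```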
3. Systematic Covariance Calibration and Learned Confidence Weighting
Despite the formal covariance propagation of the EKF, empirical results demonstrate that EKF-predicted uncertainty is systematically miscalibrated—typically over-confident. In visual-inertial localization (Tsuei et al., 2021), miscalibration results from:
- First-order linearization (neglecting higher-order Jacobian terms).
- Static noise covariances ($Q$, $R$) that do not adapt to trajectory or state.
- Non-Gaussianities in sensor noise and observation functions.
To correct this, (Tsuei et al., 2021) introduces a post-hoc learned calibration map applied to each EKF covariance $P_k$, producing a calibrated $\hat{P}_k$:
- Simple scaling of $P_k$ by a learned scalar.
- Linear transformation of $P_k$ by a learned matrix.
- Neural networks mapping $P_k$ (or the pair $(\hat{x}_k, P_k)$) to a lower-triangular matrix $L_k$, then setting $\hat{P}_k = L_k L_k^\top$.
Calibration targets either Monte Carlo or locally ergodic estimates of the ground-truth covariance, with loss given by squared error over upper-triangular entries, weighted to prioritize diagonals and main blocks. Replacing $P_k$ with $\hat{P}_k$ in the EKF recursion empirically restores correct coverage, with neural-network calibration substantially outperforming scalar or linear transforms (Tsuei et al., 2021).
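The neural-network variant can be sketched as follows; the regressor interface (`predict`) and the feature choice (upper-triangular entries of $P_k$) are assumptions for illustration, while the factorized output guarantees a valid covariance by construction.

```python
import numpy as np

def calibrate_covariance(P, regressor):
    """Map an EKF covariance P to a calibrated covariance L @ L.T, where the
    lower-triangular factor L is predicted from the entries of P.
    `regressor` is any learned model exposing a scikit-learn-style .predict()."""
    d = P.shape[0]
    features = P[np.triu_indices(d)]                  # upper-triangular entries of P
    l_entries = regressor.predict(features[None, :])[0]
    L = np.zeros((d, d))
    L[np.tril_indices(d)] = l_entries                 # fill the predicted factor
    return L @ L.T                                    # symmetric PSD by construction
```

Training such a regressor would minimize the weighted squared error between the upper-triangular entries of $L_k L_k^\top$ and the target covariance, consistent with the loss described above.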
4. Applications and Algorithmic Workflows
A. DNN Uncertainty Propagation
The confidence-weighted EKF algorithm for DNNs executes as follows:
Input: pretrained {W_ℓ, b_ℓ}, x₀ (mean), P₀ (cov), {Q_ℓ} (process noise)
x, P ← x₀, P₀
for ℓ = 1,...,L:
    z = W_ℓ · x + b_ℓ
    x = ReLU(z)
    F_ℓ[i,j] = W_ℓ[i,j] if z[i] > 0 else 0
    P = F_ℓ · P · F_ℓ.T + Q_ℓ
return x, P
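A runnable NumPy version of the pseudocode above, under the same assumptions (ReLU activations, no measurement updates beyond the input layer):

```python
import numpy as np

def propagate_dnn_uncertainty(weights, biases, x0, P0, Qs):
    """Confidence-weighted EKF propagation of a Gaussian (x0, P0) through a
    pretrained ReLU feed-forward network; returns the output mean and covariance."""
    x, P = x0, P0
    for W, b, Q in zip(weights, biases, Qs):
        z = W @ x + b
        x = np.maximum(z, 0.0)            # mean propagation through ReLU
        F = W * (z > 0)[:, None]          # Jacobian: rows of W zeroed where z <= 0
        P = F @ P @ F.T + Q               # covariance ("confidence") update
    return x, P
```

For a network with layer widths $d_0, \dots, d_L$, `weights[ℓ]` has shape $(d_{\ell+1}, d_\ell)$ and `Qs[ℓ]` shape $(d_{\ell+1}, d_{\ell+1})$.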
B. Online Optimization via EKF Recursion
The confidence-weighted EKF can be interpreted as a second-order online optimizer for generalized linear models. At each step:
- Adapt the learning rate and update direction using the current posterior covariance $P_k$.
- Update $P_k$ to reflect reduced uncertainty after observing a new data point.
This mechanism achieves per-coordinate learning rate adaptation and provides rigorous excess risk guarantees (Vilmarest et al., 2020).
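A sketch of this view for logistic regression, where the parameter vector plays the role of the state and each label is treated as a one-dimensional observation; the Bernoulli-variance observation noise used below is an illustrative choice, not a prescription from (Vilmarest et al., 2020):

```python
import numpy as np

def ekf_logistic_step(theta, P, x, y, eps=1e-6):
    """One EKF recursion viewed as a second-order online update for logistic
    regression. P is the posterior covariance over theta and acts as a
    per-direction adaptive learning rate."""
    p = 1.0 / (1.0 + np.exp(-(x @ theta)))   # predicted probability
    H = p * (1.0 - p) * x                    # Jacobian of the observation model
    R = p * (1.0 - p) + eps                  # observation noise (assumed choice)
    S = H @ P @ H + R                        # scalar innovation variance
    K = (P @ H) / S                          # gain: larger along low-confidence directions
    theta = theta + K * (y - p)              # correction weighted by confidence
    P = P - np.outer(K, H @ P)               # confidence tightens after each observation
    return theta, P
```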
C. Visual-Inertial Localization
The EKF is enhanced by learning a mapping from the internal covariance $P_k$ to calibrated estimates $\hat{P}_k$, then using this mapping online in the EKF update, improving statistical calibration as measured by both empirical coverage and divergence from the theoretical distribution (Tsuei et al., 2021).
5. Computational Tradeoffs and Performance
EKF-based confidence weighting requires one forward pass and one Jacobian–covariance update per step or per DNN layer, scaling as $O(n^3)$ per layer in the DNN context (with $n$ the intermediate layer dimension) (Titensky et al., 2018). Compared to Monte Carlo or unscented transforms (which require $N$ forward passes per input, $N \gg 1$), the EKF is substantially more efficient. When the process noise $Q_\ell$ is set to $0$, the EKF's standard deviations match those from Monte Carlo almost exactly. Including nonzero $Q_\ell$ leads to larger, more realistic uncertainty intervals, as the filter then accounts for model error. For high-dimensional layers, $P_\ell$ can become dense and expensive, limiting scalability unless the covariance is simplified (e.g., diagonal truncation; see the sketch below) (Titensky et al., 2018).
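One possible form of the diagonal truncation mentioned above (an approximation that drops cross-unit correlations; not a procedure specified in the cited papers):

```python
import numpy as np

def propagate_diagonal(weights, biases, x0, p0_diag, q_diags):
    """Diagonal-only variant of the layerwise covariance recursion: tracks
    per-unit variances instead of a dense P, reducing the per-layer cost
    from O(n^3) to O(n^2)."""
    x, p = x0, p0_diag
    for W, b, q in zip(weights, biases, q_diags):
        z = W @ x + b
        x = np.maximum(z, 0.0)
        F = W * (z > 0)[:, None]
        # diag(F diag(p) Fᵀ)[i] = sum_j F[i, j]^2 * p[j]
        p = (F ** 2) @ p + q
    return x, p
```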
In learned calibration scenarios, memoryless neural networks mapping the current covariance $P_k$ recover almost all observed covariance miscalibration; incorporating temporal history yields only marginal improvement (Tsuei et al., 2021). The computational cost of training such correctors is amortized over their use in online or streaming applications.
6. Assumptions, Limitations, and Theoretical Guarantees
Typical assumptions imposed for EKF-based uncertainty quantification include:
- Gaussian input and process noise; the output distribution is only approximately Gaussian.
- Activation functions must be piecewise linear (e.g., ReLU) or differentiable for efficient Jacobian computation.
- No intermediate observations within non-output layers, so only early initialization conveys external information in DNNs (Titensky et al., 2018).
- Covariance matrices may become impractically large in very high-dimensional state spaces.
Limitations include the inability to track multi-modal distributions (sampling-based methods retain this capability at higher cost), dependence on accurate estimation of $Q$ (model noise) or on calibrated mappings, and the absence of closed-form guarantees that the learned maps do not introduce filter instability when used recursively (Titensky et al., 2018, Tsuei et al., 2021). Theoretical analyses in stochastic optimization demonstrate entry into a local region near the optimum in finite time, followed by logarithmic regret scaling in the local phase under standard regularity conditions (Vilmarest et al., 2020).
7. Directions for Extension and Open Challenges
Empirical evidence suggests that systematic miscalibration of covariance estimates is a universal phenomenon in nonlinear EKF-based filters with fixed $Q$, $R$ and first-order approximations (Tsuei et al., 2021). Learned or data-driven calibration functions are highly effective at restoring statistical coverage, particularly those that operate on the covariance alone. The feasibility of a fully end-to-end “confidence-weighted EKF,” in which the calibration mapping is integrated into the recursion or predicted by a recurrent neural network, raises open questions about stability and closed-loop consistency. Practical covariance truncation and feature engineering for calibration mappings remain active areas. Generalization of these strategies to other fusion architectures (radar-inertial, GNSS-inertial) is plausible wherever systematic error in posterior uncertainty estimation is observed (Tsuei et al., 2021).