Hybrid Residual Filter Framework in VINS

Updated 26 November 2025

The hybrid residual filter framework is a unified measurement update method integrating landmark reprojection and ray constraint residuals to enhance VINS accuracy.
It employs a Lie-group EKF structure with implicit mapping and online extrinsic calibration to adapt dynamically to challenging visual conditions.
Benchmark evaluations demonstrate that the framework reduces computational load while achieving state-of-the-art trajectory accuracy compared to optimization-based methods.

A hybrid residual filter framework is a measurement update methodology in visual-inertial navigation systems (VINS) that combines multiple forms of measurement residuals into a unified update for the Kalman filter, specifically targeting improved accuracy and robustness in pose estimation under challenging operational environments. In filter-based VINS, the hybrid residual filter subsumes both landmark reprojection errors and ray (inverse depth) constraints to construct a single Jacobian for measurement updates, enabling efficient fusion of stereo and multi-view geometric information while supporting implicit mapping and online extrinsic calibration (Du et al., 24 Nov 2025).

1. Foundation and State Representation

The hybrid residual filter framework is implemented in the context of a Lie-group EKF architecture, utilizing a state vector $x_k$ comprising active navigation quantities, historical pose clones, stereo camera extrinsics, and map keyframe poses. Explicitly, the EKF state at time $k$ is organized as

$x_k = \begin{bmatrix} x_A \\ x_K \end{bmatrix},$

where $x_A$ is the active state, partitioned into the IMU navigation state ($x_b$), pose clones ($x_c$), and extrinsics ($x_e$):

$x_b = [q_{G\leftarrow B}^\top, v_B^\top, p_B^\top, b_g^\top, b_a^\top]^\top$
$x_c = [\dots, (q,p)_B, \dots]^\top$
$x_e = [q_{C_L \leftarrow B}^\top, p_{C_L}^\top, q_{C_R \leftarrow B}^\top, p_{C_R}^\top]^\top$

Here $x_K$ is the pose set of map keyframes.

Continuous-time propagation follows

$\dot{\delta x_b} = F_b \delta x_b + G_b w_b$

where $F_b$ encodes the linearized error-state dynamics and $G_b$ maps the noise sources $w_b$ into the state. Discrete-time covariance propagation uses the transition matrix $\Phi = \exp(F_b \Delta t)$ together with the process noise covariance $Q$.
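The propagation step can be sketched as follows. This is a minimal illustration, not the SP-VINS implementation: it assumes NumPy, approximates $\exp(F_b \Delta t)$ with a truncated Taylor series, and uses the first-order discretization $Q \approx G_b Q_c G_b^\top \Delta t$, where $Q_c$ is a hypothetical continuous-time noise density. The two-state position/velocity model and all function names are stand-ins chosen for readability.

```python
import numpy as np

def transition_matrix(F, dt, order=4):
    """Approximate Phi = exp(F * dt) with a truncated Taylor series."""
    A = F * dt
    Phi = np.eye(F.shape[0])
    term = np.eye(F.shape[0])
    for k in range(1, order + 1):
        term = term @ A / k          # accumulates A^k / k!
        Phi = Phi + term
    return Phi

def propagate_covariance(P, F, G, Qc, dt):
    """Return Phi P Phi^T + Q, with Q from a first-order discretization (assumption)."""
    Phi = transition_matrix(F, dt)
    Qd = G @ Qc @ G.T * dt
    return Phi @ P @ Phi.T + Qd

# Toy error state [position, velocity] driven by acceleration noise.
F = np.array([[0.0, 1.0],
              [0.0, 0.0]])
G = np.array([[0.0],
              [1.0]])
Qc = np.array([[0.01]])              # hypothetical noise density
P0 = 0.1 * np.eye(2)
dt = 0.005                           # hypothetical 200 Hz IMU step

Phi = transition_matrix(F, dt)
P1 = propagate_covariance(P0, F, G, Qc, dt)
```

For this toy model $F$ is nilpotent, so the series terminates and the approximation is exact; a full IMU error state would generally need a higher order or a closed-form $\Phi$.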

2. Measurement Models and Hybrid Residual Construction

The framework merges two primary forms of geometric residuals:

2.1 Landmark Reprojection Residual

For a keypoint observed in image $i$ with normalized image coordinates $\tilde p_i = (u_i, v_i, 1)^\top$, the predicted position

$p_i = \frac{1}{e_3^\top X_i} X_i, \quad X_i = R_{C_i\leftarrow B} T_{B\leftarrow G}(p_f - p_{C_i})$

is computed without explicit 3D landmark parameterization, using only pose relationships. The reprojection residual is

$r_i^{(uv)} = \tilde p_i - p_i \approx H_{x_i}^{(uv)} \delta x_A + n_i^{(uv)}$

with a Jacobian given in block form, including derivatives with respect to pose and extrinsic parameters.
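As a small illustration of the residual itself, the sketch below (standard-library Python) normalizes a camera-frame point $X_i$ by its depth $e_3^\top X_i$ and subtracts the prediction from the observation. The function names and numeric values are hypothetical, and $X_i$ is taken as given rather than derived from the pose chain.

```python
def project_normalized(X):
    """Return p = X / (e3^T X), the point on the normalized image plane."""
    x, y, z = X
    if z <= 0.0:
        raise ValueError("point must lie in front of the camera")
    return (x / z, y / z, 1.0)

def reprojection_residual(p_obs, X_cam):
    """Return r = p_obs - p_pred, component by component."""
    p_pred = project_normalized(X_cam)
    return tuple(o - p for o, p in zip(p_obs, p_pred))

X_cam = (0.5, 0.2, 4.0)        # hypothetical point in the camera frame
p_obs = (0.127, 0.049, 1.0)    # hypothetical observed normalized coordinates
r_uv = reprojection_residual(p_obs, X_cam)
```

The third component of the residual is identically zero, so only the first two rows would enter the stacked measurement.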

2.2 Ray Constraint Residual (Stereo and Multi-view)

Ray-based constraints enable depth estimation using stereo observations or multi-view geometry:

$Z_s = \frac{\|[t_{C_R\leftarrow C_L} \times] p_{C_R}\|}{\|[p_{C_R} \times] R_{C_R\leftarrow C_L} p_{C_L}\|}$

with residual

$r_i^{(\text{ray})} = \widehat Z_i - Z_s \approx H_{x_i}^{(\text{ray})} \delta x_A + n_i^{(\text{ray})}$

where $H_{x_i}^{(\text{ray})}$ includes blocks for all involved body poses and the extrinsics.
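The stereo depth expression can be checked numerically. The sketch below (standard-library Python) evaluates $Z_s$ from two normalized rays and a left-to-right extrinsic, assuming the convention $X_R = R_{C_R\leftarrow C_L} X_L + t_{C_R\leftarrow C_L}$, under which $Z_s$ is the depth along the left ray. The rig geometry, the predicted depth `z_hat`, and the helper names are hypothetical.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def norm(a):
    return math.sqrt(sum(c * c for c in a))

def matvec(R, v):
    return tuple(sum(R[i][j] * v[j] for j in range(3)) for i in range(3))

def stereo_depth(p_left, p_right, R_rl, t_rl):
    """Z_s = ||t x p_R|| / ||p_R x (R p_L)||."""
    num = norm(cross(t_rl, p_right))
    den = norm(cross(p_right, matvec(R_rl, p_left)))
    if den < 1e-12:
        raise ValueError("rays are parallel; depth is unobservable")
    return num / den

# Rectified rig with a 0.1 m baseline (hypothetical values).
R_rl = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
t_rl = (-0.1, 0.0, 0.0)
p_left = (0.125, 0.05, 1.0)    # point (0.5, 0.2, 4.0) seen from the left camera
p_right = (0.1, 0.05, 1.0)     # same point seen from the right camera

z_s = stereo_depth(p_left, p_right, R_rl, t_rl)
z_hat = 4.05                   # hypothetical depth predicted from the filter state
r_ray = z_hat - z_s
```

The parallel-ray guard matters in practice: distant points make the denominator small, which is one reason such residuals need an appropriately inflated noise term.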

2.3 Unified EKF Measurement Update

All active-state residuals are stacked,

$r = \begin{bmatrix} r^{(uv)} \\ r^{(\text{ray})} \end{bmatrix}, \quad H = \begin{bmatrix} H^{(uv)} \\ H^{(\text{ray})} \end{bmatrix}$

enabling a single EKF update:

$K = P_A H^\top (H P_A H^\top + R)^{-1},\quad \delta x_A \leftarrow \delta x_A + K (r - H \delta x_A), \quad P_A \leftarrow (I - K H) P_A$

This formulation allows the filter to exploit all available geometric constraints simultaneously, improving both immediate pose accuracy and depth consistency across views.
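A minimal sketch of the stacked update, assuming NumPy and a block-diagonal measurement noise $R$ (that is, reprojection and ray noise treated as uncorrelated, which is an assumption made here for illustration). The three-dimensional toy state, the Jacobian entries, and the function name are hypothetical.

```python
import numpy as np

def stacked_ekf_update(P, dx, blocks):
    """Apply one EKF update to residual blocks given as (r, H, R) tuples."""
    r = np.concatenate([b[0] for b in blocks])
    H = np.vstack([b[1] for b in blocks])
    R = np.zeros((r.size, r.size))
    row = 0
    for _, _, R_blk in blocks:       # assemble block-diagonal noise covariance
        k = R_blk.shape[0]
        R[row:row + k, row:row + k] = R_blk
        row += k
    S = H @ P @ H.T + R
    # K = P H^T S^{-1}, computed with a linear solve (S and P are symmetric).
    K = np.linalg.solve(S, H @ P).T
    dx_new = dx + K @ (r - H @ dx)
    P_new = (np.eye(P.shape[0]) - K @ H) @ P
    return dx_new, P_new, H

P = np.diag([0.04, 0.04, 0.09])
dx = np.zeros(3)
uv_block = (np.array([0.01, -0.02]),
            np.array([[1.0, 0.0, 0.2],
                      [0.0, 1.0, -0.1]]),
            1e-4 * np.eye(2))
ray_block = (np.array([0.05]),
             np.array([[0.3, 0.0, 1.0]]),
             np.array([[1e-2]]))

dx_post, P_post, H = stacked_ekf_update(P, dx, [uv_block, ray_block])
```

Because the residual blocks share one gain computation, the innovation covariance $S$ is inverted once per update rather than once per residual type.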

3. Map Management and Loop Closure

The hybrid residual filter operates with an implicit environmental map: SP-VINS, the system in which the framework is implemented, maintains only keyframes (poses only) and their associated 2D keypoints and descriptors. Loop closure and relocalization rely on 2D–2D matching via a bag-of-words system (DBoW2), with geometric verification using RANSAC. Upon successful loop detection, reprojection residuals built from the matched correspondences between current and historical keyframes are stacked into the filter update. No global 3D map triangulation or pose graph optimization is required; all constraints flow into the EKF in the same hybrid fashion:

$r_i^{(\text{map})} = \tilde p_i - p_i(j) \approx H_{x_i}^{(\text{map})} \delta x + n_i$

with $\delta x$ now extended to include both active and keyframe states.

4. Online Calibration of Camera–IMU Extrinsic Parameters

The filter state $x_e$ includes all stereo camera-IMU extrinsic parameters, which are dynamically estimated. Jacobians of all measurement residuals carry terms $\partial \cdot/\partial \delta x_e$, and the Kalman gain $K$ directly updates these extrinsics during measurement fusion. The on-manifold update is managed with quaternion and translation composition:

$q \leftarrow \delta q \otimes q, \quad p \leftarrow p + \delta p$
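The composition above can be sketched as follows (standard-library Python). The sketch assumes Hamilton quaternions stored as $(w, x, y, z)$ and a rotation-vector error $\delta\theta$ mapped to $\delta q$ through the exponential map; the source does not state which quaternion convention is used, so both choices are assumptions, as are the function names and numeric values.

```python
import math

def quat_multiply(a, b):
    """Hamilton product of two quaternions stored as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw * bw - ax * bx - ay * by - az * bz,
            aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw)

def delta_quat(dtheta):
    """Map a rotation-vector correction to a unit quaternion."""
    angle = math.sqrt(sum(c * c for c in dtheta))
    if angle < 1e-12:
        # First-order form; renormalized by the caller.
        return (1.0, 0.5 * dtheta[0], 0.5 * dtheta[1], 0.5 * dtheta[2])
    s = math.sin(0.5 * angle) / angle
    return (math.cos(0.5 * angle), s * dtheta[0], s * dtheta[1], s * dtheta[2])

def apply_correction(q, p, dtheta, dp):
    """q <- dq * q (then renormalize), p <- p + dp."""
    q_new = quat_multiply(delta_quat(dtheta), q)
    n = math.sqrt(sum(c * c for c in q_new))
    q_new = tuple(c / n for c in q_new)
    p_new = tuple(pi + dpi for pi, dpi in zip(p, dp))
    return q_new, p_new

q0 = (1.0, 0.0, 0.0, 0.0)          # hypothetical initial extrinsic rotation
p0 = (0.05, 0.0, 0.01)             # hypothetical initial extrinsic translation
q1, p1 = apply_correction(q0, p0, (0.0, 0.0, 0.02), (0.001, 0.0, -0.002))
```

Renormalizing after each correction keeps the extrinsic rotation a unit quaternion despite accumulated floating-point error.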

This mechanism ensures that the stereo rig or camera-IMU calibration adapts online to systematic bias, leading to improved accuracy in degraded environments (low texture, strong reflections, large depth).

5. Computational Efficiency and Performance Characteristics

Benchmark evaluations on EuRoC MAV, TUM VI, and KAIST-Urban datasets demonstrate that the hybrid residual filter framework achieves state-of-the-art trajectory accuracy at substantially reduced CPU usage compared to optimization-based VINS. Typical per-frame costs:

Visual front-end: $\sim5$ ms
Back-end propagation + hybrid update: $4$–$8$ ms
Loop closure: $2$–$4$ ms only when invoked

Total CPU load remains well below that of optimization-based baselines: odometry-only mode uses $77$–$94$% of one core and loop-enabled mode uses $114$–$135$%, compared with more than $200$% for VINS-Fusion (Du et al., 24 Nov 2025). This efficiency, combined with robust performance in degraded scenarios, suggests strong suitability for real-time mobile robotics.

6. Contextual Significance and Implications

The introduction and validation of the hybrid residual filter framework in SP-VINS marks a departure from traditional SLAM architectures that require explicit 3D map maintenance and complex bundle adjustment. Instead, the implicit map and unified residual fusion streamline the pipeline, reducing computational overhead while maintaining high accuracy. This design enables long-term, high-precision state estimation under challenging visual conditions and supports online self-calibration of extrinsics, directly addressing issues such as feature dropout and stereo bias.

A plausible implication is that similar hybrid residual fusion strategies may be adapted for other multi-sensor robotic navigation frameworks, especially in resource-constrained or highly dynamic environments where computational cost and robustness are critical.

7. Comparative Perspective

Relative to conventional SLAM and VINS frameworks, filter-based systems employing hybrid residual updates (specifically the DST-EKF with stacked geometric constraints) demonstrate competitive or superior absolute trajectory error (ATE) and relative pose error (RPE) on standard benchmarks. For instance, SP-VINS achieves an ATE of $0.11$ m on EuRoC MH05 vs. $0.13$–$0.14$ m for other filters, and a $\sim30\%$ ATE reduction on TUM VI compared to OpenVINS/SchurVINS. Loop closure modules efficiently prune and integrate keyframe constraints without requiring global optimization (Du et al., 24 Nov 2025).

In summary, the hybrid residual filter framework constitutes an efficient and theoretically grounded approach that leverages a unified measurement model for VINS, enabling robust and accurate localization and mapping in a filter-based paradigm.

References

Du et al. (2025). SP-VINS: A Hybrid Stereo Visual Inertial Navigation System based on Implicit Environmental Map.