
Hybrid Residual Filter Framework in VINS

Updated 26 November 2025
  • The hybrid residual filter framework is a unified measurement update method integrating landmark reprojection and ray constraint residuals to enhance VINS accuracy.
  • It employs a Lie-group EKF structure with implicit mapping and online extrinsic calibration to adapt dynamically to challenging visual conditions.
  • Benchmark evaluations show that, compared with optimization-based methods, the framework reduces computational load while achieving state-of-the-art trajectory accuracy.

A hybrid residual filter framework is a measurement update method for visual-inertial navigation systems (VINS) that combines several forms of measurement residuals into a single Kalman filter update, with the aim of improving pose-estimation accuracy and robustness in challenging operating environments. In filter-based VINS, it stacks landmark reprojection errors and ray (inverse depth) constraints into a single Jacobian for the measurement update. This fuses stereo and multi-view geometric information efficiently while supporting implicit mapping and online extrinsic calibration (Du et al., 24 Nov 2025).

1. Foundation and State Representation

The hybrid residual filter framework is implemented in the context of a Lie-group EKF architecture, utilizing a state vector $x_k$ comprising active navigation quantities, historical pose clones, stereo camera extrinsics, and map keyframe poses. Explicitly, the EKF state at time $k$ is organized as

$$x_k = \begin{bmatrix} x_A \\ x_K \end{bmatrix},$$

where $x_A$ is the active state, partitioned into the IMU navigation state ($x_b$), pose clones ($x_c$), and extrinsics ($x_e$):

  • $x_b = [q_{G\leftarrow B}^\top, v_B^\top, p_B^\top, b_g^\top, b_a^\top]^\top$ (orientation, velocity, position, gyroscope bias, accelerometer bias)
  • $x_c = [\dots, (q,p)_B, \dots]^\top$ (cloned historical body poses)
  • $x_e = [q_{C_L \leftarrow B}^\top, p_{C_L}^\top, q_{C_R \leftarrow B}^\top, p_{C_R}^\top]^\top$ (left and right camera extrinsics)

The second block, $x_K$, is the set of map keyframe poses.
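
Filter implementations usually index the error state block by block so that each Jacobian block lands in the right columns. The sketch below computes such offsets for the ordering $x_b$, $x_c$, $x_e$, then $x_K$. The block sizes (15 for the navigation error, 6 per pose clone or keyframe, 12 for the stereo extrinsics) assume 3-dof rotation errors and are not given in the source; `ErrorStateLayout` and the constants are hypothetical names.

```python
from dataclasses import dataclass

# Hypothetical error-state block sizes (assumed, not given in the source):
# every rotation uses a 3-dof error, every vector quantity 3 dof.
NAV_DIM = 15        # delta x_b: rotation, velocity, position, gyro bias, accel bias
CLONE_DIM = 6       # one pose clone in x_c: rotation + position
EXTRINSIC_DIM = 12  # x_e: left and right camera rotation + translation
KEYFRAME_DIM = 6    # one map keyframe pose in x_K


@dataclass
class ErrorStateLayout:
    """Column offsets for the ordering [x_b, x_c, x_e | x_K]."""
    num_clones: int
    num_keyframes: int

    def clone_offset(self, j: int) -> int:
        """First index of the j-th pose clone inside delta x_A."""
        return NAV_DIM + CLONE_DIM * j

    def extrinsic_offset(self) -> int:
        """First index of the extrinsic block, placed after all clones."""
        return NAV_DIM + CLONE_DIM * self.num_clones

    def active_dim(self) -> int:
        """Dimension of delta x_A."""
        return self.extrinsic_offset() + EXTRINSIC_DIM

    def keyframe_offset(self, m: int) -> int:
        """First index of the m-th keyframe pose inside the full delta x_k."""
        return self.active_dim() + KEYFRAME_DIM * m

    def full_dim(self) -> int:
        """Dimension of the full error state [delta x_A; delta x_K]."""
        return self.active_dim() + KEYFRAME_DIM * self.num_keyframes


layout = ErrorStateLayout(num_clones=10, num_keyframes=3)
print(layout.active_dim(), layout.full_dim())  # 87 105
```

Keeping the keyframe block after the active block mirrors the split $x_k = [x_A; x_K]$, so active-only updates can operate on a leading sub-block of the covariance.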

Continuous-time propagation follows

$$\dot{\delta x}_b = F_b \, \delta x_b + G_b w_b$$

where $F_b$ encodes the linearized error-state dynamics and $G_b$ maps the noise sources $w_b$ into the state. Discrete-time propagation uses the transition matrix $\Phi = \exp(F_b \Delta t)$ and the process noise covariance $Q$ to propagate the covariance.
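
A minimal sketch of the discrete step, assuming a generic linear error-state model: $\Phi$ is computed by scaling and squaring a truncated Taylor series, and $Q$ uses the first-order approximation $\Phi G Q_c G^\top \Phi^\top \Delta t$ with a hypothetical continuous noise density $Q_c$. The toy constant-velocity matrices stand in for $F_b$ and $G_b$, which the source does not spell out.

```python
import numpy as np


def expm_taylor(A: np.ndarray, order: int = 12) -> np.ndarray:
    """Matrix exponential by scaling and squaring a truncated Taylor series."""
    norm = np.linalg.norm(A, ord=np.inf)
    # Pick s so that ||A / 2^s|| <= 0.5, where the truncated series is accurate.
    s = max(0, int(np.ceil(np.log2(norm / 0.5)))) if norm > 0 else 0
    A_scaled = A / (2 ** s)
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, order + 1):
        term = term @ A_scaled / k  # A_scaled^k / k!
        result = result + term
    for _ in range(s):  # undo the scaling: exp(A) = exp(A / 2^s)^(2^s)
        result = result @ result
    return result


def propagate_covariance(P, F, G, Qc, dt):
    """One step: Phi = exp(F dt), P <- Phi P Phi^T + Q.

    Q uses the first-order approximation Phi G Qc G^T Phi^T dt (an assumption;
    the source does not say how Q is discretized).
    """
    Phi = expm_taylor(F * dt)
    Q = Phi @ G @ Qc @ G.T @ Phi.T * dt
    P_next = Phi @ P @ Phi.T + Q
    return Phi, 0.5 * (P_next + P_next.T)  # re-symmetrize against round-off


# Toy 1-D constant-velocity model standing in for F_b, G_b (illustrative only).
F = np.array([[0.0, 1.0], [0.0, 0.0]])
G = np.array([[0.0], [1.0]])
Qc = np.array([[0.01]])
# F is nilpotent, so Phi is exactly I + F dt = [[1, 0.1], [0, 1]].
Phi, P = propagate_covariance(np.eye(2), F, G, Qc, dt=0.1)
```

In a full VINS filter, $\Phi$ acts only on the navigation block, and the cross-covariances with clones and extrinsics are propagated by $\Phi$ alone.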

2. Measurement Models and Hybrid Residual Construction

The framework merges two primary forms of geometric residuals:

2.1 Landmark Reprojection Residual

For a keypoint observed in image $i$ with normalized image coordinates $\tilde p_i = (u_i, v_i, 1)^\top$, the predicted position

$$p_i = \frac{1}{e_3^\top X_i} X_i, \quad X_i = R_{C_i\leftarrow B} T_{B\leftarrow G}(p_f - p_{C_i})$$

(with $e_3 = (0, 0, 1)^\top$, so that $e_3^\top X_i$ is the depth of $X_i$) is computed without an explicit 3D landmark parameterization, using only pose relationships. The reprojection residual is

$$r_i^{(uv)} = \tilde p_i - p_i \approx H_{x_i}^{(uv)} \delta x_A + n_i^{(uv)}$$

where the Jacobian $H_{x_i}^{(uv)}$ is given in block form, with derivatives with respect to the pose and extrinsic parameters.
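
The prediction reduces to a perspective division, and every pose or extrinsic Jacobian block shares the chain-rule factor $\partial p_i / \partial X_i = \frac{1}{e_3^\top X_i}(I - p_i e_3^\top)$; since $r_i^{(uv)} = \tilde p_i - p_i$, the residual Jacobian carries the opposite sign. The sketch below, with hypothetical function names, computes the projection, a residual that keeps only the $(u, v)$ rows (the third component is identically zero), and that factor, and checks the factor against central differences. The remaining factor $\partial X_i / \partial \delta x_A$ is not shown.

```python
import numpy as np

E3 = np.array([0.0, 0.0, 1.0])


def project_normalized(X: np.ndarray) -> np.ndarray:
    """p = X / (e3^T X): normalized image coordinates (u, v, 1)."""
    return X / (E3 @ X)


def reprojection_residual(p_obs: np.ndarray, X: np.ndarray) -> np.ndarray:
    """r = p~ - p, keeping only the (u, v) rows; the third row is always 0."""
    return (p_obs - project_normalized(X))[:2]


def projection_jacobian(X: np.ndarray) -> np.ndarray:
    """dp/dX = (I - p e3^T) / (e3^T X), shared by every pose/extrinsic block."""
    p = project_normalized(X)
    return (np.eye(3) - np.outer(p, E3)) / (E3 @ X)


# Hypothetical camera-frame point X_i and a slightly perturbed observation.
X = np.array([0.4, -0.2, 2.5])
p_obs = project_normalized(X) + np.array([0.01, -0.02, 0.0])
r = reprojection_residual(p_obs, X)  # approximately [0.01, -0.02]

# Central-difference check of the analytic Jacobian.
eps = 1e-6
J = projection_jacobian(X)
J_num = np.column_stack([
    (project_normalized(X + eps * e) - project_normalized(X - eps * e)) / (2 * eps)
    for e in np.eye(3)
])
```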

2.2 Ray Constraint Residual (Stereo and Multi-view)

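Ray-based constraints enable depth estimation from stereo observations or multi-view geometry. For a stereo match, the depth of the point in the left camera is

$$Z_s = \frac{\|[t_{C_R\leftarrow C_L} \times]\, p_{C_R}\|}{\|[p_{C_R} \times]\, R_{C_R\leftarrow C_L}\, p_{C_L}\|}$$

where, in this formula, $p_{C_L}$ and $p_{C_R}$ denote the keypoint's normalized image coordinates in the left and right cameras (not the extrinsic translations in $x_e$), and $(R_{C_R\leftarrow C_L}, t_{C_R\leftarrow C_L})$ is the relative pose from the left to the right camera. The residual is

$$r_i^{(\text{ray})} = \widehat Z_i - Z_s \approx H_{x_i}^{(\text{ray})} \delta x_A + n_i^{(\text{ray})}$$

where $H_{x_i}^{(\text{ray})}$ includes blocks for all involved body poses and the extrinsics.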
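
The depth formula follows from requiring the left-camera point $Z_L\,p_{C_L}$, mapped into the right camera as $Z_L R_{C_R\leftarrow C_L} p_{C_L} + t_{C_R\leftarrow C_L}$, to be parallel to the right-camera ray $p_{C_R}$. Applying $[p_{C_R}\times]$ gives $Z_L [p_{C_R}\times] R_{C_R\leftarrow C_L} p_{C_L} = -[p_{C_R}\times] t_{C_R\leftarrow C_L}$, and taking norms yields $Z_s$ (using $\|[t\times]p\| = \|[p\times]t\|$). Because only norms appear, the formula assumes the point lies in front of the camera. A small check under a hypothetical rectified rig with an 11 cm baseline; `skew` and `stereo_depth` are illustrative names:

```python
import numpy as np


def skew(v: np.ndarray) -> np.ndarray:
    """[v x]: cross-product matrix, so skew(v) @ w equals np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])


def stereo_depth(p_left, p_right, R_rl, t_rl):
    """Depth Z_s of a point in the left camera from one stereo match.

    p_left, p_right: normalized coordinates (u, v, 1) in the left/right images.
    R_rl, t_rl: relative pose with X_R = R_rl @ X_L + t_rl.
    """
    num = np.linalg.norm(skew(t_rl) @ p_right)
    den = np.linalg.norm(skew(p_right) @ R_rl @ p_left)
    return num / den


# Hypothetical rectified rig: identity rotation, 11 cm baseline along x.
R_rl = np.eye(3)
t_rl = np.array([-0.11, 0.0, 0.0])
X_L = np.array([0.3, -0.1, 4.0])   # ground-truth point in the left camera
X_R = R_rl @ X_L + t_rl
Z = stereo_depth(X_L / X_L[2], X_R / X_R[2], R_rl, t_rl)  # recovers 4.0
```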

2.3 Unified EKF Measurement Update

All active-state residuals are stacked,

$$r = \begin{bmatrix} r^{(uv)} \\ r^{(\text{ray})} \end{bmatrix}, \quad H = \begin{bmatrix} H^{(uv)} \\ H^{(\text{ray})} \end{bmatrix},$$

enabling a single EKF update:

$$K = P_A H^\top (H P_A H^\top + R)^{-1},\quad \delta x_A \leftarrow \delta x_A + K (r - H \delta x_A), \quad P_A \leftarrow (I - K H) P_A$$

where $P_A$ is the covariance of the active state and $R$ is the covariance of the stacked measurement noise.

This formulation allows the filter to exploit all available geometric constraints simultaneously, improving both immediate pose accuracy and depth consistency across views.
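
A compact sketch of the stacked update, assuming independent noise between residual types so that $R$ is block-diagonal. The function name `ekf_update` and its argument layout are illustrative, not the SP-VINS interface. It solves a linear system instead of forming $(H P_A H^\top + R)^{-1}$ explicitly, a common numerical choice.

```python
import numpy as np


def ekf_update(dx, P, r_blocks, H_blocks, R_blocks):
    """One EKF update from stacked residual blocks (e.g. reprojection + ray).

    dx: current error-state estimate delta x_A.
    r_blocks, H_blocks, R_blocks: residuals, Jacobians, and noise covariances
    per residual type; noise between blocks is assumed uncorrelated.
    """
    r = np.concatenate(r_blocks)
    H = np.vstack(H_blocks)
    # Block-diagonal stacked noise covariance.
    m = sum(Rb.shape[0] for Rb in R_blocks)
    R = np.zeros((m, m))
    i = 0
    for Rb in R_blocks:
        k = Rb.shape[0]
        R[i:i + k, i:i + k] = Rb
        i += k
    S = H @ P @ H.T + R
    # K = P H^T S^{-1}; since S and P are symmetric, K^T = S^{-1} H P.
    K = np.linalg.solve(S, H @ P).T
    dx_new = dx + K @ (r - H @ dx)
    P_new = (np.eye(P.shape[0]) - K @ H) @ P
    return dx_new, 0.5 * (P_new + P_new.T)  # re-symmetrize against round-off


# Toy example: two unit-variance scalar residuals standing in for a
# reprojection and a ray residual on a 1-D state with prior variance 4.
dx, P = ekf_update(np.zeros(1), np.array([[4.0]]),
                   [np.array([1.0]), np.array([3.0])],
                   [np.array([[1.0]]), np.array([[1.0]])],
                   [np.eye(1), np.eye(1)])
```

Here the update gives $\delta x = 16/9$ and $P = 4/9$, the same result as fusing the two measurements one after another, which is the property that lets heterogeneous residuals share one update.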

3. Map Management and Loop Closure

The hybrid residual filter operates with an implicit environmental map: SP-VINS, the system of Du et al., maintains only keyframes (poses only) and their associated 2D keypoints and descriptors. Loop closure and relocalization rely on 2D–2D matching with a bag-of-words system (DBoW2), followed by geometric verification with RANSAC. After a loop is detected, reprojection residuals built from the verified correspondences between current and historical keyframes are stacked into the filter update. No global 3D map triangulation or pose graph optimization is required; all constraints enter the EKF in the same hybrid fashion:

$$r_i^{(\text{map})} = \tilde p_i - p_i(j) \approx H_{x_i}^{(\text{map})} \delta x + n_i$$

where $p_i(j)$ is the prediction associated with historical keyframe $j$, and $\delta x = [\delta x_A^\top, \delta x_K^\top]^\top$ now includes both active and keyframe states.
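
Once keyframe poses enter $\delta x$, a loop-closure residual typically has nonzero Jacobian blocks only for the current pose and the matched keyframe. A sketch of scattering such blocks into a full-width row block of $H$; `scatter_jacobian`, the column offsets, and the 6-dof block width are hypothetical and depend on how the error state is laid out.

```python
import numpy as np


def scatter_jacobian(rows, full_dim, blocks):
    """Place local Jacobian blocks into a full-width row block of H.

    blocks: list of (column_offset, block) pairs, e.g. one for the current
    pose clone in x_A and one for the matched keyframe pose in x_K.
    Blocks that share columns are summed.
    """
    H = np.zeros((rows, full_dim))
    for offset, block in blocks:
        H[:, offset:offset + block.shape[1]] += block
    return H


# Hypothetical layout: 105-dim error state, current clone at column 15,
# matched keyframe at column 87; placeholder 2x6 blocks for one keypoint.
H_pose = np.ones((2, 6))
H_kf = -np.ones((2, 6))
H = scatter_jacobian(2, 105, [(15, H_pose), (87, H_kf)])
```

Because $H$ touches keyframe columns, the update also corrects the keyframe poses and their cross-covariance with the active state, which is how loop constraints propagate without a separate pose-graph solve.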

4. Online Calibration of Camera–IMU Extrinsic Parameters

The filter state $x_e$ includes all stereo camera–IMU extrinsic parameters, which are estimated online. The Jacobians of all measurement residuals carry terms $\partial \cdot / \partial \delta x_e$, so the Kalman gain $K$ updates these extrinsics directly during measurement fusion. The on-manifold update uses quaternion and translation composition:

$$q \leftarrow \delta q \otimes q, \quad p \leftarrow p + \delta p$$

This mechanism lets the stereo-rig and camera–IMU calibration adapt online to systematic bias, which improves accuracy in degraded environments (low texture, strong reflections, large depth).
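
The on-manifold correction can be sketched as below, assuming Hamilton quaternions stored as $(w, x, y, z)$ and a 3-dof rotation-vector error mapped to $\delta q$ through the exponential map. The source does not state its quaternion convention (JPL and Hamilton conventions define the product differently), so treat this as one consistent choice rather than the SP-VINS code; all function names are hypothetical.

```python
import numpy as np


def quat_multiply(a, b):
    """Hamilton product a (x) b for quaternions stored as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return np.array([
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    ])


def small_angle_quat(dtheta):
    """delta q from a rotation-vector correction via the exponential map."""
    angle = np.linalg.norm(dtheta)
    if angle < 1e-12:
        # First-order fallback near zero, normalized to unit length.
        dq = np.concatenate([[1.0], 0.5 * dtheta])
        return dq / np.linalg.norm(dq)
    axis = dtheta / angle
    return np.concatenate([[np.cos(angle / 2)], np.sin(angle / 2) * axis])


def apply_pose_correction(q, p, dtheta, dp):
    """q <- delta q (x) q, p <- p + delta p; renormalize q against round-off."""
    q_new = quat_multiply(small_angle_quat(dtheta), q)
    return q_new / np.linalg.norm(q_new), p + dp


# Two corrections of pi/4 about z compose to a pi/2 rotation about z.
q, p = np.array([1.0, 0.0, 0.0, 0.0]), np.zeros(3)
for _ in range(2):
    q, p = apply_pose_correction(q, p, np.array([0.0, 0.0, np.pi / 4]),
                                 np.array([0.1, 0.0, 0.0]))
```

The same routine applies to each $(q, p)$ pair in $x_e$, so extrinsic corrections from $K$ are folded back in exactly like body-pose corrections.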

5. Computational Efficiency and Performance Characteristics

Benchmark evaluations on EuRoC MAV, TUM VI, and KAIST-Urban datasets demonstrate that the hybrid residual filter framework achieves state-of-the-art trajectory accuracy at substantially reduced CPU usage compared to optimization-based VINS. Typical per-frame costs:

  • Visual front-end: $\sim 5$ ms
  • Back-end propagation + hybrid update: $4$–$8$ ms
  • Loop closure: $2$–$4$ ms only when invoked

Total CPU load remains well below that of optimization-based baselines: odometry-only mode uses $77$–$94$% of one core and loop-enabled mode uses $114$–$135$%, compared with more than 200% for VINS-Fusion (Du et al., 24 Nov 2025). This efficiency, combined with robust performance in degraded scenarios, suggests strong suitability for real-time mobile robotics.

6. Contextual Significance and Implications

The introduction and validation of the hybrid residual filter framework in SP-VINS marks a departure from traditional SLAM architectures that require explicit 3D map maintenance and complex bundle adjustment. Instead, the implicit map and unified residual fusion streamline the operational pipeline, reducing computational overhead while maintaining high accuracy. This design enables long-term, high-precision state estimation under challenging visual conditions and supports online self-calibration of extrinsics, addressing issues such as feature dropout and stereo bias.

A plausible implication is that similar hybrid residual fusion strategies could be adapted to other multi-sensor robotic navigation frameworks, especially in resource-constrained or highly dynamic environments where computational cost and robustness are critical.

7. Comparative Perspective

Relative to conventional SLAM and VINS frameworks, filter-based systems employing hybrid residual filter updates (specifically the DST-EKF with stacked geometric constraints) achieve competitive to superior absolute trajectory error (ATE) and relative pose error (RPE) on standard benchmarks. For instance, SP-VINS achieves an ATE of $0.11$ m on EuRoC MH05 versus $0.13$–$0.14$ m for other filters, and an ATE reduction of about 30% on TUM VI compared with OpenVINS/SchurVINS. The loop closure module efficiently prunes and integrates keyframe constraints without requiring global optimization (Du et al., 24 Nov 2025).

In summary, the hybrid residual filter framework constitutes an efficient and theoretically grounded approach that leverages a unified measurement model for VINS, enabling robust and accurate localization and mapping in a filter-based paradigm.
