Hybrid Residual Filter Framework in VINS
- The hybrid residual filter framework is a unified measurement update method integrating landmark reprojection and ray constraint residuals to enhance VINS accuracy.
- It employs a Lie-group EKF structure with implicit mapping and online extrinsic calibration to adapt dynamically to challenging visual conditions.
- Benchmark evaluations show that the framework achieves state-of-the-art trajectory accuracy at a lower computational load than optimization-based methods.
A hybrid residual filter framework is a measurement update methodology for visual-inertial navigation systems (VINS). It combines multiple forms of measurement residuals into a single Kalman filter update, with the goal of improving the accuracy and robustness of pose estimation in challenging operating environments. In filter-based VINS, specifically SP-VINS, the hybrid residual filter stacks landmark reprojection errors and ray (inverse depth) constraints into a single Jacobian for the measurement update. This enables efficient fusion of stereo and multi-view geometric information while supporting implicit mapping and online extrinsic calibration (Du et al., 24 Nov 2025).
1. Foundation and State Representation
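The hybrid residual filter framework is implemented in the context of a Lie-group EKF architecture. Its state vector comprises active navigation quantities, historical pose clones, stereo camera extrinsics, and map keyframe poses. Explicitly, the EKF state at time $k$ is organized as

$$\mathbf{x}_k = \left(\mathbf{x}_{A,k},\ \mathbf{x}_{M,k}\right),$$

where $\mathbf{x}_{A,k}$ is the active state. It is partitioned into the IMU navigation state ($\mathbf{x}_{I}$), pose clones ($\mathbf{x}_{C}$), and extrinsics ($\mathbf{x}_{E}$):

$$\mathbf{x}_{A,k} = \left(\mathbf{x}_{I,k},\ \mathbf{x}_{C,k},\ \mathbf{x}_{E}\right),$$

and $\mathbf{x}_{M,k}$ is the pose set of map keyframes.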
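To make the block structure concrete, the sketch below maps each state component to a slice of the error-state vector, in the order IMU, clones, extrinsics, keyframes. It rests on several assumptions, none confirmed by SP-VINS:

- a 15-dimensional IMU error state (rotation, velocity, position, gyroscope bias, accelerometer bias);
- 6-dimensional pose errors for clones, extrinsics, and keyframes;
- the ordering shown above.

The class is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StateLayout:
    """Hypothetical error-state index map: [IMU | clones | extrinsics | keyframes]."""

    num_clones: int
    num_cams: int = 2        # stereo rig: one extrinsic per camera
    num_keyframes: int = 0
    imu_dim: int = 15        # assumed: rotation, velocity, position, gyro bias, accel bias
    pose_dim: int = 6        # 3 rotation + 3 translation error components

    def imu(self) -> slice:
        return slice(0, self.imu_dim)

    def clone(self, i: int) -> slice:
        assert 0 <= i < self.num_clones
        start = self.imu_dim + i * self.pose_dim
        return slice(start, start + self.pose_dim)

    def extrinsic(self, c: int) -> slice:
        assert 0 <= c < self.num_cams
        start = self.imu_dim + (self.num_clones + c) * self.pose_dim
        return slice(start, start + self.pose_dim)

    @property
    def active_dim(self) -> int:
        # Active state = IMU + clones + extrinsics; keyframe poses follow it.
        return self.imu_dim + (self.num_clones + self.num_cams) * self.pose_dim

    def keyframe(self, m: int) -> slice:
        assert 0 <= m < self.num_keyframes
        start = self.active_dim + m * self.pose_dim
        return slice(start, start + self.pose_dim)

    @property
    def total_dim(self) -> int:
        return self.active_dim + self.num_keyframes * self.pose_dim


layout = StateLayout(num_clones=10, num_keyframes=3)
print(layout.active_dim, layout.total_dim)
```

In this layout, keyframe poses sit after the active block. Active-state updates therefore touch only the leading rows and columns of the covariance, while loop-closure updates widen the Jacobian into the trailing keyframe columns.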
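Continuous-time error-state propagation follows

$$\dot{\tilde{\mathbf{x}}} = \mathbf{F}\,\tilde{\mathbf{x}} + \mathbf{G}\,\mathbf{n},$$

where $\mathbf{F}$ encodes the linearized dynamics and $\mathbf{G}$ couples the noise sources $\mathbf{n}$. Discrete-time updates use the transition matrix $\boldsymbol{\Phi}_k$ and the process noise $\mathbf{Q}_k$ for covariance propagation:

$$\mathbf{P}_{k+1} = \boldsymbol{\Phi}_k\,\mathbf{P}_k\,\boldsymbol{\Phi}_k^\top + \mathbf{Q}_k.$$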
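A minimal sketch of the discrete covariance step, under stated assumptions. It places the IMU error block first and treats clones, extrinsics, and keyframe poses as static between images, with an identity transition and no process noise. Whether SP-VINS adds random-walk noise to the extrinsics is not stated, so that choice is an assumption. The function name is hypothetical.

```python
import numpy as np


def propagate_covariance(P, Phi_I, Q_I):
    """One discrete step P <- Phi P Phi^T + Q, written block-wise.

    Assumed layout: the IMU error block occupies the first rows/columns of P;
    all remaining states (clones, extrinsics, keyframes) have identity
    transition and zero process noise.
    """
    m = Phi_I.shape[0]
    P_new = P.copy()
    # IMU-IMU block.
    P_new[:m, :m] = Phi_I @ P[:m, :m] @ Phi_I.T + Q_I
    # Cross-covariance between the IMU block and the static states.
    P_new[:m, m:] = Phi_I @ P[:m, m:]
    P_new[m:, :m] = P_new[:m, m:].T
    # The static-static block is left unchanged.
    return 0.5 * (P_new + P_new.T)  # guard against round-off asymmetry
```

Only the IMU rows and columns are recomputed. This is equivalent to building the full $\boldsymbol{\Phi} = \mathrm{diag}(\boldsymbol{\Phi}_I, \mathbf{I})$ but avoids multiplying the large static block.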
2. Measurement Models and Hybrid Residual Construction
The framework merges two primary forms of geometric residuals:
2.1 Landmark Reprojection Residual
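For a keypoint $j$ observed in image $i$ with normalized image coordinates $\mathbf{z}_{ij}$, the predicted position

$$\hat{\mathbf{z}}_{ij} = h_{\mathrm{rep}}\!\left(\hat{\mathbf{x}}_A\right)$$

is computed without an explicit 3D landmark parameterization, using only the relationships among the involved poses and extrinsics. The reprojection residual is

$$\mathbf{r}_{ij} = \mathbf{z}_{ij} - \hat{\mathbf{z}}_{ij} \approx \mathbf{H}_{ij}\,\tilde{\mathbf{x}}_A + \mathbf{n}_{ij},$$

where the Jacobian $\mathbf{H}_{ij}$ is given in block form, including derivatives with respect to the pose and extrinsic parameters.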
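The sketch below illustrates predicting a keypoint from pose relationships alone. It hypothetically assumes that the keypoint's normalized coordinates in an anchor camera, and a depth for it (for example from stereo), are available. The code then proceeds in three steps:

1. It rebuilds the point on the fly from the anchor bearing and depth.
2. It transfers the point into camera $i$ through the body poses and a single camera–IMU extrinsic.
3. It forms the 2D residual.

The extrinsic Jacobian uses central differences with a left rotation perturbation. SP-VINS's analytic Jacobians and conventions may differ, and all names here are illustrative.

```python
import numpy as np


def skew(v):
    return np.array([[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]])


def so3_exp(w):
    """Rodrigues formula: rotation matrix for rotation vector w."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3) + skew(w)
    K = skew(np.asarray(w) / theta)
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * K @ K


def make_T(R, t):
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T


def predict_keypoint(T_WBa, T_WBi, T_BC, z_a, depth_a):
    """Normalized coordinates in camera i of a keypoint seen in anchor camera a.

    No landmark is stored in the state: the point is rebuilt from the anchor
    bearing z_a and an assumed depth, then moved through the poses and extrinsic.
    """
    T_CiCa = np.linalg.inv(T_WBi @ T_BC) @ T_WBa @ T_BC
    p_a = depth_a * np.array([z_a[0], z_a[1], 1.0])
    p_i = T_CiCa[:3, :3] @ p_a + T_CiCa[:3, 3]
    return p_i[:2] / p_i[2]


def reprojection_residual(z_i, T_WBa, T_WBi, T_BC, z_a, depth_a):
    return np.asarray(z_i, dtype=float) - predict_keypoint(T_WBa, T_WBi, T_BC, z_a, depth_a)


def extrinsic_jacobian(z_i, T_WBa, T_WBi, T_BC, z_a, depth_a, eps=1e-6):
    """2x6 d(residual)/d(extrinsic error), error = (dtheta, dp), by central differences."""
    J = np.zeros((2, 6))
    for k in range(6):
        d = np.zeros(6)
        d[k] = eps

        def residual_at(sign):
            R = so3_exp(sign * d[:3]) @ T_BC[:3, :3]  # left rotation perturbation
            t = T_BC[:3, 3] + sign * d[3:]
            return reprojection_residual(z_i, T_WBa, T_WBi, make_T(R, t), z_a, depth_a)

        J[:, k] = (residual_at(1.0) - residual_at(-1.0)) / (2.0 * eps)
    return J
```

When the anchor and observing body poses coincide, the extrinsic cancels out of the camera-to-camera transform, and its Jacobian block is zero. This is a quick check that extrinsic information only arises from relative motion.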
2.2 Ray Constraint Residual (Stereo and Multi-view)
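Ray-based constraints enable depth estimation from stereo observations or multi-view geometry. The observations of one keypoint across two or more views are related through a ray constraint

$$h_{\mathrm{ray}}\!\left(\mathbf{x}_A,\ \{\mathbf{z}_{ij}\}_i\right) = \mathbf{0},$$

with residual

$$\mathbf{r}_{\mathrm{ray}} = -\,h_{\mathrm{ray}}\!\left(\hat{\mathbf{x}}_A,\ \{\mathbf{z}_{ij}\}_i\right) \approx \mathbf{H}_{\mathrm{ray}}\,\tilde{\mathbf{x}}_A + \mathbf{n}_{\mathrm{ray}},$$

where $\mathbf{H}_{\mathrm{ray}}$ includes blocks for all involved body poses and the extrinsics.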
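The exact form of the ray residual is not given here, so the sketch below substitutes two textbook ray relations as stand-ins:

- the epipolar (coplanarity) constraint, which vanishes when two viewing rays and the baseline lie in one plane;
- the inverse depth implied by a rectified stereo pair in normalized coordinates.

The function names and the rectified-stereo assumption (right camera offset by the baseline along $+x$) are illustrative and not taken from SP-VINS.

```python
import numpy as np


def skew(v):
    return np.array([[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]])


def coplanarity_residual(f_a, f_i, R_ia, t_ia):
    """Epipolar residual f_i^T [t]_x R f_a between bearings in cameras a and i.

    (R_ia, t_ia) maps camera-a points into camera i: p_i = R_ia @ p_a + t_ia.
    The residual is zero when both rays and the baseline are coplanar,
    and it needs no landmark depth.
    """
    return float(np.asarray(f_i) @ skew(t_ia) @ R_ia @ np.asarray(f_a))


def stereo_inverse_depth(x_left, x_right, baseline):
    """Inverse depth from a rectified stereo pair in normalized coordinates.

    With the right camera shifted by `baseline` along +x, x_left - x_right = baseline / Z.
    """
    return (x_left - x_right) / baseline
```

The coplanarity form is depth-free, while the stereo relation recovers inverse depth directly. A ray-based residual can therefore constrain relative poses and extrinsics without adding a landmark to the state.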
2.3 Unified EKF Measurement Update
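All active-state residuals are stacked, with $\mathbf{r}_{\mathrm{rep}}$ collecting the reprojection residuals $\mathbf{r}_{ij}$:

$$\mathbf{r} = \begin{bmatrix} \mathbf{r}_{\mathrm{rep}} \\ \mathbf{r}_{\mathrm{ray}} \end{bmatrix} \approx \begin{bmatrix} \mathbf{H}_{\mathrm{rep}} \\ \mathbf{H}_{\mathrm{ray}} \end{bmatrix} \tilde{\mathbf{x}}_A + \mathbf{n} = \mathbf{H}\,\tilde{\mathbf{x}}_A + \mathbf{n}.$$

This enables a single EKF update:

$$\mathbf{K} = \mathbf{P}\mathbf{H}^\top\left(\mathbf{H}\mathbf{P}\mathbf{H}^\top + \mathbf{R}\right)^{-1}, \qquad \delta\mathbf{x} = \mathbf{K}\,\mathbf{r}, \qquad \mathbf{P} \leftarrow \left(\mathbf{I} - \mathbf{K}\mathbf{H}\right)\mathbf{P},$$

where $\mathbf{R}$ is the covariance of the stacked measurement noise $\mathbf{n}$.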
This formulation allows the filter to exploit all available geometric constraints simultaneously, improving both immediate pose accuracy and depth consistency across views.
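A minimal NumPy sketch of the single stacked update. It assumes each residual block comes with its own Jacobian over the full error state and an independent noise covariance. The covariance update uses the Joseph form, a common numerical-stability choice that is not necessarily the one used in SP-VINS. The function returns the error-state correction, which would then be retracted onto the Lie group; names are hypothetical.

```python
import numpy as np


def block_diag(mats):
    """Place square matrices along the diagonal of a zero matrix."""
    n = sum(m.shape[0] for m in mats)
    out = np.zeros((n, n))
    i = 0
    for m in mats:
        k = m.shape[0]
        out[i:i + k, i:i + k] = m
        i += k
    return out


def stacked_ekf_update(P, residuals, jacobians, noise_covs):
    """Single EKF update from stacked residual blocks.

    residuals  : list of 1-D arrays r_k (e.g. reprojection and ray residuals)
    jacobians  : list of 2-D arrays H_k with the same number of columns as P
    noise_covs : list of square arrays R_k, assumed mutually independent
    Returns the error-state correction dx and the updated covariance.
    """
    r = np.concatenate(residuals)
    H = np.vstack(jacobians)
    R = block_diag(noise_covs)
    S = H @ P @ H.T + R
    K = np.linalg.solve(S, H @ P).T          # K = P H^T S^{-1} (P, S symmetric)
    dx = K @ r
    I_KH = np.eye(P.shape[0]) - K @ H
    P_new = I_KH @ P @ I_KH.T + K @ R @ K.T  # Joseph form
    return dx, 0.5 * (P_new + P_new.T)
```

For linear measurements with independent noise, this batch update gives the same mean and covariance as processing the blocks one after another. Stacking mainly saves repeated gain computations and keeps all constraints in one linearization.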
3. Map Management and Loop Closure
The hybrid residual filter operates with an implicit environmental map: SP-VINS maintains only keyframes (poses only) and their associated 2D keypoints and descriptors. Loop closure and relocalization rely on 2D–2D matching via a bag-of-words system (DBoW2), with geometric verification using RANSAC. Upon successful loop detection, reprojection residuals built from the verified correspondences between the current and historical keyframes are stacked into the filter update. No global 3D map triangulation or pose graph optimization is required; all constraints flow into the EKF in the same hybrid fashion:

$$\mathbf{r}_{\mathrm{loop}} \approx \mathbf{H}\,\tilde{\mathbf{x}} + \mathbf{n}_{\mathrm{loop}}, \qquad \tilde{\mathbf{x}} = \left(\tilde{\mathbf{x}}_A,\ \tilde{\mathbf{x}}_M\right),$$

with $\mathbf{H}$ now extended to include both active and keyframe states.
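The sketch below shows one way to widen the Jacobian for a loop-closure residual. Per-state blocks for the current clone and the matched historical keyframe are scattered into a full-width Jacobian, so the ordinary gain formula can correct both. The state dimensions, column positions, and block values are hypothetical.

```python
import numpy as np


def embed_jacobian(num_rows, total_dim, blocks):
    """Scatter (column_slice, block) pairs into a num_rows x total_dim Jacobian."""
    H = np.zeros((num_rows, total_dim))
    for cols, J in blocks:
        H[:, cols] += J
    return H


def kalman_gain(P, H, R):
    S = H @ P @ H.T + R
    return np.linalg.solve(S, H @ P).T  # P H^T S^{-1}, using symmetry of P and S


# Hypothetical layout: 87-dim active state followed by three 6-dim keyframe poses.
total_dim = 105
clone_cols = slice(69, 75)      # newest clone (assumed position)
keyframe_cols = slice(93, 99)   # matched historical keyframe (assumed position)
J_clone = np.hstack([np.eye(2), np.zeros((2, 4))])  # illustrative values only
J_keyframe = -J_clone
H_loop = embed_jacobian(2, total_dim, [(clone_cols, J_clone), (keyframe_cols, J_keyframe)])
K = kalman_gain(0.01 * np.eye(total_dim), H_loop, 1e-4 * np.eye(2))
```

With a full covariance, cross-correlations would also carry the correction to states that do not appear in the Jacobian. The diagonal covariance here keeps the example easy to inspect.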
4. Online Calibration of Camera–IMU Extrinsic Parameters
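The filter state includes all stereo camera–IMU extrinsic parameters, which are estimated online. The Jacobians of all measurement residuals carry terms $\partial\mathbf{r}/\partial\mathbf{x}_E$, so the Kalman gain directly updates these extrinsics during measurement fusion. The on-manifold update composes the quaternion and translation of each extrinsic $(\mathbf{q}_{BC}, \mathbf{p}_{BC})$ with the estimated correction $(\delta\boldsymbol{\theta}, \delta\mathbf{p})$:

$$\mathbf{q}_{BC} \leftarrow \delta\mathbf{q}(\delta\boldsymbol{\theta}) \otimes \mathbf{q}_{BC}, \qquad \mathbf{p}_{BC} \leftarrow \mathbf{p}_{BC} + \delta\mathbf{p}, \qquad \delta\mathbf{q}(\delta\boldsymbol{\theta}) \approx \begin{bmatrix} 1 \\ \tfrac{1}{2}\delta\boldsymbol{\theta} \end{bmatrix}.$$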
This mechanism ensures that the stereo rig or camera-IMU calibration adapts online to systematic bias, leading to improved accuracy in degraded environments (low texture, strong reflections, large depth).
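A minimal sketch of the on-manifold extrinsic correction. It assumes Hamilton quaternions stored as $(w, x, y, z)$, a left-multiplicative rotation perturbation, and an additive translation correction. It uses the exact quaternion exponential instead of a first-order small-angle approximation. These conventions are assumptions and may differ from SP-VINS.

```python
import numpy as np


def quat_mul(q, r):
    """Hamilton product of quaternions stored as (w, x, y, z)."""
    w1, x1, y1, z1 = q
    w2, x2, y2, z2 = r
    return np.array([
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    ])


def quat_exp(dtheta):
    """Unit quaternion for the rotation vector dtheta."""
    dtheta = np.asarray(dtheta, dtype=float)
    angle = np.linalg.norm(dtheta)
    if angle < 1e-12:
        q = np.concatenate([[1.0], 0.5 * dtheta])  # first-order fallback near zero
    else:
        q = np.concatenate([[np.cos(0.5 * angle)], np.sin(0.5 * angle) * dtheta / angle])
    return q / np.linalg.norm(q)


def update_extrinsic(q_BC, p_BC, dx):
    """Apply an error-state correction dx = (dtheta, dp) to one camera-IMU extrinsic.

    Assumed conventions: left-multiplicative rotation update, additive translation.
    """
    dx = np.asarray(dx, dtype=float)
    q_new = quat_mul(quat_exp(dx[:3]), q_BC)
    return q_new / np.linalg.norm(q_new), np.asarray(p_BC, dtype=float) + dx[3:]
```

Renormalizing after each composition keeps the quaternion on the unit sphere despite round-off. Two successive corrections about the same axis compose to one correction by the summed angle.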
5. Computational Efficiency and Performance Characteristics
Benchmark evaluations on EuRoC MAV, TUM VI, and KAIST-Urban datasets demonstrate that the hybrid residual filter framework achieves state-of-the-art trajectory accuracy at substantially reduced CPU usage compared to optimization-based VINS. Typical per-frame costs:
- Visual front-end: ms
- Back-end propagation + hybrid update: $4$–$8$ ms
- Loop closure: $2$–$4$ ms only when invoked
Total CPU load remains well below that of optimization-based baselines. Odometry-only mode uses $77$–$94$% of one core and loop-enabled mode uses $114$–$135$%, compared with VINS-Fusion's 200% (Du et al., 24 Nov 2025). This efficiency, combined with robust performance in degraded scenarios, suggests strong suitability for real-time mobile robotics.
6. Contextual Significance and Implications
The introduction and validation of the hybrid residual filter framework in SP-VINS marks a departure from traditional SLAM architectures that require explicit 3D map maintenance and complex bundle adjustment. Instead, the implicit map and unified residual fusion streamline the operational pipeline, reducing computational overhead while maintaining high accuracy. This orientation enables long-term, high-precision state estimation under challenging visual conditions and supports online self-calibration of extrinsics, directly responding to issues such as feature dropout and stereo bias.
A plausible implication is that similar hybrid residual fusion strategies may be adapted for other multi-sensor robotic navigation frameworks. This applies especially in resource-constrained or highly dynamic environments, where computational cost and robustness are critical.
7. Comparative Perspective
Relative to conventional SLAM and VINS frameworks, filter-based systems employing hybrid residual updates, specifically the DST-EKF with stacked geometric constraints, achieve competitive to superior absolute trajectory error (ATE) and relative pose error (RPE) on standard benchmarks. For instance, SP-VINS achieves an ATE of $0.11$ m on EuRoC MH05, versus $0.13$–$0.14$ m for other filters, and an ATE reduction on TUM VI relative to OpenVINS and SchurVINS. Loop closure modules efficiently prune and integrate keyframe constraints without requiring global optimization (Du et al., 24 Nov 2025).
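For readers reproducing such comparisons, the sketch below computes a position ATE. It takes the RMSE after a least-squares rigid alignment (rotation plus translation) of the estimated trajectory to ground truth, using the Umeyama/Kabsch closed form. It assumes time-associated $N \times 3$ position arrays. The alignment behind the reported numbers is not stated here; it could be SE(3), Sim(3), or a 4-DoF yaw-plus-translation alignment. The function names are hypothetical.

```python
import numpy as np


def align_rigid(est, gt):
    """Least-squares R, t with gt ≈ R @ est + t (Umeyama without scale)."""
    mu_e = est.mean(axis=0)
    mu_g = gt.mean(axis=0)
    E = est - mu_e
    G = gt - mu_g
    U, _, Vt = np.linalg.svd(G.T @ E)  # cross-covariance (up to a 1/N factor)
    D = np.eye(3)
    if np.linalg.det(U @ Vt) < 0:      # enforce a proper rotation (det = +1)
        D[2, 2] = -1.0
    R = U @ D @ Vt
    t = mu_g - R @ mu_e
    return R, t


def ate_rmse(est, gt):
    """Root-mean-square position error after rigid alignment."""
    R, t = align_rigid(est, gt)
    aligned = est @ R.T + t
    err = np.linalg.norm(aligned - gt, axis=1)
    return float(np.sqrt(np.mean(err ** 2)))
```

Aligning before measuring removes the arbitrary choice of world frame, which is unobservable in VINS. Errors then reflect drift and local inaccuracy rather than a global offset.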
In summary, the hybrid residual filter framework constitutes an efficient and theoretically grounded approach that leverages a unified measurement model for VINS, enabling robust and accurate localization and mapping in a filter-based paradigm.