Multi-State Constraint Kalman Filter
- Multi-State Constraint Kalman Filter (MSCKF) is an EKF variant that efficiently fuses inertial and exteroceptive measurements using sliding window state augmentation and null-space projection.
- It leverages state augmentation and selective marginalization to balance computational cost with robust drift correction, making it well suited to real-time odometry and SLAM.
- Variants such as Fast-MSCKF and PO-MSCKF further enhance performance through adaptive feature management and pose-only updates to reduce linearization errors.
The Multi-State Constraint Kalman Filter (MSCKF) is an efficient filtering framework designed for high-rate fusion of inertial and exteroceptive (typically visual, sometimes LiDAR or magnetic) measurements. Its principal innovation lies in managing a sliding window of multiple state instances to tightly couple successive poses via cross-temporal constraints, but without the heavy computational burden of maintaining or optimizing explicit feature (landmark) positions. The MSCKF and its variants underpin numerous state-of-the-art odometry and SLAM systems, particularly when real-time operation and bounded resource usage are paramount.
1. Fundamental Principles and Conceptual Foundations
MSCKF is an Extended Kalman Filter (EKF) variant that augments the state vector with a dynamically managed set of historical sensor poses (typically camera or LiDAR) alongside the current inertial state. The core innovation is to update the filter using multi-state constraints arising from features observed across multiple states, while projecting out the sensitivity to each feature’s 3D position. Mathematically, the residual and the pose Jacobian are projected onto the left null space of the feature-position Jacobian prior to the update step, so feature positions never need to be carried in the state or enter the Kalman innovation (Sun et al., 2017).
This formulation keeps the update cost linear in the number of features, sidestepping the cubic scaling of bundle adjustment operations common in optimization-based approaches, while still leveraging the spatial correlation across multiple frames for robust data association and drift correction.
State vector augmentation and selective marginalization are central. As new observations arrive, the state vector grows to include the newly acquired poses; older ones are periodically marginalized out to cap computation. Marginalization employs a keyframe selection policy and is executed in a manner that preserves statistical dependencies, either in fixed intervals (Sun et al., 2017) or in response to feature tracking heuristics (Abdollahi et al., 2022).
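The covariance bookkeeping behind augmentation and marginalization can be sketched as follows. This is a minimal NumPy illustration, not the implementation from the cited papers: the function names and the Jacobian `J` (mapping the existing error state to the error of the newly cloned pose) are assumptions introduced here, and removing a pose is shown in covariance form, where it amounts to dropping the corresponding rows and columns.

```python
import numpy as np

def augment_covariance(P, J):
    """Grow P to cover a newly cloned pose.

    J (m x n) is the assumed Jacobian of the new pose error with respect
    to the existing n-dimensional error state.
    """
    n = P.shape[0]
    T = np.vstack([np.eye(n), J])   # identity stacked over J
    return T @ P @ T.T              # [[P, P J^T], [J P, J P J^T]]

def marginalize(P, idx):
    """Drop the error-state entries listed in idx (covariance form)."""
    keep = np.setdiff1d(np.arange(P.shape[0]), idx)
    return P[np.ix_(keep, keep)]
```

Because the cloned pose is a deterministic function of the current state, the new block is fully correlated with the existing one; these cross-covariances are what later let a feature track correct past and present poses together.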
2. Mathematical Formulation
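The augmented state vector at time step $k$ in the classical S-MSCKF can be written as:
$$\mathbf{x}_k = \begin{pmatrix} \mathbf{x}_{I_k}^\top & \mathbf{x}_{C_1}^\top & \cdots & \mathbf{x}_{C_N}^\top \end{pmatrix}^\top$$
where $\mathbf{x}_{I_k}$ denotes the current IMU state (attitude, biases, velocity, position, camera-IMU extrinsics), and $\mathbf{x}_{C_1}, \ldots, \mathbf{x}_{C_N}$ are the camera pose states for a sliding window of size $N$.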
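The process model propagates the IMU state forward using the error-state formulation:
$$\dot{\tilde{\mathbf{x}}}_I = \mathbf{F}\,\tilde{\mathbf{x}}_I + \mathbf{G}\,\mathbf{n}_I$$
with $\mathbf{F}$ and $\mathbf{G}$ being the state transition and noise input matrices, and $\mathbf{n}_I$ the inertial noise.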
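A discrete-time covariance propagation step consistent with this error-state model might look like the sketch below. It assumes a first-order approximation of the transition matrix over a short interval `dt`, a continuous-time noise density `Q`, and an IMU error that occupies the first `n_imu` entries of the state. These simplifications and all names are illustrative choices, not taken from the cited implementation.

```python
import numpy as np

def propagate_covariance(P, F, G, Q, dt, n_imu):
    """Propagate the IMU block of P; the stored poses are static.

    F, G: error-state dynamics and noise input matrices (assumed given).
    Q:    continuous-time IMU noise density (assumed given).
    """
    Phi = np.eye(n_imu) + F * dt          # first-order transition matrix
    Qd = G @ Q @ G.T * dt                 # simple discretized process noise
    P = P.copy()                          # leave the caller's matrix untouched
    P[:n_imu, :n_imu] = Phi @ P[:n_imu, :n_imu] @ Phi.T + Qd
    P[:n_imu, n_imu:] = Phi @ P[:n_imu, n_imu:]   # IMU-pose cross terms
    P[n_imu:, :n_imu] = P[:n_imu, n_imu:].T       # keep P symmetric
    return P
```

Only the IMU block and its cross-covariances change during propagation; the pose-pose block of the sliding window is left as it was.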
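The measurement update is formulated using stacked observations for each feature $j$ across its tracked frames:
$$\mathbf{r}^j = \mathbf{H}_x^j\,\tilde{\mathbf{x}} + \mathbf{H}_f^j\,{}^{G}\tilde{\mathbf{p}}_j + \mathbf{n}^j$$
Here, $\tilde{\mathbf{x}}$ are errors in all augmented states, ${}^{G}\tilde{\mathbf{p}}_j$ is the error in the (not explicitly instantiated) feature’s global position, and $\mathbf{n}^j$ is the measurement noise. The critical step is projecting both residual and Jacobians onto the left null space of $\mathbf{H}_f^j$:
$$\mathbf{r}_o^j = \mathbf{V}^\top \mathbf{r}^j = \mathbf{V}^\top \mathbf{H}_x^j\,\tilde{\mathbf{x}} + \mathbf{V}^\top \mathbf{n}^j = \mathbf{H}_{x,o}^j\,\tilde{\mathbf{x}} + \mathbf{n}_o^j$$
where the columns of $\mathbf{V}$ form a basis of the left null space of $\mathbf{H}_f^j$, so that $\mathbf{V}^\top \mathbf{H}_f^j = \mathbf{0}$.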
Now the update involves only the pose sub-states and not the feature position, enabling efficient pruning of features and states. Marginalization and covariance update follow standard Kalman update equations with error-state reparametrization to accommodate the manifold structure of rotations (Sun et al., 2017).
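The null-space projection can be illustrated with a short NumPy sketch. It assumes the stacked residual `r`, the Jacobian with respect to the augmented states `H_x`, and the Jacobian with respect to the feature position `H_f` have already been computed, and that `H_f` has full column rank; the function names are hypothetical.

```python
import numpy as np

def left_nullspace(H_f):
    """Orthonormal basis V with V.T @ H_f = 0 (H_f assumed full column rank)."""
    Q, _ = np.linalg.qr(H_f, mode="complete")
    return Q[:, H_f.shape[1]:]      # trailing columns span the left null space

def project_out_feature(r, H_x, H_f):
    """Remove the dependence of the residual on the feature position error."""
    V = left_nullspace(H_f)
    return V.T @ r, V.T @ H_x
```

For a feature seen in M stereo-free frames, `H_f` is 2M x 3 and the projected residual has 2M - 3 rows. Since `V` has orthonormal columns, isotropic measurement noise stays isotropic after projection.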
In stereo VIO, measurements from both cameras are stacked, with known extrinsics used to transform right-camera observations, and the above process is applied identically (Sun et al., 2017).
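A stacked stereo measurement of this kind might be predicted as in the sketch below. The frame conventions and argument names are assumptions made for illustration: `R_c1_w` rotates world vectors into the left camera frame, `p_c1_w` is the left camera position in the world frame, and `R_c2_c1`, `p_c2_c1` are the known extrinsics of the right camera relative to the left. Measurements are taken as normalized image coordinates.

```python
import numpy as np

def stereo_measurement(R_c1_w, p_c1_w, R_c2_c1, p_c2_c1, p_f_w):
    """Predict [u1, v1, u2, v2] for one feature in normalized coordinates."""
    f1 = R_c1_w @ (p_f_w - p_c1_w)     # feature in the left camera frame
    f2 = R_c2_c1 @ (f1 - p_c2_c1)      # feature in the right camera frame
    return np.array([f1[0] / f1[2], f1[1] / f1[2],
                     f2[0] / f2[2], f2[1] / f2[2]])

def stereo_residual(z, *camera_and_feature):
    """Observed minus predicted stereo measurement."""
    return z - stereo_measurement(*camera_and_feature)
```

Each stereo observation contributes four rows instead of two, so a feature tracked over M frames yields a 4M-row residual before the feature position is projected out.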
3. Implementation and Algorithmic Variants
Algorithmic structure is determined by the following key design choices:
- State Augmentation and Marginalization: On each incoming sensor frame (camera image or LiDAR scan), augment the state vector. Once the window size is exceeded, marginalize out old states. Strategies vary: evenly spaced removals for smooth computation (Sun et al., 2017), marginalization conditioned on minimum feature track counts (Abdollahi et al., 2022), or sharper heuristics for resource-constrained scenarios.
- Feature and Update Management: Classic MSCKF operates on features observed across consecutive states, discarding tracks when occluded or out-of-FOV. Fast-MSCKF (Abdollahi et al., 2022) limits new feature extraction to keyframes, triggers update and aggressive state pruning when tracked features drop below a threshold, and employs a reduced QR decomposition for efficient update computation.
- Immediate vs. Delayed Update: Recent work distinguishes between “delayed” updates, where measurement updates are made only when a feature track is terminated, and “immediate” updates, where the filter is updated at every new pose, increasing the number of constraints and improving linearization accuracy and consistency (see the theorems and experiments in Zhang et al., 4 Nov 2024).
- Alternative Measurement Models: PO-MSCKF (Du et al., 2 Jul 2024) refactors the observation model using pose-only multi-view geometric constraints, removing the need for explicit 3D structure and null-space projection, which eliminates associated linearization errors and reduces computational load, particularly beneficial in large-depth scenarios.
- Sensors Beyond Cameras and IMUs: For LiDAR-inertial odometry, MSC-LIO (Zhang et al., 10 Jul 2024) introduces a same-plane cluster tracking strategy for data association, integrates time-delay states, and uses a custom measurement model (point-to-plane residuals) to provide multi-state constraints.
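The reduced QR decomposition mentioned under Feature and Update Management can be sketched generically as follows. This is a standard measurement-compression step for a tall stacked Jacobian, written here under the assumption of isotropic measurement noise with variance `sigma2`; it is not claimed to match the Fast-MSCKF code, and the function names are hypothetical.

```python
import numpy as np

def compress_measurements(r, H):
    """Shrink a tall system r = H dx + n to at most H.shape[1] rows."""
    if H.shape[0] <= H.shape[1]:
        return r, H
    Q1, T = np.linalg.qr(H, mode="reduced")   # H = Q1 @ T, T upper triangular
    return Q1.T @ r, T

def ekf_update(P, r, H, sigma2):
    """EKF update with isotropic noise; returns the correction and new P."""
    S = H @ P @ H.T + sigma2 * np.eye(H.shape[0])
    K = np.linalg.solve(S, H @ P).T           # K = P H^T S^-1 (P, S symmetric)
    I_KH = np.eye(P.shape[0]) - K @ H
    P_new = I_KH @ P @ I_KH.T + sigma2 * (K @ K.T)   # Joseph form
    return K @ r, P_new
```

Because `Q1` has orthonormal columns, the compressed noise keeps covariance `sigma2 * I`, and the update computed from the compressed system matches the one computed from the full system while inverting a much smaller innovation matrix.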
4. Practical Performance and Robustness
Experimental results indicate that stereo MSCKF delivers real-time, robust odometry with accuracy and drift properties comparable to state-of-the-art optimization and smoothing methods, but with significantly lower and more predictable CPU and memory consumption. On the EuRoC dataset and in fast flight scenarios with speeds up to 17.5 m/s, the S-MSCKF achieved low-drift motion estimation under diverse conditions including rapid maneuvers, lighting changes, and highly dynamic environments (Sun et al., 2017).
Image processing (feature tracking, matching) constitutes the dominant computational cost (≈80%); the filter update is lightweight (≈10% CPU at 20 Hz) (Sun et al., 2017, Abdollahi et al., 2022). Fast-MSCKF further reduces processing cost (by up to 6×) and improves positional RMSE by over 20%, mainly through leaner feature extraction and aggressive state management (Abdollahi et al., 2022).
In ablation and real-world tests, consistent handling of observability (e.g. via OC-EKF) is critical to avoid spurious overconfidence in unobservable directions (e.g. global yaw, position), and the S-MSCKF specifically accounts for this (Sun et al., 2017).
5. Generalization: Constrained Estimation, Nonlinear Manifold Formulations, and Extensions
MSCKF can be interpreted as a specialized instance of constrained Kalman filtering: state constraints in MSCKF arise from multi-state geometric relations imposed by cross-view feature observations (Amor et al., 2018, Herty et al., 2019). Instead of single-state (hard or soft) constraints, the MSCKF builds equality constraints that couple several sequential states.
Filtering frameworks for constrained estimation rely on projection (minimization), measurement augmentation (pseudo-measurements), or gain projection to enforce constraints. MSCKF’s null-space projection is mathematically analogous, but is optimized for efficiency and consistency (especially with linear constraints; see optimality analysis in (Herty et al., 2019)).
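As a point of comparison, the projection approach for a linear equality constraint can be sketched as below. This is the generic estimate-projection step from constrained Kalman filtering, weighted by the inverse covariance, and not a step of the MSCKF itself; the constraint matrix `D`, the vector `d`, and the function name are illustrative assumptions, and `D` is assumed to have full row rank.

```python
import numpy as np

def project_estimate(x, P, D, d):
    """Project (x, P) onto the constraint set {x : D x = d}."""
    S = D @ P @ D.T
    K = np.linalg.solve(S, D @ P).T      # K = P D^T (D P D^T)^-1
    x_c = x - K @ (D @ x - d)            # constrained estimate
    P_c = P - K @ D @ P                  # no uncertainty left along D
    return x_c, P_c
```

The same expression is obtained by treating `D x = d` as a noise-free pseudo-measurement, which is why the projection and measurement-augmentation viewpoints coincide for linear constraints.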
Recent developments recast MSCKF in a unified (manifold-based) nonlinear optimization framework, showing that MSCKF’s filtering steps correspond to a Gauss–Newton descent with marginalization in the cost function over both Euclidean and manifold-valued subspaces (see the generalized cost and boxplus/boxminus notation in (Saxena et al., 2021)). This illuminates the trade-off between accuracy (sliding window size, number of GN steps) and speed, and provides a seamless interpolation between pure filtering and batch optimization (Saxena et al., 2021).
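To make the boxplus/boxminus operators concrete, the sketch below implements them for a single pose made of a rotation matrix and a position. The right-perturbation convention, the tuple representation, and the function names are assumptions chosen for illustration and may differ from the cited formulation; the logarithm is valid for rotation angles away from pi.

```python
import numpy as np

def skew(w):
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def so3_exp(w):
    """Rodrigues formula: rotation vector to rotation matrix."""
    theta = np.linalg.norm(w)
    K = skew(w)
    if theta < 1e-8:
        return np.eye(3) + K             # first-order fallback near zero
    return (np.eye(3) + np.sin(theta) / theta * K
            + (1.0 - np.cos(theta)) / theta**2 * (K @ K))

def so3_log(R):
    """Rotation matrix to rotation vector (angle assumed below pi)."""
    theta = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    A = (R - R.T) / 2.0                  # equals sin(theta)/theta * skew(w)
    v = np.array([A[2, 1], A[0, 2], A[1, 0]])
    return v if theta < 1e-8 else theta / np.sin(theta) * v

def boxplus(X, delta):
    """Apply a 6-vector increment [rotation, translation] to pose X = (R, p)."""
    R, p = X
    return R @ so3_exp(delta[:3]), p + delta[3:]

def boxminus(X1, X2):
    """6-vector difference such that boxplus(X2, boxminus(X1, X2)) = X1."""
    return np.concatenate([so3_log(X2[0].T @ X1[0]), X1[1] - X2[1]])
```

With these two operators, residuals and Gauss-Newton increments live in an ordinary vector space even though the rotation part of the state does not.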
Sensor modality extensions include:
- Partial-invariant EKF MSCKF (PIEKF-VIWO), which increases consistency by embedding only rotation and velocity in a Lie group structure, circumventing errors from position linearization in the classical formulation (Hua et al., 2023).
- LiDAR-inertial odometry (MSC-LIO), utilizing plane-feature tracking and temporal delay compensation in a frame-to-frame multi-state constraint paradigm (Zhang et al., 10 Jul 2024).
- Magnetic-inertial odometry, where temporal constraints from local magnetic field models augment inertial navigation to improve velocity and position estimation in GPS-denied indoor environments (Li et al., 19 May 2025).
6. Applications and Implications
MSCKF and its variants are widely adopted in autonomous aerial vehicles, fast ground robots, and real-time mobile SLAM systems. Core application domains include:
- Resource-constrained robotic platforms, such as micro aerial vehicles, where minimal CPU and memory footprint is essential (Sun et al., 2017, Abdollahi et al., 2022).
- High-dynamics navigation, requiring robust performance under aggressive maneuvers, variable scene geometry, and modest sensor quality, exemplified by fast indoor/offboard flights at velocities up to 17.5 m/s (Sun et al., 2017).
- GPS-denied and unstructured environments, including urban, indoor, forested, and underground settings, where vision or LiDAR must replace global position fixes (Sun et al., 2017, Abdollahi et al., 2022, Zhang et al., 10 Jul 2024, Li et al., 19 May 2025).
- Real-time cross-sensor odometry, extending to wheel, LiDAR, and magnetic sensor fusion, for indoor infrastructure, warehouse robots, and low-cost IoT devices (Hua et al., 2023, Zhang et al., 10 Jul 2024, Li et al., 19 May 2025).
- Robotics research, where rigorous analysis of filter consistency, observability, and trade-offs is required for algorithm benchmarking and system design.
7. Limitations and Future Directions
While MSCKF provides computational efficiency, its limitations include potential accuracy degradation due to linearization errors, especially when the update strategy (delayed vs. immediate) is not matched to problem characteristics (Zhang et al., 4 Nov 2024). The immediate update strategy improves consistency and tightens covariance at additional computational cost, motivating adaptive strategies for window and feature management (Zhang et al., 4 Nov 2024).
PO-MSCKF’s (Du et al., 2 Jul 2024) pose-only formulation mitigates linearization errors introduced by 3D reconstruction, and achieves superior performance in challenging, large-depth environments.
MSCKF’s filtering approach, though efficient, is in general slightly less accurate than full-optimization (bundle adjustment) methods, especially in environments with abundant visual structure. However, hybrid and modular frameworks allow tuning along the filtering–optimization spectrum (Saxena et al., 2021).
Continued development includes:
- Fusion with diverse sensors for enhanced observability.
- Theoretical analysis of robustness and convergence in highly dynamic or degenerate scenes.
- More sophisticated data association and outlier handling for real-world applicability (Hua et al., 2023, Zhang et al., 10 Jul 2024).
- Integration with equivariant, invariant, or unscented filtering for improved uncertainty representation (Zhang et al., 4 Nov 2024).
MSCKF and its lineage remain critical to practical VIO and related estimation systems, establishing a foundation for robust and efficient multi-sensor fusion in resource-limited, high-dynamics, or perceptually challenging environments.