Inertial Odometry (IO): Methods & Advances

Updated 26 November 2025

Inertial Odometry (IO) is the process of estimating trajectory, pose, and velocity using only acceleration and angular velocity data from IMUs.
Modern IO methods leverage learning-based motion priors, state-space modeling, and frequency decomposition to mitigate drift and enhance localization accuracy.
Advanced approaches integrate uncertainty quantification and multi-sensor fusion with lightweight architectures to achieve real-time performance on diverse platforms.

Inertial Odometry (IO) is the discipline of recovering the trajectory, pose, and velocity of a moving platform using only acceleration and angular velocity measurements from Inertial Measurement Units (IMUs). IO underpins localization in robotics, autonomous vehicles, consumer devices, and wearable electronics, particularly where GNSS or external sensors are unavailable. The task was originally approached with strapdown mechanization and recursive filtering, and its principal challenges are sensor bias, measurement noise, and drift accumulation. Modern IO integrates learning-based motion priors and sophisticated state-space modeling, with architectural advances targeting drift mitigation, coordinate frame invariance, uncertainty quantification, robustness, and efficient multi-sensor fusion.

1. Principles and Coordinate Frame Preprocessing

IMU-based IO distinguishes two main coordinate frame paradigms for preprocessing: the "body frame" (sensor-centric) and the "global frame" (gravity-aligned) (Zhang, 19 Nov 2025). The body frame uses raw IMU outputs directly. In drone or rigid-mount use cases, this is mathematically equivalent to global-frame approaches and enables consistent modeling. For pedestrian IO, however, the IMU is loosely and nonrigidly attached to the body, so body-frame inputs suffer acute representational discontinuities whenever the device changes orientation. The global frame paradigm rotates IMU measurements into a gravity-aligned reference, giving the Z-axis global significance and yielding velocity and displacement labels in this frame. This approach produces smoother latent representations, reduces heading-related discontinuities, and improves discriminative concentration (higher PCA energy, superior t-SNE clustering by ground-truth speed). MambaIO demonstrates that, for pedestrian tasks, the global frame strongly outperforms body-frame representations (Zhang, 19 Nov 2025). In contrast, recent UAV IO exploits body-frame representation to maximize kinematic observability under highly nonlinear dynamics (Qiu et al., 26 Jan 2025).
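The gravity-aligned preprocessing described above can be illustrated with a minimal sketch. It assumes a single fixed rotation estimated from a near-stationary accelerometer window, whereas practical pipelines typically use a time-varying attitude estimate. The function names `gravity_alignment_rotation` and `rotate_to_global` are hypothetical and are not taken from any cited work.

```python
import numpy as np

def gravity_alignment_rotation(acc_static: np.ndarray) -> np.ndarray:
    """Return a 3x3 rotation R such that R @ up_body is the global +Z axis.

    acc_static: (N, 3) accelerometer samples from a near-stationary window.
    Assumption: at rest the accelerometer reads specific force, which points
    "up" (opposite to gravity) in the sensor frame.
    """
    up_body = acc_static.mean(axis=0)
    up_body = up_body / np.linalg.norm(up_body)
    z = np.array([0.0, 0.0, 1.0])
    v = np.cross(up_body, z)        # unnormalized rotation axis
    c = float(np.dot(up_body, z))   # cosine of the rotation angle
    if np.isclose(c, -1.0):
        # Sensor is exactly upside down: rotate by pi about the X-axis.
        return np.diag([1.0, -1.0, -1.0])
    vx = np.array([[0.0, -v[2], v[1]],
                   [v[2], 0.0, -v[0]],
                   [-v[1], v[0], 0.0]])
    # Rodrigues-style formula for the rotation taking up_body onto z.
    return np.eye(3) + vx + (vx @ vx) / (1.0 + c)

def rotate_to_global(imu: np.ndarray, R: np.ndarray) -> np.ndarray:
    """Rotate an (N, 6) window [acc | gyro] into the gravity-aligned frame."""
    acc_g = imu[:, :3] @ R.T
    gyro_g = imu[:, 3:] @ R.T
    return np.hstack([acc_g, gyro_g])
```

Gravity alignment fixes only roll and pitch. Heading about the Z-axis remains unobservable from the accelerometer alone, so this rotation is one of a family that differ by a yaw angle.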

2. Motion Modeling: Classic, Learning-Based, and Frequency Decomposition

Classical IO integrates IMU streams using strapdown equations and recursive filtering (EKF, particle filters). Drift accumulates super-linearly, so practical operation typically requires zero-velocity updates (ZUPT), explicit gait models, or multi-sensor fusion (Liu et al., 2020). Learning-based IO methods, such as RoNIN, RIDI, TLIO, and MambaIO, employ recurrent, convolutional, transformer-style, or state-space architectures to learn nonlinear motion priors directly from IMU streams (Zhang, 19 Nov 2025). Critical drift sources such as bias, noise, and high-frequency vibration are disentangled from the underlying motion using techniques such as Laplacian pyramid decomposition. The IMU window $\mathbf{X}\in\mathbb{R}^{6\times L}$ is split into a low-frequency global motion trend $\mathbf{X}_{low}$ and a high-frequency residual $\mathbf{X}_{high}$, and each is processed by a domain-specific module: Mamba (a state-space model, for global trend extraction) and multi-path convolution (for local, fine-grained dynamics) (Zhang, 19 Nov 2025).
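The split of $\mathbf{X}$ into $\mathbf{X}_{low}$ and $\mathbf{X}_{high}$ can be sketched as follows. This is a simplified single-level version that assumes a moving-average low-pass filter. A full Laplacian pyramid would use Gaussian blurring with downsampling and upsampling across several levels, and the exact filters used by MambaIO are not stated here. The name `split_low_high` and the default `kernel_size` are illustrative choices.

```python
import numpy as np

def split_low_high(x: np.ndarray, kernel_size: int = 9):
    """Split an IMU window x of shape (6, L) into (x_low, x_high).

    x_low is a per-channel moving average (low-frequency trend) and
    x_high = x - x_low is the residual, so x_low + x_high reconstructs x.
    """
    if kernel_size % 2 == 0:
        raise ValueError("kernel_size must be odd to keep the length L")
    kernel = np.ones(kernel_size) / kernel_size
    pad = kernel_size // 2
    # Edge padding avoids artificial ramps at the window boundaries.
    x_padded = np.pad(x, ((0, 0), (pad, pad)), mode="edge")
    x_low = np.stack(
        [np.convolve(row, kernel, mode="valid") for row in x_padded]
    )
    x_high = x - x_low
    return x_low, x_high
```

Because the residual is defined by subtraction, the decomposition is lossless by construction, and the two branches can be modeled separately and recombined.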

Recent lightweight architectures like DWSFormer further increase modeling capacity by projecting raw IMU features into high-dimensional implicit nonlinear spaces (Star Operation), bolstered by dual-wing channel and temporal attention mechanisms and multi-scale gated convolutions (Zhang et al., 22 Jul 2025). These advances enable superior representation of complex motion patterns (e.g., turns, abrupt changes) and integration of global and local motion cues, reducing localization error across pedestrian and mobile datasets.
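As a rough illustration of how an implicit high-dimensional nonlinear space can arise, the sketch below assumes that the Star Operation means an element-wise product of two linear projections of the same input. That reading is an assumption rather than a description of DWSFormer's actual layers, and `star_operation`, `W1`, and `W2` are hypothetical names.

```python
import numpy as np

def star_operation(x: np.ndarray, W1: np.ndarray, W2: np.ndarray) -> np.ndarray:
    """Element-wise product of two linear projections (assumed form).

    x: (d_in,) feature vector; W1, W2: (d_out, d_in) projection matrices.
    Output channel i equals (w1_i . x) * (w2_i . x), a quadratic form in x.
    """
    return (W1 @ x) * (W2 @ x)
```

Under this assumption each output channel mixes all pairwise products of input features, so a layer with d_in inputs implicitly spans up to d_in(d_in + 1)/2 distinct quadratic terms without materializing them.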

3. Uncertainty Quantification and Hybrid Model-Based Fusion

Uncertainty estimation and propagation are essential for robust IO, particularly for multi-sensor fusion and graph-based optimization. AirIMU pioneered fully differentiable IMU pre-integration coupled with learned uncertainty modules, learning both deterministic corrections (frame-wise bias offsets) and stochastic terms (per-frame variances for $\{\text{acc}, \text{gyro}\}$) (Qiu et al., 2023). Covariance propagation through discrete-time kinematics enables IMU-GPS pose graph optimization, yielding up to 31.6% improvement in accuracy over conventional approaches.
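Covariance propagation through discrete-time kinematics can be sketched for a single axis. The example assumes a two-element state (position, velocity) driven by accelerometer noise with a per-frame variance, which is a deliberate simplification. AirIMU's pre-integration also handles orientation and gyroscope noise, and those terms are omitted. The name `propagate_covariance` is hypothetical.

```python
import numpy as np

def propagate_covariance(acc_var: np.ndarray, dt: float, P0=None) -> np.ndarray:
    """Propagate a 2x2 [position, velocity] covariance over a window.

    acc_var: (N,) per-frame accelerometer noise variances (for example,
             predicted by a learned uncertainty module).
    dt:      sampling interval in seconds.
    """
    P = np.zeros((2, 2)) if P0 is None else np.array(P0, dtype=float)
    F = np.array([[1.0, dt],
                  [0.0, 1.0]])          # constant-velocity transition
    G = np.array([[0.5 * dt * dt],
                  [dt]])                # how acceleration noise enters the state
    for var in acc_var:
        P = F @ P @ F.T + var * (G @ G.T)
    return P
```

With a constant variance, the velocity variance grows linearly in the number of frames while the position variance grows much faster, which mirrors the drift behavior of integrated inertial data.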

Hybrid approaches such as TLIO tightly couple learned displacement or velocity estimates and their uncertainties to recursive filters (EKF/stochastic cloning), resulting in statistically consistent fusion of data-driven and mechanistic priors (Liu et al., 2020). EqNIO exploits symmetry-aware equivariant neural layers, mapping IMU data into canonical frames to enforce physical invariances in displacement and covariance estimation, markedly improving generalization and sharpness of uncertainty calibration (Jayanth et al., 12 Aug 2024).
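The coupling of a learned displacement and its uncertainty to a recursive filter can be illustrated with a one-dimensional Kalman measurement update. The sketch assumes a state holding a cloned past position and the current position, and a network output giving the displacement between them with a predicted variance. It is not TLIO's full filter, which also estimates orientation, velocity, and biases, and `fuse_displacement` is a hypothetical name.

```python
import numpy as np

def fuse_displacement(x: np.ndarray, P: np.ndarray, d_meas: float, d_var: float):
    """Kalman update with a learned displacement measurement.

    x: (2,) state [p_past, p_now]; P: (2, 2) state covariance.
    d_meas: learned displacement p_now - p_past; d_var: its predicted variance.
    """
    H = np.array([[-1.0, 1.0]])            # measurement model: d = p_now - p_past
    S = H @ P @ H.T + d_var                # innovation covariance, shape (1, 1)
    K = P @ H.T / S                        # Kalman gain, shape (2, 1)
    innovation = d_meas - (H @ x)          # shape (1,)
    x_new = x + (K * innovation).ravel()
    P_new = (np.eye(2) - K @ H) @ P
    return x_new, P_new
```

A large predicted variance shrinks the gain, so the filter leans on its mechanistic prediction. A small variance pulls the state toward the network output.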

4. Dataset Diversity, Evaluation Protocols, and Real-World Performance

IO methods are validated on several large-scale, information-rich datasets. For pedestrian IO, six primary benchmarks are employed: RIDI, RoNIN, RNIN, OxIOD, TLIO, and IMUNet (Zhang, 19 Nov 2025). These datasets cover device placements (chest, hand, pocket), motion types (walking, running, stairs), and trajectories from 50 m to 1 km. For vehicular and mobile platforms, IO-VNBD provides synchronized vehicle and smartphone IMU/GPS/wheel-speed datasets over 5,700 km (Onyekpe et al., 2020).

Standard metrics include Absolute Trajectory Error (ATE), Relative Trajectory Error (RTE), velocity RMSE, and segment-based drift analysis. MambaIO reduces ATE by 15–30% over strong baselines (RoNIN, TLIO) and maintains real-time throughput (~125 Hz on RTX 3090) (Zhang, 19 Nov 2025). DWSFormer achieves reductions in ATE of up to 65.78% on diverse mobile datasets (Zhang et al., 22 Jul 2025). TLIO, EqNIO, and AirIMU further demonstrate consistent gains in position and orientation drift, statistically calibrated covariance estimation, and improved robustness to platform/sensor domain shifts (Liu et al., 2020, Qiu et al., 2023, Jayanth et al., 12 Aug 2024).
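The two trajectory metrics can be computed as in the following sketch. It assumes that the estimated and ground-truth trajectories are already time-synchronized and expressed in the same frame, and it defines RTE over a fixed window of samples. Individual benchmarks may differ in alignment and windowing conventions, and the function names `ate` and `rte` are illustrative.

```python
import numpy as np

def ate(est: np.ndarray, gt: np.ndarray) -> float:
    """Absolute Trajectory Error: RMSE of position error over all samples.

    est, gt: (N, D) position arrays with D = 2 or 3.
    """
    err = est - gt
    return float(np.sqrt(np.mean(np.sum(err ** 2, axis=1))))

def rte(est: np.ndarray, gt: np.ndarray, window: int) -> float:
    """Relative Trajectory Error: RMSE of displacement error over a window.

    window: number of samples separating the two endpoints of each segment.
    """
    d_est = est[window:] - est[:-window]
    d_gt = gt[window:] - gt[:-window]
    err = d_est - d_gt
    return float(np.sqrt(np.mean(np.sum(err ** 2, axis=1))))
```

ATE penalizes a constant offset between the two trajectories, while RTE ignores it and responds only to error accumulated within each window, which is why the two are usually reported together.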

5. Practical Integration, Complexity, and Deployment

Modern IO architectures emphasize real-time performance and deployment suitability for consumer devices and embedded platforms. Laplacian pyramid decomposition and linear-time sequence models (Mamba) yield $O(L)$ inference times, with full pedestrian IO trajectories updated in ~8 ms on GPU and optimized quantized variants exceeding 50 Hz on mobile processors (Zhang, 19 Nov 2025). DWSFormer demonstrates comparable expressiveness to prior deep convolutional or recurrent models while halving parameter count and FLOPs (Zhang et al., 22 Jul 2025). Lightweight causal convolutional models (L-IONet) enable edge inference on ARM CPUs and smartwatches at ≥10 Hz, matching or exceeding LSTM-based IO methods (Chen et al., 2020).
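The $O(L)$ cost of linear-time sequence models follows from their recurrent form: each sample updates a fixed-size state once. The sketch below uses a scalar linear state-space recurrence with fixed parameters `a`, `b`, and `c`, which are illustrative. Mamba itself uses input-dependent parameters and a hardware-aware scan, neither of which is reproduced here.

```python
import numpy as np

def ssm_scan(x: np.ndarray, a: float, b: float, c: float) -> np.ndarray:
    """Run h_t = a * h_{t-1} + b * x_t, y_t = c * h_t over a sequence.

    x: (L,) input sequence. One constant-cost update per sample gives O(L)
    total work and O(1) state memory.
    """
    h = 0.0
    y = np.empty(len(x))
    for t, x_t in enumerate(x):
        h = a * h + b * x_t
        y[t] = c * h
    return y
```

By contrast, full self-attention compares every pair of samples in the window, so its cost grows quadratically with $L$.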

Ablation studies on dual-path architectures (convolution-only, Mamba-only, full) consistently show that frequency-decoupled modeling outperforms single-path designs, indicating that global and local motion cues contribute complementary information for drift suppression (Zhang, 19 Nov 2025). Future directions include on-device attitude calibration, further model compression, sparse sequence models for ultra-resource-constrained applications, and active online adaptation to previously unseen behaviors (Zhang, 19 Nov 2025).

6. Extensions, Limitations, and Outlook

Current IO systems are robust across placement, carrier type, and environmental regime, especially in pedestrian and hand-held domains. Notable limitations persist on ultra-resource-constrained devices (where even linear-time sequence models incur non-negligible overhead), under unmodeled behaviors (cycling, skateboarding, crawling), or extreme environmental variation. Transfer to highly dynamic UAVs is best achieved using body-frame representation and explicit attitude encoding (Qiu et al., 26 Jan 2025), whereas pedestrian/consumer devices benefit from global-frame, gravity-aligned preprocessing. Sensor-agnostic uncertainty adapters, lightweight sequence models, and adaptive domain transfer are active research topics.

Inertial odometry continues to evolve, synthesizing rigorous state-space modeling, equivariant neural architectures, principled coordinate frame selection, and frequency-decoupled disentanglement. These advances set new benchmarks for drift, uncertainty calibration, and real-time localization in consumer-grade and robotic scenarios (Zhang, 19 Nov 2025, Zhang et al., 22 Jul 2025, Jayanth et al., 12 Aug 2024, Qiu et al., 2023, Liu et al., 2020).