MambaIO: Neural Inertial Odometry Framework
- MambaIO is a neural inertial odometry framework that decouples IMU signals into low- and high-frequency bands via a Laplacian pyramid to enhance trajectory estimation.
- The architecture employs a dual-branch design: a Mamba state space model for long-range motion and a multi-path convolutional network for local motion details.
- Evaluation on six public datasets shows MambaIO reduces ATE by 8–15% and RTE by 9–14% compared to prior methods.
MambaIO is a neural inertial odometry (IO) framework designed to recover pedestrian trajectories using only raw 3-axis accelerometer and gyroscope measurements from commodity inertial measurement units (IMUs), processed in the global (gravity-aligned) coordinate frame. It introduces a frequency-decoupled modeling strategy, where inertial signals are split into low- and high-frequency bands via a Laplacian pyramid. The low-frequency component is processed using a Mamba architecture (a linear-time state space model, SSM), which excels at extracting long-range contextual motion cues, while local motion details in the high-frequency band are modeled via multi-path convolutional (MPC) structures. MambaIO demonstrates state-of-the-art trajectory accuracy on six public pedestrian IO datasets, yielding substantial reductions in global and local trajectory errors relative to previous methods (Zhang, 19 Nov 2025).
1. Coordinate Frame Analysis and Motivation
Pedestrian IO seeks to estimate incremental pose from windowed sequences of IMU measurements. Traditional strapdown integration accumulates drift because bias and noise are amplified by double integration. Learning-based approaches instead train neural mappings end-to-end over short windows to regress pose increments, which improves both robustness and accuracy.
A fundamental design choice is the representation of IMU data in either the body frame (IMU-attached axes) or the global frame (gravity-aligned axes). For arbitrarily held phones during pedestrian movement, the global frame leads to temporally smoother, semantically coherent signals. Kinematic analysis—using PCA and t-SNE visualizations—shows that global-frame representations yield more compact, discriminative latent features for human IO tasks, supporting the adoption of global coordinates for MambaIO (Zhang, 19 Nov 2025).
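As a rough illustration of what moving from the body frame to the global frame involves, the sketch below rotates a body-frame IMU vector by a unit quaternion. It assumes that a body-to-global orientation estimate is available in (w, x, y, z) order; the function name `rotate_to_global` is hypothetical and not taken from the MambaIO source.

```python
import numpy as np

def rotate_to_global(q_wxyz, v_body):
    """Rotate a body-frame 3-vector into the global frame.

    q_wxyz: quaternion (w, x, y, z) for the body-to-global rotation
            (assumed to come from an external orientation estimate).
    v_body: accelerometer or gyroscope sample in the body frame.
    """
    q = np.asarray(q_wxyz, dtype=float)
    q = q / np.linalg.norm(q)            # guard against non-unit input
    w, u = q[0], q[1:]
    v = np.asarray(v_body, dtype=float)
    # Equivalent to q * v * q^-1, written with cross products
    t = 2.0 * np.cross(u, v)
    return v + w * t + np.cross(u, t)

# Usage: a 90-degree yaw maps the body x-axis onto the global y-axis
q_yaw90 = [np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)]
v_global = rotate_to_global(q_yaw90, [1.0, 0.0, 0.0])
```

Applying the same rotation to every sample in a window gives the global-frame sequence that the rest of the pipeline consumes.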
2. Frequency-Decoupled Signal Decomposition via Laplacian Pyramid
To isolate global motion trends from rapid, local fluctuations, MambaIO introduces a differentiable Laplacian pyramid decomposition:
- For the input window of IMU signals, apply a depthwise average convolution to downsample and extract the low-frequency component.
- Upsample the low-frequency component back to the original window length using nearest-neighbor interpolation.
- Compute the high-frequency residual by subtracting the upsampled low-frequency signal from the input.
This decomposition yields a low-frequency component (slow, global motion trends) and a high-frequency component (rapid, localized dynamics), which are modeled in specialized branches.
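The three steps above can be sketched in NumPy as a single-level Laplacian pyramid split. This is a minimal sketch, not the paper's implementation: the downsampling factor of 2 (used as both kernel size and stride) and the name `laplacian_split` are assumptions, and the average convolution uses fixed rather than learned weights.

```python
import numpy as np

def laplacian_split(x, factor=2):
    """Split a (channels, length) window into low- and high-frequency parts.

    factor: assumed kernel size and stride of the averaging filter.
    Returns (low, high): low has shape (channels, length // factor),
    high has the same shape as x.
    """
    x = np.asarray(x, dtype=float)
    c, n = x.shape
    if n % factor != 0:
        raise ValueError("window length must be divisible by factor")
    # Depthwise average: each channel is averaged independently
    low = x.reshape(c, n // factor, factor).mean(axis=2)
    # Nearest-neighbor upsampling back to the original length
    low_up = np.repeat(low, factor, axis=1)
    # High-frequency residual
    high = x - low_up
    return low, high

# Usage on a toy 6-channel window (3-axis accelerometer + 3-axis gyroscope)
window = np.random.default_rng(0).standard_normal((6, 200))
low, high = laplacian_split(window)
```

By construction, the upsampled low band plus the residual reproduces the input exactly, so no information is lost before the two branches.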
3. Dual-Branch MambaIO Network Architecture
The MambaIO architecture processes decomposed IMU signals through separate branches before fusion and prediction:
A. Multi-Path Convolution (MPC) High-Frequency Branch
- Three parallel depthwise convolutions with different kernel sizes (stride 1) extract multi-scale local features from the high-frequency component.
- Outputs are concatenated along the channel dimension, passed through a squeeze-and-excitation (SE) block for channel reweighting, then compressed via a convolution.
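The MPC branch described above can be sketched as follows. Several details are assumptions made for illustration only: the kernel sizes (3, 5, 7), the SE reduction ratio, the use of a pointwise (kernel size 1) convolution for compression, and the random weights. All function names are hypothetical.

```python
import numpy as np

def depthwise_conv1d(x, kernels):
    """x: (C, T); kernels: (C, k) with odd k. Zero 'same' padding, stride 1."""
    c, t = x.shape
    pad = kernels.shape[1] // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    out = np.empty((c, t))
    for ch in range(c):
        # Each channel is filtered by its own kernel (no channel mixing)
        out[ch] = np.correlate(xp[ch], kernels[ch], mode="valid")
    return out

def squeeze_excite(x, w1, w2):
    """SE block: global average pool -> bottleneck MLP -> sigmoid gates."""
    squeezed = x.mean(axis=1)                    # (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)      # ReLU bottleneck
    gates = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))  # (C,), each in (0, 1)
    return x * gates[:, None]

def init_params(channels, out_channels, kernel_sizes=(3, 5, 7), reduction=2, seed=0):
    """Random illustrative weights; kernel sizes and reduction are assumed."""
    rng = np.random.default_rng(seed)
    cat = channels * len(kernel_sizes)
    hidden = max(cat // reduction, 1)
    return {
        "dw_kernels": [rng.standard_normal((channels, k)) / k for k in kernel_sizes],
        "se_w1": rng.standard_normal((hidden, cat)) * 0.1,
        "se_w2": rng.standard_normal((cat, hidden)) * 0.1,
        "pw": rng.standard_normal((out_channels, cat)) * 0.1,
    }

def mpc_branch(x_high, params):
    """Three parallel depthwise paths -> concat -> SE -> pointwise conv."""
    paths = [depthwise_conv1d(x_high, k) for k in params["dw_kernels"]]
    cat = np.concatenate(paths, axis=0)          # (3C, T)
    cat = squeeze_excite(cat, params["se_w1"], params["se_w2"])
    return params["pw"] @ cat                    # (C_out, T)
```

Because every path uses stride 1 with 'same' padding, the time axis is preserved and the three outputs can be concatenated channel-wise without cropping.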
B. Mamba State Space Model (SSM) Low-Frequency Branch
- The Mamba block models the low-frequency component via learned input gating and convolution, producing two parallel streams.
- The streams are concatenated and linearly fused.
- A self-attention layer is appended post-Mamba to emphasize critical time frames.
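To give a sense of why a Mamba-style SSM runs in linear time, the sketch below implements only the core selective state-space recurrence with a diagonal state matrix. It deliberately omits the gating, convolution, stream fusion, and self-attention described above, as well as the hardware-aware parallel scan used in practice; all parameter names and shapes are assumptions for illustration.

```python
import numpy as np

def softplus(z):
    return np.logaddexp(0.0, z)

def selective_scan(x, A, W_delta, b_delta, W_B, W_C):
    """Sequential selective SSM scan.

    x:        (T, D) input sequence
    A:        (D, N) diagonal state matrix entries (negative for stability)
    W_delta:  (D, D), b_delta: (D,)  -> input-dependent step size
    W_B, W_C: (N, D)                 -> input-dependent B_t and C_t
    Returns y: (T, D). Cost is O(T * D * N), i.e. linear in sequence length.
    """
    T, D = x.shape
    h = np.zeros_like(A, dtype=float)             # hidden state (D, N)
    y = np.empty((T, D))
    for t in range(T):
        xt = x[t]
        delta = softplus(W_delta @ xt + b_delta)  # (D,), strictly positive
        B_t = W_B @ xt                            # (N,)
        C_t = W_C @ xt                            # (N,)
        A_bar = np.exp(delta[:, None] * A)        # discretized state decay
        B_bar = delta[:, None] * B_t[None, :]     # simplified input matrix
        h = A_bar * h + B_bar * xt[:, None]       # state update
        y[t] = h @ C_t                            # readout
    return y
```

Because the step size and the input and output projections depend on the current sample, the recurrence can choose per time step how much history to retain, which is the property that makes this family of models suited to long-range context.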
C. Branch Fusion and Pose Prediction
The MPC and Mamba streams are concatenated and passed through a convolution to produce a fused feature representation, from which the pose increments are regressed.
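Data Flow Schematic
IMU window → Laplacian pyramid → low-frequency band to the Mamba branch (followed by self-attention), high-frequency band to the MPC branch → concatenation → convolution → pose increments.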
4. Underlying State Estimation Equations and Loss Functions
MambaIO’s regression task implicitly approximates the discrete IO equations in the global frame. The system state at time $t$ consists of:
- $\mathbf{p}_t$: position,
- $\mathbf{v}_t$: velocity,
- $\mathbf{q}_t$: orientation quaternion.
Given acceleration $\mathbf{a}_t$ and angular velocity $\boldsymbol{\omega}_t$ in the global frame, idealized strapdown integration follows:
\begin{align*}
\mathbf{p}_t &= \mathbf{p}_{t-1} + \mathbf{v}_{t-1}\,\Delta t + \tfrac{1}{2}(\mathbf{a}_t - \mathbf{g})\,\Delta t^2, \\
\mathbf{v}_t &= \mathbf{v}_{t-1} + (\mathbf{a}_t - \mathbf{g})\,\Delta t, \\
\mathbf{q}_t &= \mathbf{q}_{t-1}\,\otimes\,\exp(\tfrac{1}{2}\boldsymbol{\omega}_t\Delta t),
\end{align*}
where $\mathbf{g}$ is the gravity vector (in m/s²) and $\otimes$ denotes the quaternion product.
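One update step of these equations can be sketched in NumPy as shown below. The sketch assumes a z-up global frame with gravity written as [0, 0, 9.81] m/s² (so a stationary accelerometer reading cancels against it) and quaternions in (w, x, y, z) order. The function names are hypothetical, and the orientation update follows the right-multiplication form of the equations as written.

```python
import numpy as np

GRAVITY = np.array([0.0, 0.0, 9.81])  # assumed z-up convention, m/s^2

def quat_mul(q, r):
    """Hamilton product of two (w, x, y, z) quaternions."""
    w1, x1, y1, z1 = q
    w2, x2, y2, z2 = r
    return np.array([
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    ])

def quat_exp(v):
    """Exponential of the pure quaternion (0, v)."""
    v = np.asarray(v, dtype=float)
    angle = np.linalg.norm(v)
    if angle < 1e-12:
        q = np.concatenate(([1.0], v))   # first-order approximation
        return q / np.linalg.norm(q)
    return np.concatenate(([np.cos(angle)], np.sin(angle) * v / angle))

def strapdown_step(p, v, q, a, omega, dt):
    """Advance position, velocity, and orientation by one sample."""
    lin_acc = np.asarray(a, dtype=float) - GRAVITY
    p_new = p + v * dt + 0.5 * lin_acc * dt ** 2
    v_new = v + lin_acc * dt
    q_new = quat_mul(q, quat_exp(0.5 * np.asarray(omega, dtype=float) * dt))
    return p_new, v_new, q_new / np.linalg.norm(q_new)

# Usage: 1 s of constant 1 m/s^2 forward acceleration with a steady yaw rate
p, v, q = np.zeros(3), np.zeros(3), np.array([1.0, 0.0, 0.0, 0.0])
for _ in range(100):
    p, v, q = strapdown_step(p, v, q, [1.0, 0.0, 9.81], [0.0, 0.0, np.pi / 2], 0.01)
```

Any constant bias in the acceleration input grows quadratically in position under this recursion, which is the drift that the learned regression is meant to avoid.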
The training loss is a weighted sum of position and quaternion errors, where the quaternion term is an axis-angle error measure and the two weights are hyperparameters.
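A loss of this general shape might look like the following sketch. The exact position error norm and the weight values are not given here, so the squared L2 position term, the geodesic angle used for the quaternion term, and the default weights `w_pos` and `w_rot` are all assumptions.

```python
import numpy as np

def quat_angle_error(q_pred, q_true):
    """Rotation angle (radians) between two (w, x, y, z) quaternions."""
    q1 = np.asarray(q_pred, dtype=float)
    q2 = np.asarray(q_true, dtype=float)
    q1 = q1 / np.linalg.norm(q1)
    q2 = q2 / np.linalg.norm(q2)
    # The absolute value treats q and -q as the same rotation
    d = abs(float(np.dot(q1, q2)))
    return 2.0 * np.arccos(min(d, 1.0))

def pose_loss(dp_pred, dp_true, q_pred, q_true, w_pos=1.0, w_rot=1.0):
    """Weighted sum of position and orientation errors (illustrative weights)."""
    diff = np.asarray(dp_pred, dtype=float) - np.asarray(dp_true, dtype=float)
    pos_err = float(np.sum(diff ** 2))   # assumed squared L2 position term
    return w_pos * pos_err + w_rot * quat_angle_error(q_pred, q_true)
```

For actual training, the same expressions would be written in an autodiff framework so that gradients can flow through both terms.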
5. Training Procedure and Empirical Validation
MambaIO is evaluated on six public pedestrian IO benchmarks: RIDI, RoNIN, RNIN, OxIOD, TLIO, IMUNet, encompassing diverse use cases and conditions. The training protocol is as follows:
- Network: Four hierarchical stages with channel widths [64, 128, 256, 512], followed by the dual MambaIO branches.
- Optimizer: Adam, with the learning rate decayed from its initial value by a cosine/step schedule.
- Windows and batching: windowed IMU input; batch size ≈ 64 per GPU (5× NVIDIA RTX 3090, PyTorch 2.5.1).
- Early stopping at epoch 40.
Evaluation metrics:
- Absolute Trajectory Error (ATE): Global RMS position error.
- Relative Trajectory Error (RTE): Sliding-window RMS error (1 s), quantifying local drift.
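The two metrics can be computed roughly as in the sketch below. It assumes that predicted and ground-truth trajectories are already time-synchronized arrays of shape (T, 2) or (T, 3), that no additional trajectory alignment is applied, and that the RTE window is given in samples (1 s multiplied by the sample rate). The function names are hypothetical.

```python
import numpy as np

def ate(pred, gt):
    """Absolute Trajectory Error: RMS of per-sample position error."""
    err = np.asarray(pred, dtype=float) - np.asarray(gt, dtype=float)
    return float(np.sqrt(np.mean(np.sum(err ** 2, axis=1))))

def rte(pred, gt, window):
    """Relative Trajectory Error: RMS error of displacement over a sliding window.

    window: window length in samples (assumed: 1 s * sample rate).
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    d_pred = pred[window:] - pred[:-window]   # predicted displacement per window
    d_gt = gt[window:] - gt[:-window]         # true displacement per window
    err = d_pred - d_gt
    return float(np.sqrt(np.mean(np.sum(err ** 2, axis=1))))

# Usage: a constant offset inflates ATE but leaves RTE at zero
t = np.arange(500)
gt = np.stack([0.01 * t, np.sin(0.02 * t)], axis=1)
offset_pred = gt + np.array([3.0, 4.0])
drift_pred = gt + np.stack([0.01 * t, np.zeros(t.size)], axis=1)
```

The contrast between the two toy predictions shows why both metrics are reported: ATE captures global error, while RTE isolates drift accumulated within each window.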
MambaIO achieves consistent improvements over prior methods (RoNIN-ResNet, TLIO) in both ATE and RTE, as summarized:
| Dataset | Baseline (ATE / RTE) | MambaIO (ATE / RTE) | ATE improvement |
|---|---|---|---|
| RoNIN | 0.58 m / 0.32 m/s | 0.51 m / 0.28 m/s | 12% |
| TLIO | 0.48 m / 0.26 m/s | 0.42 m / 0.23 m/s | 13% |
Across all datasets, MambaIO yields 8–15% reduction in ATE and 9–14% in RTE (Zhang, 19 Nov 2025). Qualitative trajectory plots show close adherence to ground truth, particularly in complex navigational scenarios.
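The percentages in the table can be checked against its ATE and RTE columns with a short calculation. The helper name `relative_reduction` is hypothetical, and the values are copied from the two table rows above.

```python
def relative_reduction(baseline, ours):
    """Percentage reduction of an error metric relative to the baseline."""
    return 100.0 * (baseline - ours) / baseline

# (ATE, RTE) pairs taken from the table rows
rows = {
    "RoNIN": ((0.58, 0.32), (0.51, 0.28)),
    "TLIO": ((0.48, 0.26), (0.42, 0.23)),
}

for name, (base, ours) in rows.items():
    ate_gain = relative_reduction(base[0], ours[0])
    rte_gain = relative_reduction(base[1], ours[1])
    print(f"{name}: ATE {ate_gain:.1f}%, RTE {rte_gain:.1f}%")
```

The ATE reductions come out at roughly 12.1% and 12.5%, which suggests that the table's percentage column reports the relative ATE reduction after rounding.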
6. Ablation Study and Component Contributions
Two ablated variants were analyzed:
- Conv-only (MPC branch, no Mamba): average ATE ≈ 0.55 m
- Mamba-only (no MPC branch): average ATE ≈ 0.57 m
Both outperform RoNIN-ResNet but underperform full MambaIO (ATE ≈ 0.51 m). This confirms that dual-frequency decomposition and branch-specific modeling yield complementary accuracy gains, with MPC capturing local detail and Mamba SSM capturing global context.
7. Conclusions and Prospective Extensions
MambaIO systematically revisits the global-coordinate paradigm for pedestrian inertial odometry, providing theoretical and empirical evidence of its superiority for human motion tracking, due to temporally coherent, gravity-aligned IMU streams. By leveraging Laplacian pyramid-based frequency decoupling and dedicated SSM/convolutional branches, MambaIO jointly models both coarse movement trajectories and fine-grained dynamics. Results on six benchmarks set new SOTA for pedestrian IO (8–15% ATE, 9–14% RTE improvement). Prospective directions include optimizing the SSM branch’s runtime and extending the Laplacian pyramid decomposition to multiple levels to enhance multi-scale motion modeling (Zhang, 19 Nov 2025).