IMU Motion Capture Dataset Overview

Updated 27 October 2025
  • IMU motion capture datasets are structured collections providing time-resolved, high-frequency sensor readings such as acceleration, angular velocity, and orientation.
  • They employ robust synchronization, calibration, and sensor-fusion methods to align wearable sensor data with optical and video measurements.
  • These datasets are critical for benchmarking pose estimation, human biomechanics, and robotics applications, offering both raw and annotated data.

Inertial Measurement Unit (IMU) motion capture datasets are structured resources that combine time-resolved data streams from wearable IMU sensors with additional measurement modalities, such as optical motion capture, synchronized video, or ground-truth pose data, for the analysis and modeling of human, object, or device motion. IMU datasets provide high-frequency but noise-prone measurements of linear acceleration, angular velocity, and often orientation (as quaternions or rotation matrices). They serve as a foundation for developing, calibrating, and benchmarking pose estimation and sensor-fusion algorithms, and for characterizing human or robotic movement across scientific, industrial, and consumer applications.
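
As a concrete illustration, a single reading in such a dataset typically bundles a timestamp with these quantities. The Python sketch below shows one plausible per-sample record; the field names, units, and quaternion convention are illustrative assumptions, not the schema of any particular dataset.

```python
from dataclasses import dataclass

@dataclass
class ImuSample:
    """One time-stamped reading from a single body-worn IMU (illustrative schema)."""
    t: float                                 # timestamp in seconds
    acc: tuple[float, float, float]          # linear acceleration (m/s^2), sensor frame
    gyro: tuple[float, float, float]         # angular velocity (rad/s), sensor frame
    quat: tuple[float, float, float, float]  # orientation quaternion, assumed (w, x, y, z)
```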

1. IMU Hardware Specifications and Data Modalities

IMU motion capture datasets typically utilize body-worn or object-mounted sensor suits, with either dense or sparse sensor placements. For example, the MoVi dataset employs a Noitom Neuron Edition V2 suit with 18 nine-axis sensors (3-axis gyroscope, 3-axis accelerometer, 3-axis magnetometer) at 120 Hz (Ghorbani et al., 2020), whereas many contemporary multi-modal datasets (e.g., RELI11D (Yan et al., 28 Mar 2024), AscendMotion (Yan et al., 27 Mar 2025), MINIONS (Chen et al., 23 Jul 2024), and EMHI (Fan et al., 30 Aug 2024)) use Xsens MVN or similar commercial suits with 17 IMUs at 60 Hz, with each sensor outputting orientation and acceleration in its local frame.

Dataset architectures may also involve custom configurations: for instance, in (Gupta et al., 2023), marker-equipped pens with dual Arduino Nano 33 BLE Sense (LSM9DS1) or MPU9250 IMUs are used for handwriting acquisition, capturing accelerometer, gyroscope, and magnetometer data at 50 Hz. Sensor configurations may target specific anatomical regions (e.g., single-leg attachment for gait (Santos et al., 2021)) or object locations (e.g., object-centered IMUs in IMHD² (Zhao et al., 2023) and helmet-mounted IMUs in HelmetPoser (Li et al., 8 Sep 2024)).

Data streams include:

  • Raw accelerometer/gyroscope/magnetometer time series (3 to 9 channels per sensor)
  • Derived signals: sensor displacement, joint velocities, quaternions, and BVH files (for standard MoCap applications)
  • Cross-modal annotation: time-aligned video, LiDAR, event streams, or RGB-D point clouds

Sampling frequencies range from 30 to 400 Hz, matching or exceeding complementary vision-based capture; for example, TUM-VIE (Klenk et al., 2021) records IMU data at 200 Hz, with Allan deviation-based characterization of bias and white noise.
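
The Allan deviation analysis used by TUM-VIE has a compact standard implementation; the sketch below computes the overlapping Allan deviation of a rate signal. The formulation is the textbook one, and the synthetic data and window choices are our assumptions, not TUM-VIE's actual processing.

```python
import numpy as np

def allan_deviation(rate, fs, m_list):
    """Overlapping Allan deviation of a gyro/accelerometer rate signal.

    rate   : 1-D array of sensor readings (e.g., rad/s), sampled at fs Hz
    m_list : averaging-window lengths, in samples
    """
    theta = np.cumsum(rate) / fs        # integrate rate (e.g., angle in rad)
    n = theta.size
    taus, adev = [], []
    for m in m_list:
        if 2 * m >= n:
            break
        tau = m / fs
        # second difference of the integrated signal over stride m
        d = theta[2 * m:] - 2.0 * theta[m:n - m] + theta[:n - 2 * m]
        avar = np.sum(d ** 2) / (2.0 * tau ** 2 * (n - 2 * m))
        taus.append(tau)
        adev.append(np.sqrt(avar))
    return np.array(taus), np.array(adev)

# example: 10 minutes of synthetic white noise at TUM-VIE's 200 Hz IMU rate
rng = np.random.default_rng(0)
gyro_x = 0.02 * rng.standard_normal(200 * 600)
m_list = np.unique(np.logspace(0, 4, 40).astype(int))
taus, adev = allan_deviation(gyro_x, fs=200.0, m_list=m_list)
```

On a log-log plot of adev against taus, the slope -1/2 region identifies white noise and the flat minimum indicates bias instability.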

2. Experimental Protocols and Synchronization Methodologies

Robust synchronization among IMU, vision, and other sensor streams is critical for data utility and fusion. Several strategies are adopted:

  • Hardware triggering: Cameras and IMUs are aligned through shared triggers or TTL pulses, as in TUM-VIE (Klenk et al., 2021) and LuViRA (Yaman et al., 2023), achieving sub-millisecond offsets.
  • Cross-correlation alignment: MoVi (Ghorbani et al., 2020) aligns IMU streams with optical MoCap streams by maximizing the cross-correlation of keypoint trajectories in the z-dimension (a minimal offset-estimation sketch follows this list):

\max_{\alpha, \beta} \ \max_{\tau} \Big( v_z^j(t) \star \tilde{v}_z^j(\alpha t + \beta) \Big)(\tau)

where $v_z^j(t)$ and $\tilde{v}_z^j$ are the optical and IMU z-coordinate trajectories for joint $j$.

  • Piecewise matching via characteristic events: RELI11D (Yan et al., 28 Mar 2024) and EMHI (Fan et al., 30 Aug 2024) use highly detectable motions (e.g., jumps, taps) as anchor points to facilitate automated or manual temporal registration.
  • NTP/network-based time correction: Cross-modality synchronization in LuViRA (Yaman et al., 2023) relies on periodic NTP server updates and synchronization pulses from mocap systems.
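
For the cross-correlation alignment above, a minimal sketch: if the time scale $\alpha$ is fixed to 1 (pure offset $\beta$) and both signals are resampled to a common rate, the alignment reduces to a cross-correlation peak search. The function and variable names here are ours, not MoVi's.

```python
import numpy as np
from scipy.signal import correlate, correlation_lags

def estimate_offset(optical_z, imu_z, fs):
    """Estimate the time offset between optical and IMU z-trajectories
    of the same joint via the cross-correlation peak (alpha = 1 assumed)."""
    a = (optical_z - optical_z.mean()) / optical_z.std()
    b = (imu_z - imu_z.mean()) / imu_z.std()
    xc = correlate(a, b, mode="full")
    lags = correlation_lags(a.size, b.size, mode="full")
    lag = lags[np.argmax(xc)]      # sign convention follows correlation_lags
    return lag / fs                # offset in seconds
```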

Temporal misalignments are corrected post hoc via interpolation and by modeling variable time offsets, as in MoCap2GT's cubic B-spline time correction (Shu et al., 17 Jul 2025).
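
MoCap2GT estimates this offset jointly with the trajectory states; as a far simpler stand-in, a cubic B-spline can be fit to sparse per-event offset measurements and evaluated at every IMU timestamp. The event times and offsets below are invented for illustration.

```python
import numpy as np
from scipy.interpolate import make_interp_spline

# hypothetical offsets (s) measured at a few anchor events (s)
t_events = np.array([0.0, 30.0, 60.0, 90.0, 120.0])
offsets = np.array([0.012, 0.013, 0.011, 0.014, 0.012])

offset_spline = make_interp_spline(t_events, offsets, k=3)  # cubic B-spline
t_imu = np.arange(0.0, 120.0, 1.0 / 60.0)                   # 60 Hz IMU timestamps
t_corrected = t_imu - offset_spline(t_imu)                  # time-varying correction
```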

Spatial registration of IMU frames to mocap or world coordinates is achieved via hand–eye calibration, often posed as an orthogonal Procrustes problem:

\arg\min_{R} \sum_{i} \| R A_i - B_i \|_F

where $A_i$ and $B_i$ are matched rotation measurements from different frames (Li et al., 8 Sep 2024).
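
The closed-form solution is the usual SVD of the accumulated cross-covariance of the matched rotations; the sketch below assumes the inputs are stacked 3x3 rotation matrices.

```python
import numpy as np

def procrustes_rotation(A, B):
    """Solve argmin_R sum_i ||R A_i - B_i||_F over rotation matrices.

    A, B : arrays of shape (n, 3, 3) holding matched rotation measurements
           from the two frames (e.g., IMU and mocap)."""
    M = np.einsum('nij,nkj->ik', B, A)      # M = sum_i B_i A_i^T
    U, _, Vt = np.linalg.svd(M)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])  # enforce det(R) = +1
    return U @ D @ Vt
```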

3. Data Processing, Annotation Pipelines, and Fusion

IMU data is processed through a combination of:

  • Low-pass filtering (e.g., Butterworth with a 5 Hz cut-off (Santos et al., 2021)) to reduce sensor noise (see the filtering sketch after this list)
  • Segmentation and preprocessing (standardizing trial lengths, scaling, outlier handling) (Gupta et al., 2023)
  • Kinematic annotation: Joint positions/rotations are computed by integrating IMU signals, using inverse kinematics or sensor-fusion models
  • Sensor-fusion pipelines: Multimodal data are fused using optimization frameworks that employ maximum likelihood estimation over factor graphs (MoCap2GT (Shu et al., 17 Jul 2025)), cross-attention (RELI11D/LEIR (Yan et al., 28 Mar 2024)), or loss-augmented training (MINIONS, with SMPL parameterization (Chen et al., 23 Jul 2024)).
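
For the filtering step above, a zero-phase Butterworth low-pass at the 5 Hz cut-off of (Santos et al., 2021) is a one-liner in SciPy; the filter order, 60 Hz rate, and placeholder data are our assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt

fs = 60.0                                # sampling rate (Hz); dataset-dependent
acc_raw = np.random.randn(600, 3)        # placeholder tri-axial accelerometer trace

b, a = butter(N=4, Wn=5.0, btype="low", fs=fs)  # 4th-order low-pass, 5 Hz cut-off
acc_filt = filtfilt(b, a, acc_raw, axis=0)      # zero-phase: no group delay
```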

Notably, MoVi provides both raw and processed data (“calculation files” with multiple signal types) and standard motion files (BVH). Parametric body models such as SMPL, with

E_{3d} = \sum w_{3d} \cdot \| \mathcal{J} M(\theta, \beta, t) - P_{3d} \|^2_2

are employed for mesh fitting and annotation (Chen et al., 23 Jul 2024).
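
In code, this energy is a weighted sum of squared joint residuals. The sketch below assumes the model joints $\mathcal{J} M(\theta, \beta, t)$ have already been produced by an SMPL forward pass (e.g., via the smplx package), which is not shown.

```python
import numpy as np

def joint_fitting_loss(joints_pred, joints_obs, w3d=1.0):
    """E_3d-style 3D joint term: w3d * sum ||J M(theta, beta, t) - P_3d||^2.

    joints_pred : (J, 3) joints regressed from the posed body model
    joints_obs  : (J, 3) observed 3D joint positions P_3d
    """
    return w3d * np.sum((joints_pred - joints_obs) ** 2)
```

In an actual fitting pipeline this term would be minimized over (θ, β, t) with an autodiff optimizer rather than merely evaluated in NumPy.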

Recent work uses diffusion models and transformer-based architectures for robust pose estimation and calibration under noise, loose attachment, and dynamic motion (see (Zuo et al., 12 Jun 2025, Ilic et al., 18 Jun 2025)).

4. Application Domains and Research Use Cases

IMU motion capture datasets underpin a broad spectrum of research and industrial applications:

  • Human pose estimation and tracking in occlusion-prone, out-of-lab, or markerless environments
  • Gait analysis and biomechanics (direct comparison of IMU and optical trajectories (Santos et al., 2021))
  • Human action recognition, user authentication (Motion ID (Gavron et al., 2023)), and biometric systems in daily life
  • Sensor fusion and calibration benchmarking in SLAM and extended reality (MoCap2GT (Shu et al., 17 Jul 2025), LuViRA (Yaman et al., 2023))
  • Data-driven machine learning for fall detection utilizing synthetic data (Opensim pipeline (Tang et al., 2023))
  • Handwriting and gesture recognition, informed by fine-grained IMU recordings (Gupta et al., 2023)
  • VR/AR motion tracking (EMHI (Fan et al., 30 Aug 2024)), especially where visual self-occlusions or lighting variability occur

A particularly important use is the benchmarking of imputation and denoising techniques for missing or corrupted IMU signals (MoCap-Impute (Bekhit et al., 14 Jul 2025)); techniques compared range from KNN and matrix factorization to GAIN and contextual multivariate imputation.
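
To give a feel for the simplest of those baselines, KNN imputation runs directly in scikit-learn; the IMU matrix and missingness pattern below are synthetic.

```python
import numpy as np
from sklearn.impute import KNNImputer

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 9))        # (frames, channels) synthetic IMU matrix
X[rng.random(X.shape) < 0.05] = np.nan    # knock out ~5% of samples

imputer = KNNImputer(n_neighbors=5, weights="distance")
X_filled = imputer.fit_transform(X)       # NaNs replaced by neighbor-weighted values
```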

5. Calibration, Error Analysis, and Data Quality Considerations

Calibration and error mitigation are critical for useful IMU motion capture datasets:

  • Bias, white noise, and drift are characterized by Allan deviation analysis (TUM-VIE (Klenk et al., 2021)) or real-time correction via LSTM/Transformer-based models (HelmetPoser (Li et al., 8 Sep 2024)).
  • Robust calibration pipelines (MoCap2GT (Shu et al., 17 Jul 2025), Transformer IMU Calibrator (Zuo et al., 12 Jun 2025)) address both static and dynamic drift, measurement offsets (R_{G′G} and R_{BS}), and enable real-time or implicit calibration for long-duration or sparse-IMU configurations.
  • Degeneracy-aware rejection: Poorly excited (low-rotation) segments are flagged to prevent degeneracy in joint optimization (MoCap2GT); a toy excitation check is sketched after this list.
  • Ground-truth verification: High-precision optical MoCap systems (e.g., VICON, OptiTrack, 0.5 mm accuracy in LuViRA) provide a “gold standard” for validating IMU-based pose estimates.
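
A crude version of the degeneracy check above integrates the gyroscope over fixed windows and rejects windows with too little net rotation; the window length and threshold are arbitrary placeholders, not MoCap2GT's actual criteria.

```python
import numpy as np

def well_excited_windows(gyro, fs, window_s=2.0, min_angle_rad=0.5):
    """Flag fixed-length windows whose net rotation is large enough to
    constrain calibration (gyro: (N, 3) angular velocities in rad/s)."""
    w = int(window_s * fs)
    flags = []
    for start in range(0, gyro.shape[0] - w + 1, w):
        seg = gyro[start:start + w]
        # approximate net rotation angle by integrating angular velocity
        angle = np.linalg.norm(seg.sum(axis=0)) / fs
        flags.append(angle >= min_angle_rad)
    return np.array(flags)
```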

Datasets often provide calibration sequences and noise characterization to support robust downstream algorithm development.

6. Dataset Accessibility, Structure, and Benchmarking

Many IMU motion capture datasets are made publicly available, with organized repositories that include:

  • Raw and processed data files (CSV, BVH, ROSbag, time-synchronized multi-modal formats)
  • Calibration and metadata (sensor placement, subject identifiers, demographics, ground-truth alignment)
  • Code and protocols for pre-processing, calibration, and benchmarking (e.g., MoCap-Impute code for imputation benchmarking (Bekhit et al., 14 Jul 2025), MoCap2GT code for precise SLAM ground truth estimation (Shu et al., 17 Jul 2025))
  • Rich annotation: SMPL parameterization, 2D/3D joint locations, action labels, and environmental context

Increasingly, datasets are designed to stress-test new scenarios (e.g., loose clothing perturbations (Ilic et al., 18 Jun 2025), climbing motions in non-planar environments (Yan et al., 27 Mar 2025), egocentric VR movement (Fan et al., 30 Aug 2024)), and often include companion code, baseline models, and evaluation metrics (e.g., MPJPE, μ_glo, jitter, error reductions by algorithm).
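
Of these, MPJPE is the most widely reported; it is simply the mean Euclidean distance between predicted and ground-truth joints, e.g.:

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean Per-Joint Position Error, in the units of the inputs (often mm).

    pred, gt : arrays of shape (frames, joints, 3)
    """
    return np.linalg.norm(pred - gt, axis=-1).mean()
```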

7. Open Challenges and Future Directions

Despite advances, outstanding challenges remain:

  • Realistic simulation of loose-wear and garment interaction effects, with garment-aware models that factor in anthropometrics and clothing style parameters (Ilic et al., 18 Jun 2025)
  • Implicit real-time calibration of sparse and dynamically attached IMUs (Zuo et al., 12 Jun 2025)
  • Temporal and spatial misalignment management, especially for ad hoc or consumer-grade sensor configurations
  • Missing data restoration and robust imputation under structured or transition-dependent loss (Bekhit et al., 14 Jul 2025)
  • Scaling annotations and calibration to multi-person, naturalistic, and cross-device environments

Future research will likely focus on unsupervised domain adaptation, improved sensor fusion through multi-modal cross-attention, in-the-wild deployments, and large-scale, semi-supervised annotation pipelines leveraging pseudo-labels and teacher-student models (Yan et al., 27 Mar 2025). Integration of precise, calibrated IMU datasets into standardized frameworks is expected to further advance the performance, accuracy, and realism of human and robot motion understanding.
