
Multi-shape Inertial MoCap Dataset (MID)

Updated 21 March 2026
  • MID is a comprehensive MoCap dataset featuring diverse body shapes along with synchronized inertial sensor data and SMPL-based annotations.
  • It employs high-fidelity multi-node IMU systems and rigorous calibration protocols to ensure precise, reproducible motion and shape recordings.
  • The dataset facilitates robust algorithm development for pose and shape estimation, enhancing applications in biomechanics and computer vision.

A Multi-shape Inertial MoCap Dataset (MID) is a human motion capture (MoCap) dataset whose subjects exhibit significant variation in body shape, with synchronized inertial measurement unit (IMU) data and accompanying annotations supporting full-body motion and shape analysis. Unlike legacy datasets, which often presuppose a single adult or template body shape, MID datasets enable the development of motion reconstruction algorithms that generalize across the anthropometric spectrum, including children and adults as well as a range of body mass distributions, limb proportions, and soft-tissue variations. MID datasets are essential for addressing the IMU-pose-shape correlation shifts that arise from structural and inertial differences between individuals.

1. Subject and Shape Diversity

MID datasets are characterized by comprehensive representation of body shape variability via curated participant demographics and explicit body shape parameterization. For example, the MID dataset introduced alongside the Shape-aware Inertial Poser (SAIP) comprises 20 individuals (10 children aged 5–10 years and 10 adults aged 18 years and above), spanning heights from 118 cm to 191 cm, and controlled for age, gender, and anthropometric profiles. Shape parameterization utilizes the SMPL (Skinned Multi-Person Linear) model coefficients $\beta \in \mathbb{R}^{10}$, encoding per-subject body-fat distribution, limb girth, and bone-length proportions. Initial shape vectors $\beta_0$ are scaled by the height ratio $H_R/H_T$ to preserve proportional fidelity for each subject (Yin et al., 20 Oct 2025).
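The height-ratio initialization described above can be sketched as follows. This is a simplified reading of the $\beta_0 \cdot (H_R/H_T)$ scaling, assuming the ratio is applied uniformly to all ten coefficients; the released SAIP pipeline may treat height-correlated components differently.

```python
import numpy as np

def scale_initial_shape(beta_0: np.ndarray, h_subject: float, h_template: float) -> np.ndarray:
    """Scale a 10-D SMPL shape vector by the subject/template height ratio.

    Simplified sketch of the beta_0 * (H_R / H_T) initialization; the exact
    per-component treatment in SAIP is not specified here.
    """
    assert beta_0.shape == (10,)
    return beta_0 * (h_subject / h_template)

# Example: a 118 cm child against a hypothetical 170 cm template body.
beta_0 = np.zeros(10)
beta_0[0] = 1.0  # first PCA component, strongly height-correlated in SMPL
beta_child = scale_initial_shape(beta_0, 118.0, 170.0)
```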

The MoVi dataset, another example of an MID resource, extends shape diversity with 90 subjects (60 female, 30 male) for broader statistical coverage. Here, SMPL shape is encoded with $\beta \in \mathbb{R}^{16}$ coefficients per subject, supporting studies on pose-independent body-shape variation and dynamic body mesh deformations (Ghorbani et al., 2020).

2. IMU Hardware Configuration and Calibration

MID datasets employ high-fidelity, multi-node IMU systems. In the SAIP MID dataset, Noitom Perception Neuron (PN) IMUs are used with 17 sensor placements for comprehensive capture, and a six-IMU subset for sparse MoCap benchmarks (head, wrists, ankles, pelvis). Measurements include orientation ($R \in SO(3)$), linear acceleration ($A_R \in \mathbb{R}^3$), and angular velocity ($\omega \in \mathbb{R}^3$), sampled at 60 Hz and mapped into the SMPL coordinate system. Subjects undergo manual bone-length calibration, and sensor-to-body alignment is enforced with a T-pose protocol. Strict synchronization with optical-inertial fusion outputs ensures timestamp integrity (Yin et al., 20 Oct 2025).
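The T-pose alignment step can be illustrated with a standard sensor-to-bone calibration: during the T-pose, each bone's orientation in the SMPL frame is known, so a constant per-sensor offset rotation can be solved for and applied to all subsequent readings. This is a generic sketch of the technique, not the vendor's or the paper's exact procedure.

```python
import numpy as np

def tpose_calibration(R_sensor_tpose: np.ndarray, R_bone_tpose: np.ndarray) -> np.ndarray:
    """Estimate the constant sensor-to-bone offset during the T-pose.

    Returns R_off such that R_bone = R_sensor @ R_off holds at calibration time.
    """
    return R_sensor_tpose.T @ R_bone_tpose

def sensor_to_bone(R_sensor: np.ndarray, R_off: np.ndarray) -> np.ndarray:
    """Map a live sensor orientation into the bone (SMPL) frame via the offset."""
    return R_sensor @ R_off
```

In the T-pose, SMPL bone orientations are (approximately) identity, so the offset absorbs however the sensor happens to be mounted on the segment.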

The MoVi dataset employs the Noitom Neuron Edition V2 suit: 18 IMUs at head, torso, limbs, and extremities, yielding 3-axis gyroscope, accelerometer, and magnetometer data at 120 Hz. Synchronization to the optical MoCap system is achieved by maximizing the cross-correlation of vertical ankle trajectories between MoCap and IMU coordinates, supporting multi-modal, multi-system alignment (Ghorbani et al., 2020).
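The cross-correlation alignment idea can be sketched in a few lines, assuming both streams have already been resampled to a common rate (MoVi's actual pipeline must also reconcile the 120 Hz IMU rate with the optical system's rate):

```python
import numpy as np

def sync_lag(mocap_z: np.ndarray, imu_z: np.ndarray) -> int:
    """Estimate the integer-frame offset between two streams by maximizing
    the cross-correlation of zero-mean vertical ankle trajectories.

    Returns lag d such that mocap_z[n + d] best aligns with imu_z[n].
    """
    a = mocap_z - mocap_z.mean()
    b = imu_z - imu_z.mean()
    xcorr = np.correlate(a, b, mode="full")
    return int(np.argmax(xcorr)) - (len(b) - 1)
```

Vertical ankle motion is a good alignment signal because heel strikes produce sharp, repeatable features visible in both modalities.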

3. Data Acquisition Protocols

MID datasets follow standardized acquisition setups to ensure reproducibility and utility for algorithm development. The SAIP MID protocol captures freestyle and structured activities (e.g., sports, walking/running, arm swings, trunk motion), encouraging natural movement, particularly for children. Each subject records approximately 20 minutes (varying with compliance), for a total of roughly 400 minutes of paired IMU-motion samples. The full 17-IMU system provides ground truth via optical-inertial fusion in PN Studio, generating BVH/FBX joint rotations and root trajectories; raw IMU CSVs (all 17 sensors) are exported in parallel, with sparse selections for downstream evaluation. Data integrity is maintained by atomic acquisition and strict timestamp alignment across modalities (Yin et al., 20 Oct 2025).
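Deriving the sparse six-sensor benchmark subset from the full 17-sensor export is a simple slicing operation. The sensor names below are hypothetical placeholders for illustration; the released dataset's actual column/sensor naming may differ.

```python
import numpy as np

# Hypothetical 17-node Perception Neuron layout (placeholder names).
FULL_SENSORS = [
    "pelvis", "spine", "chest", "neck", "head",
    "left_shoulder", "left_elbow", "left_wrist",
    "right_shoulder", "right_elbow", "right_wrist",
    "left_hip", "left_knee", "left_ankle",
    "right_hip", "right_knee", "right_ankle",
]

# Six placements listed in the text for sparse MoCap benchmarks.
SPARSE_SENSORS = ["head", "left_wrist", "right_wrist",
                  "left_ankle", "right_ankle", "pelvis"]

def select_sparse(imu: np.ndarray, sensor_names=FULL_SENSORS, keep=SPARSE_SENSORS) -> np.ndarray:
    """Slice a (T, 17, C) full-capture IMU array down to the (T, 6, C) subset."""
    idx = [sensor_names.index(name) for name in keep]
    return imu[:, idx, :]
```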

The MoVi dataset spans five capture rounds per subject and movement: combinations of "full" optical marker sets, sparse markers plus IMU, IMU-only, as well as synchronized and unsynchronized multi-view video. Minimal clothing is used for marker accuracy and natural clothing for practical scenario emulation. Synchronized streams are achieved with hardware triggers (video-to-MoCap) and temporal alignment algorithms (IMU-to-MoCap). The breadth of recording modalities supports cross-domain pose and shape modeling as well as robust evaluation of algorithm generalization (Ghorbani et al., 2020).

4. Data Structure, File Formats, and Metadata

The organization and format of MID datasets are designed for algorithmic accessibility and extensibility. In the SAIP MID collection, data are structured under a root directory with subfolders for IMU CSVs (per subject), BVH/FBX skeletal motion outputs, and SMPL-aligned NumPy files for pose $\theta$ (as 6D representations, $\mathbb{R}^{T \times 144}$), root translations, and ground-truth shape ($\beta_\text{gt} \in \mathbb{R}^{10}$). Metadata includes age, height, gender, initial shape vectors, and sensor offsets (Yin et al., 20 Oct 2025).

| File/Directory | Content Description | Format |
|---|---|---|
| IMU_csv/ | Raw IMU streams (17 sensors; 6 selected for sparse) | .csv (60 Hz) |
| BVH/FBX/ | Ground-truth joint rotations & trajectories | .bvh / .fbx |
| SMPL_aligned/ | SMPL pose, root, shape annotations | .npy / .txt |
| metadata.csv | Subject demographics, shape, calibration offsets | .csv |
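The pose arrays store each of SMPL's 24 joints as a 6D rotation (24 × 6 = 144 values per frame). A common way to recover a rotation matrix from this continuous 6D representation, assumed here to follow the standard Gram-Schmidt construction, is:

```python
import numpy as np

def rot6d_to_matrix(d6: np.ndarray) -> np.ndarray:
    """Recover a 3x3 rotation from a 6-D vector encoding the first two
    columns of the matrix, via Gram-Schmidt orthonormalization."""
    a1, a2 = d6[:3], d6[3:]
    b1 = a1 / np.linalg.norm(a1)
    b2 = a2 - np.dot(b1, a2) * b1          # remove b1 component
    b2 = b2 / np.linalg.norm(b2)
    b3 = np.cross(b1, b2)                  # third column completes the frame
    return np.stack([b1, b2, b3], axis=-1)
```

Applied per joint, a $(T, 144)$ pose array decodes into $(T, 24, 3, 3)$ joint rotations.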

A comparable structure is present in MoVi: .mat files for marker/mesh data, .bvh files for IMU-derived motion, per-view videos (.avi), and intrinsic/extrinsic calibration files. Annotations include 3D joint locations, dynamic and pose parameters, and per-frame visibility/occlusion flags (Ghorbani et al., 2020).

5. Shape and Motion Modeling Methodologies

MID datasets center on the SMPL parametric body model for mesh-based synthesis and joint regression, with shape ($\beta$) and pose ($\theta$) coefficients disentangling anthropometrics from articulation. Full-body mesh vertices are generated as $V = W(T(\beta, \theta), J(\beta), \theta, \mathcal{W})$, with $T(\beta, \theta) = \bar{T} + B_S(\beta) + B_P(\theta)$, where $B_S$ and $B_P$ are the shape and pose blend-shape bases and $\mathcal{W}$ the skinning weights. This parameterization enables subject-specific mesh creation and retargeting (Yin et al., 20 Oct 2025, Ghorbani et al., 2020).
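The additive template term $T(\beta, \theta)$ can be sketched as a pair of tensor contractions; vertex counts and basis dimensions below are toy values, and the full skinning step $W(\cdot)$ is omitted:

```python
import numpy as np

def posed_template(t_bar, shape_dirs, pose_dirs, beta, pose_feat):
    """T(beta, theta) = T_bar + B_S(beta) + B_P(theta): the mean template plus
    shape- and pose-dependent vertex offsets, each a blend-shape basis
    contracted against its coefficient vector. Skinning is applied afterwards."""
    return (t_bar
            + np.einsum('vck,k->vc', shape_dirs, beta)      # B_S(beta)
            + np.einsum('vck,k->vc', pose_dirs, pose_feat))  # B_P(theta)
```

In real SMPL the pose blend-shape input is a flattened function of the joint rotation matrices (207-D for 23 non-root joints), which is why pose-dependent soft-tissue bulging appears even with fixed $\beta$.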

Further, the SAIP pipeline introduces neural retargeting for shape-aware IMU processing:

  • $R_\text{acc}: \mathbb{R}^3 \times \mathbb{R}^d \rightarrow \mathbb{R}^3$ maps subject-specific accelerations to template-scale equivalents.
  • $R_\text{vel}: \mathbb{R}^3 \times \mathbb{R}^d \rightarrow \mathbb{R}^3$ does the same for joint velocities.

Shape estimation employs an MLP $\varphi_\text{shape}$ over a windowed sequence ($W = 60$ frames), jointly processing historical accelerations, poses, and subject height to yield $\hat{\beta} \in \mathbb{R}^{10}$ (Yin et al., 20 Oct 2025).
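The windowed regression can be sketched as a two-layer MLP over a flattened feature window. The layer widths and per-frame feature dimension here are illustrative placeholders, not the published SAIP architecture:

```python
import numpy as np

def phi_shape(window_feats: np.ndarray, W1, b1, W2, b2) -> np.ndarray:
    """Two-layer ReLU MLP regressing beta_hat in R^10 from a flattened
    W-frame window of per-frame features (accelerations, poses, height).
    Layer sizes are illustrative only."""
    x = window_feats.reshape(-1)          # flatten (W, F) window to a vector
    h = np.maximum(0.0, W1 @ x + b1)      # hidden ReLU layer
    return W2 @ h + b2                    # linear head -> 10 shape coefficients
```

At inference, a sliding 60-frame window over the IMU stream would yield a per-step shape estimate that can be temporally smoothed.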

MoVi supports mesh and joint extraction via MoSh++ (AMASS pipeline), yielding per-frame optimized β\beta, θ\theta, and soft-tissue dynamics, as well as biomechanical joint-center extraction via V3D (Ghorbani et al., 2020).

6. Loss Functions, Optimization, and Evaluation

Shape-aware learning with MID datasets leverages targeted loss functions corresponding to IMU retargeting, pose estimation, and shape inference:

  • Acceleration: $\mathcal{L}_\text{acc} = \|R_\text{acc}(A_R, \beta) - A_T\|_2^2$
  • Velocity: $\mathcal{L}_\text{vel} = \|R_\text{vel}(V_T, \beta) - V_R\|_2^2$
  • Shape regression: $\mathcal{L}_\text{shape} = \|\varphi_\text{shape}(A_R, \theta, H_R) - \beta_\text{gt}\|_2^2$
  • Pose and motion: $\mathcal{L}_\text{pose} = \sum_{t,j}\|\hat{\theta}_{t,j} - \theta_{t,j}\|_2^2$, $\mathcal{L}_\text{trans} = \sum_t \|\hat{r}_{\text{root},t} - r_{\text{root},t}\|_2^2$; foot-contact classification uses binary cross-entropy.
  • Shape-aware physics optimization minimizes deviation from the target pose/velocity under subject-specific dynamics, solving for joint torques $\tau$ with constraints derived from $\beta$-dependent masses and inertias (Yin et al., 20 Oct 2025).
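The squared-L2 and cross-entropy terms above translate directly into code. The sketch below matches the stated formulas; the reduction for the contact term (mean vs. sum) is an assumption:

```python
import numpy as np

def l_shape(beta_pred: np.ndarray, beta_gt: np.ndarray) -> float:
    """L_shape: squared L2 error between predicted and ground-truth betas."""
    return float(np.sum((beta_pred - beta_gt) ** 2))

def l_pose(theta_pred: np.ndarray, theta_gt: np.ndarray) -> float:
    """L_pose: squared L2 error summed over frames t and joints j."""
    return float(np.sum((theta_pred - theta_gt) ** 2))

def l_contact(p_pred: np.ndarray, c_gt: np.ndarray, eps: float = 1e-7) -> float:
    """Binary cross-entropy for per-frame foot-contact classification
    (mean reduction assumed)."""
    p = np.clip(p_pred, eps, 1.0 - eps)
    return float(-np.mean(c_gt * np.log(p) + (1.0 - c_gt) * np.log(1.0 - p)))
```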

MoVi provides ground-truth for cross-modal benchmarking. Use cases include pose estimation from video or IMU, body-shape reconstruction from images, and biomechanical or machine learning-based motion synthesis (Ghorbani et al., 2020).

7. Accessibility and Usage Scenarios

MID datasets include support code and documentation for data loading, stream alignment, visualization, and camera projection. The SAIP MID dataset and codebase are publicly available (Yin et al., 20 Oct 2025). MoVi is accessible via Dataverse and AMASS, with example Jupyter notebooks for 3D mesh/video overlay, IMU parsing, and camera model loading. These resources facilitate reproducibility and rapid experimentation in human pose, shape estimation, and motion analysis domains (Ghorbani et al., 2020).

A plausible implication is that MID datasets address the limitations of fixed-shape or homogeneous-body datasets, enabling robust algorithm development across subpopulations and supporting the next generation of shape-aware motion capture, biomechanics, and computer vision research.

