MM-Fi Dataset: Multi-Modal Sensing Benchmark

Updated 4 October 2025
  • MM-Fi Dataset is a large-scale multi-modal human sensing resource integrating five sensor modalities (RGB, depth, LiDAR, mmWave radar, and WiFi-CSI) for device-free human analysis.
  • It provides rich annotations including 2D/3D pose keypoints, dense pose maps, and temporal action segmentation across 27 daily and rehabilitation activities.
  • The dataset enables advanced research in multi-modal data fusion and cross-modal supervision, setting benchmarks for human pose estimation and activity recognition.

The MM-Fi Dataset is a publicly available, large-scale, multi-modal human sensing resource specifically constructed to enable privacy-preserving, device-free, and versatile human sensing. Its design supports research in action recognition, 4D human pose estimation, multi-modal learning, cross-modal supervision, and interdisciplinary healthcare applications. MM-Fi integrates synchronized data streams from five distinct sensor modalities: RGB camera, depth camera, LiDAR, mmWave radar, and WiFi Channel State Information (CSI). It also incorporates rich annotations such as precise 2D/3D pose keypoints, dense pose maps, and temporal action segmentation across 27 daily and rehabilitation-oriented actions from 40 subjects.

1. Dataset Composition and Sensing Modalities

MM-Fi contains more than 320,000 temporally-aligned frames of human activities recorded from 40 individuals in four distinct environments. Collection was performed using a mobile sensor cart orchestrated via Robot Operating System (ROS) to enable sub-millisecond synchronization of:

  • RGB Images (Intel RealSense D435, high-resolution)
  • Depth Maps (same RGB-D source, dense per-pixel depth)
  • LiDAR Point Clouds (Ouster OS1, 32-channel, dense 3D spatial)
  • mmWave Radar Point Clouds (Texas Instruments IWR6843; points represented as P_m = (x, y, z, d, I), where d is Doppler velocity and I is signal intensity; frame aggregation mitigates temporal sparsity)
  • WiFi-CSI Data (TP-Link N750, modified firmware; each CSI snapshot is a 3 × 114 × 10 tensor per 100 ms)
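
The sparse radar frames above are typically densified by stacking consecutive frames into one point cloud. A minimal sketch of that aggregation step (the frame counts and random point values are illustrative assumptions, not the MM-Fi toolbox API):

```python
import numpy as np

def aggregate_mmwave_frames(frames):
    """Stack consecutive sparse radar frames into one denser point cloud.

    Each frame is an (N_i, 5) array of points (x, y, z, d, I), where d is
    Doppler velocity and I is signal intensity.
    """
    return np.concatenate(frames, axis=0)

# Toy example: three consecutive frames with 4, 6, and 5 detections.
rng = np.random.default_rng(0)
frames = [rng.normal(size=(n, 5)) for n in (4, 6, 5)]
cloud = aggregate_mmwave_frames(frames)
print(cloud.shape)  # (15, 5)
```

Because the aggregated points carry per-point Doppler velocity, motion information is preserved even though individual frames are sparse.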

Data is provided in various standard formats (e.g., NumPy, MATLAB, binary point clouds) and is accessible via a PyTorch dataloader and toolbox. The dataset includes multi-level annotations: 2D/3D pose keypoints generated via HRNet-w48 applied to dual IR views, 3D position cuboids (LiDAR+camera fusion, error < 50 mm), semantic dense mapping, and temporally precise action labels.
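
A minimal map-style dataset wrapper compatible with `torch.utils.data.DataLoader` might look as follows; the key names, array shapes for RGB/depth, and synthetic data are assumptions for illustration, not the official MM-Fi toolbox API:

```python
import numpy as np

class MMFiToySet:
    """Map-style dataset sketch returning one synchronized multi-modal sample.

    Implements the __len__/__getitem__ interface that
    torch.utils.data.DataLoader expects; a real loader would read the
    released NumPy/MATLAB/binary files instead of generating random data.
    """

    def __init__(self, num_frames=8, num_joints=17):
        self.num_frames = num_frames
        self.num_joints = num_joints
        self.rng = np.random.default_rng(0)

    def __len__(self):
        return self.num_frames

    def __getitem__(self, idx):
        return {
            "rgb": self.rng.integers(0, 255, size=(480, 640, 3), dtype=np.uint8),
            "depth": self.rng.random((480, 640), dtype=np.float32),
            "wifi_csi": self.rng.random((3, 114, 10), dtype=np.float32),
            "keypoints_3d": self.rng.random((self.num_joints, 3)),
            "action_label": idx % 27,  # 27 activity categories
        }

ds = MMFiToySet()
sample = ds[0]
print(sorted(sample.keys()))
```

Returning a dict per index keeps modalities aligned per frame, which is what makes cross-modal supervision straightforward downstream.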

2. Action Categories, Annotation, and Task Support

MM-Fi comprises 27 human activity categories, encompassing 14 daily life actions (e.g., chest expansion, arm waving, object picking, kicking) and 13 clinically-inspired rehabilitation exercises (e.g., squats, lunges, limb extensions, jumping). This breadth facilitates research on both routine context-aware sensing and domain-specific healthcare scenarios.

Annotations support:

  • 2D/3D Keypoint Pose via multi-view triangulation and regularized optimization.
    • The objective function for pose triangulation is:

    \mathcal{L} = \mathcal{L}_G + \lambda_0 \mathcal{L}_A

    \mathcal{L}_G combines multi-view reprojection error, temporal smoothness, and bone-length constraints; \mathcal{L}_A adds action-specific regularization for ambiguous poses (e.g., joint spatial constraints, action-domain priors).

  • 3D Dense Pose Mapping via advanced RGB-based estimation.

  • Temporal Segmentation precisely marking start/end of activities for downstream segmentation and recognition tasks.
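
A toy version of the triangulation objective can be written directly from its terms; the pinhole projection model, the omission of the bone-length term, and the scalar placeholder for the action regularizer are simplifying assumptions for illustration:

```python
import numpy as np

def reprojection_error(X, P_views, x_views):
    """Mean squared multi-view reprojection error.

    X: (T, J, 3) 3D joints; P_views: list of (3, 4) camera matrices;
    x_views: list of (T, J, 2) observed 2D keypoints per view.
    """
    Xh = np.concatenate([X, np.ones(X.shape[:-1] + (1,))], axis=-1)
    err = 0.0
    for P, x in zip(P_views, x_views):
        proj = Xh @ P.T
        proj = proj[..., :2] / proj[..., 2:3]   # pinhole normalization
        err += np.mean(np.sum((proj - x) ** 2, axis=-1))
    return err

def temporal_smoothness(X):
    """Penalize frame-to-frame joint displacement."""
    return np.mean(np.sum(np.diff(X, axis=0) ** 2, axis=-1))

def objective(X, P_views, x_views, lam0=0.1, action_prior=0.0):
    """L = L_G + lambda_0 * L_A, with L_A left as a placeholder scalar."""
    L_G = reprojection_error(X, P_views, x_views) + temporal_smoothness(X)
    return L_G + lam0 * action_prior

# Static pose at unit depth, observed perfectly by one identity camera:
# both loss terms vanish.
T, J = 2, 3
X = np.zeros((T, J, 3)); X[..., 2] = 1.0
P = np.hstack([np.eye(3), np.zeros((3, 1))])
loss = objective(X, [P], [X[..., :2]])
print(loss)  # 0.0
```

Minimizing this objective over the 3D joints X is what produces the released keypoint annotations from the multi-view 2D detections.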

3. Technical Aspects of Multi-Modal Data Fusion

Extensive benchmarks demonstrate both single-modality and multi-modal fusion performance:

  • Pose Estimation Metrics:

    • Mean Per Joint Position Error (MPJPE)
    • Procrustes Aligned MPJPE (PA-MPJPE)
  • Data Split Strategies: Random frames (S1); cross-subject (S2); cross-environment (S3).
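
The two metrics can be implemented compactly; this is the standard formulation (orthogonal Procrustes alignment via SVD), not code from the MM-Fi toolbox:

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per-joint position error: mean Euclidean distance over joints."""
    return np.mean(np.linalg.norm(pred - gt, axis=-1))

def pa_mpjpe(pred, gt):
    """MPJPE after similarity (Procrustes) alignment of pred onto gt."""
    mu_p, mu_g = pred.mean(0), gt.mean(0)
    Xp, Xg = pred - mu_p, gt - mu_g
    U, s, Vt = np.linalg.svd(Xp.T @ Xg)
    if np.linalg.det(U @ Vt) < 0:      # avoid reflections
        Vt[-1] *= -1
        s[-1] *= -1
    R = U @ Vt
    scale = s.sum() / (Xp ** 2).sum()
    aligned = scale * Xp @ R + mu_g
    return mpjpe(aligned, gt)

# A rotated, scaled, translated copy of the ground truth: MPJPE is large,
# but PA-MPJPE is ~0 because alignment removes the similarity transform.
rng = np.random.default_rng(1)
gt = rng.normal(size=(17, 3))
Rz = np.array([[0., -1., 0.], [1., 0., 0.], [0., 0., 1.]])  # 90 deg about z
pred = 2.0 * gt @ Rz.T + np.array([0.5, -0.2, 1.0])
print(round(pa_mpjpe(pred, gt), 6))  # 0.0
```

The contrast between the two metrics is what makes them complementary: MPJPE penalizes global pose error, while PA-MPJPE isolates articulation quality.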

LiDAR and mmWave radar exhibit strong single-modality performance (low PA-MPJPE), while WiFi-CSI, owing to its lower spatial resolution, benefits most from cross-modal supervision and fusion. Weighted least-squares fusion across modalities yields consistent gains in pose estimation accuracy, establishing multi-modal fusion as essential for leveraging the strengths and mitigating the weaknesses of individual sensor domains.
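
One simple instance of such weighting is inverse-error fusion of per-modality predictions; the specific weights and error values below are assumptions standing in for the fitted least-squares weights described above:

```python
import numpy as np

def fuse_poses(preds, errors):
    """Fuse per-modality 3D pose estimates with inverse-error weights.

    preds: (M, J, 3) stacked per-modality predictions;
    errors: (M,) per-modality error estimates (e.g., validation MPJPE).
    """
    w = 1.0 / np.asarray(errors, dtype=float)
    w /= w.sum()
    return np.tensordot(w, preds, axes=1)  # (J, 3)

# LiDAR (low error) dominates; WiFi-CSI (high error) is down-weighted.
lidar = np.ones((17, 3)) * 1.0
csi = np.ones((17, 3)) * 4.0
fused = fuse_poses(np.stack([lidar, csi]), errors=[0.05, 0.15])
print(fused[0])  # weights 0.75 / 0.25 -> all entries 1.75
```

Down-weighting rather than discarding the noisier modality preserves its complementary information (e.g., through-occlusion sensing from WiFi).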

4. Applications in Wireless Sensing and Healthcare

MM-Fi is designed to facilitate:

  • Human Pose Estimation: Explicit benchmarks are provided for multi-modal pose regression, including loss terms, data splits, and results across individual modalities and their fusion.
  • Action Recognition: Skeleton-based activity categorization is benchmarked via graph convolutional architectures (AGCN, CTR-GCN). The breadth of actions, especially the rehabilitation exercises, makes MM-Fi directly applicable to healthcare analytics, remote patient monitoring, and quantitative assessment of rehabilitation adherence and effectiveness.
  • Smart Home Automation, Avatar Simulation, and Human-Computer Interaction: Synchronized and annotated modalities permit privacy-preserving context-aware systems for ubiquitous computing.
  • Multi-Modal and Cross-Modal Learning: MM-Fi supports exploration of modality transfer, cross-modal supervision, and multi-modal representation learning.
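
The skeleton-based recognition pipelines mentioned above (AGCN, CTR-GCN) build on graph convolutions over the joint graph. A minimal single-layer sketch with a toy 3-joint chain, not either model's actual architecture:

```python
import numpy as np

def skeleton_graph_conv(X, A, W):
    """One graph-convolution step over a skeleton: aggregate neighbor
    features with a row-normalized adjacency, then apply a linear map.

    X: (J, C) per-joint features; A: (J, J) adjacency with self-loops;
    W: (C, C_out) weight matrix.
    """
    A_norm = A / A.sum(axis=1, keepdims=True)   # row-normalize
    return A_norm @ X @ W

# Toy 3-joint chain (0-1-2) with self-loops, 2-d features, identity weights.
A = np.array([[1, 1, 0],
              [1, 1, 1],
              [0, 1, 1]], dtype=float)
X = np.array([[0., 0.], [2., 2.], [4., 4.]])
out = skeleton_graph_conv(X, A, np.eye(2))
print(out)  # each joint becomes the mean of itself and its neighbors
```

Stacking such layers (with learned adjacencies in AGCN/CTR-GCN) lets joint features propagate along the kinematic chain over time.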

5. Impact on Model Design and Standardization

MM-Fi provides structure and diversity critical for developing robust models:

  • WiDistill (Wang et al., 5 Oct 2024): Demonstrates efficient dataset distillation, reducing the substantial MM-Fi data volume (>7.5GB, 320k+ samples) into high-fidelity, synthetic sets that preserve the training dynamics of activity recognition models via trajectory matching. Empirical results show competitive recognition accuracy even at drastically reduced sample sizes.
  • X-Fi (Chen et al., 14 Oct 2024): Serves as a foundation for modality-invariant models in human sensing. MM-Fi's diverse modalities and semantic richness enable the development of transformer-based fusion mechanisms (X-fusion), which preserve modality-specific features, adaptively integrate them, and deliver state-of-the-art results in HPE and HAR benchmarks.
  • Standardization: MM-Fi’s comprehensive documentation, multi-modal coverage, and annotation methodology highlight the need for standardized benchmarks in multi-modal sensing, influencing ongoing efforts such as IEEE 802.11bf and framing requirements for privacy, replicability, and cross-domain usability.

6. Data Accessibility, Utility, and Limitations

MM-Fi is publicly distributed with structured folders, standard formats, and toolkit support, enabling replicable experiments, interoperability with leading model architectures, and comparative analysis across learning paradigms and environments.

Challenges include:

  • Modality-Specific Constraints: CSI’s low spatial resolution, radar’s angular/attire sensitivity, and LiDAR/camera occlusions demand sophisticated fusion, outlier removal, and learning approaches.
  • Generalizability: Not all modalities excel across all tasks (WiFi-CSI is excluded from HAR due to brief time windows); cross-environment, cross-subject variation remains an open research problem.

A plausible implication is that as modality diversity expands, multi-modal learning frameworks and synthetic data distillation will become essential for scalable human sensing solutions in both academic and applied domains. The MM-Fi Dataset thus establishes a benchmark for future research in multi-modal wireless sensing and 4D human perception.
