MoVi: A Large Multipurpose Motion and Video Dataset (2003.01888v1)

Published 4 Mar 2020 in cs.CV, cs.LG, and eess.IV

Abstract: Human movements are both an area of intense study and the basis of many applications such as character animation. For many applications, it is crucial to identify movements from videos or analyze datasets of movements. Here we introduce a new human Motion and Video dataset MoVi, which we make available publicly. It contains 60 female and 30 male actors performing a collection of 20 predefined everyday actions and sports movements, and one self-chosen movement. In five capture rounds, the same actors and movements were recorded using different hardware systems, including an optical motion capture system, video cameras, and inertial measurement units (IMU). For some of the capture rounds, the actors were recorded when wearing natural clothing; for the other rounds they wore minimal clothing. In total, our dataset contains 9 hours of motion capture data, 17 hours of video data from 4 different points of view (including one hand-held camera), and 6.6 hours of IMU data. In this paper, we describe how the dataset was collected and post-processed; we present state-of-the-art estimates of skeletal motions and full-body shape deformations associated with skeletal motion. We discuss examples of potential studies this dataset could enable.

Authors

Saeed Ghorbani
Kimia Mahdaviani
Anne Thaler
Konrad Kording
Douglas James Cook
Gunnar Blohm
Nikolaus F. Troje

Citations (62)


Summary

The paper introduces MoVi, a large multipurpose dataset designed for computer vision and human motion analysis, combining synchronized optical motion capture, video, and IMU data.
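MoVi includes data from 90 actors (60 female, 30 male) performing 20 predefined actions and one self-chosen movement, collected in five capture rounds with different hardware systems and clothing conditions, for a total of 9 hours of motion capture, 17 hours of video, and 6.6 hours of IMU data.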
The dataset supports research in 2D/3D pose estimation, action recognition, and body shape reconstruction; skeletal and body-shape estimates are provided through the V3D and MoSh++ pipelines.

Overview of MoVi: A Large Multipurpose Motion and Video Dataset

The paper "MoVi: A Large Multipurpose Motion and Video Dataset" introduces MoVi, an extensive dataset designed to facilitate a wide range of applications in computer vision and human motion analysis. The dataset comprises synchronized recordings from multiple modalities including optical motion capture, video, and inertial measurement units (IMUs). With contributions from 90 actors performing a variety of predefined and self-chosen actions, MoVi stands out for its breadth and diversity.

Dataset Composition and Collection Methodology

MoVi's data collection involved 60 female and 30 male actors performing 20 predefined actions and one self-chosen movement, with clothing and recording hardware varied across capture rounds. Data were captured in five rounds using different combinations of hardware: an optical motion capture system for skeletal motion, video cameras for visual data, and inertial measurement units (IMUs) for body-worn inertial (acceleration and rotation) measurements.

In some rounds the actors wore minimal clothing and in others natural clothing, and the recordings combine multiple viewpoints and capture systems to broaden the dataset's applicability. In total, the dataset contains 9 hours of motion capture data, 17 hours of video from four viewpoints (including one hand-held camera), and 6.6 hours of IMU data, a combination of modalities that supports developing and benchmarking robust models for motion analysis.

Technical Implementation and Output

The paper describes two pipelines used to derive skeletal and body-shape data: V3D and MoSh++. Both provide skeletal pose estimates, but through different methods. V3D fits a biomechanical skeletal model to the marker data, while MoSh++ fits the SMPL body model to the markers by optimization, which additionally recovers body shape and soft-tissue dynamics. These outputs support several research directions, from human pose estimation and action recognition to body shape reconstruction and gait analysis.
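For orientation, the SMPL formulation that MoSh++ fits can be written as below. This is a simplified sketch based on the published SMPL model with its DMPL soft-tissue term; the fitting objective in the last line is a schematic assumption (MoSh++ also uses priors and marker-placement terms omitted here), not the paper's exact loss. Here $\bar T$ is the template mesh, $\vec\beta$ the per-subject shape coefficients, $\vec\theta_t$ and $\vec\phi_t$ the pose and soft-tissue coefficients at frame $t$, $W$ linear blend skinning with joint locations $J(\vec\beta)$ and skinning weights $\mathcal W$, $m_{i,t}$ the observed position of marker $i$, and $\hat m_i$ its model-predicted position.

$$
T_P(\vec\beta,\vec\theta,\vec\phi) = \bar T + B_S(\vec\beta) + B_P(\vec\theta) + B_D(\vec\phi)
$$

$$
M(\vec\beta,\vec\theta,\vec\phi) = W\big(T_P(\vec\beta,\vec\theta,\vec\phi),\; J(\vec\beta),\; \vec\theta,\; \mathcal W\big)
$$

$$
\min_{\vec\beta,\,\{\vec\theta_t,\vec\phi_t\}} \; \sum_t \sum_i \big\lVert m_{i,t} - \hat m_i(\vec\beta,\vec\theta_t,\vec\phi_t) \big\rVert^2
$$

Sharing $\vec\beta$ across all frames while letting $\vec\theta_t$ and $\vec\phi_t$ vary per frame is what lets a single fit separate a subject's body shape from pose and soft-tissue motion.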

The paper describes device calibration and cross-modal synchronization in detail; video and motion capture data were temporally aligned using cross-correlation.
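A minimal sketch of cross-correlation alignment, assuming two 1-D traces already resampled to a common rate (for example, a motion-capture-derived trajectory and the same quantity tracked in video). The function names `estimate_lag` and `lag_seconds` and the synthetic signals are hypothetical illustrations, not the MoVi authors' code.

```python
import numpy as np


def estimate_lag(reference: np.ndarray, signal: np.ndarray) -> int:
    """Lag d (in samples) that best satisfies signal[n] ~ reference[n - d].

    Both inputs are 1-D and assumed to share the same sampling rate.
    A positive d means `signal` starts d samples later than `reference`.
    """
    # z-score both traces so gain/offset differences between devices don't matter
    ref = (reference - reference.mean()) / reference.std()
    sig = (signal - signal.mean()) / signal.std()
    # "full" cross-correlation: output index i corresponds to lag i - (len(ref) - 1)
    corr = np.correlate(sig, ref, mode="full")
    return int(np.argmax(corr)) - (len(ref) - 1)


def lag_seconds(reference: np.ndarray, signal: np.ndarray, fs: float) -> float:
    """Same lag expressed in seconds, for a common sampling rate fs in Hz."""
    return estimate_lag(reference, signal) / fs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mocap = rng.standard_normal(1000)  # hypothetical stand-in for a mocap-derived trace
    true_lag = 37
    # hypothetical video-derived trace that starts 37 samples late, plus small noise
    video = np.concatenate([np.zeros(true_lag), mocap[:-true_lag]])
    video = video + 0.1 * rng.standard_normal(video.size)
    print(estimate_lag(mocap, video))  # expected: 37
```

Taking the argmax of the full cross-correlation gives integer-sample precision; a real pipeline might refine the peak (for example by parabolic interpolation) or correlate several signals, which this sketch does not attempt.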

Implications and Future Directions

MoVi lays a solid foundation for advancing computational models that tackle various challenges in understanding human motion. Its extensive range of actions, accuracy in capturing dynamic body shapes, and synchronization of multimodal data encourage the development of algorithms that can operate effectively across different real-world scenarios. Researchers can utilize MoVi for improving 2D and 3D pose estimation, designing more accurate models for motion synthesis, and enabling precise action recognition algorithms.

Beyond these core tasks, the dataset could benefit fields where precise motion measurements matter, such as sports biomechanics, animation, surveillance, and augmented reality. Future research could combine MoVi with complementary data sources to build richer models, for example for interactive media and personalized health assessment.

In conclusion, MoVi is a valuable resource for human motion research that addresses the limited size and narrow scope of earlier datasets. Its public release invites researchers to develop new methods and improve existing ones across biological motion analysis and computer vision.
