PKU-MMD: A Large Scale Benchmark for Continuous Multi-Modal Human Action Understanding (1703.07475v2)

Published 22 Mar 2017 in cs.CV

Abstract: Although many 3D human activity benchmarks have been proposed, most existing action datasets focus on action recognition for segmented videos. There is a lack of standard large-scale benchmarks, especially for currently popular data-hungry deep learning based methods. In this paper, we introduce a new large-scale benchmark (PKU-MMD) for continuous multi-modality 3D human action understanding, covering a wide range of complex human activities with well-annotated information. PKU-MMD contains 1076 long video sequences in 51 action categories, performed by 66 subjects in three camera views. It contains almost 20,000 action instances and 5.4 million frames in total. Our dataset also provides multi-modality data sources, including RGB, depth, Infrared Radiation and Skeleton. With different modalities, we conduct extensive experiments on our dataset in terms of two scenarios and evaluate different methods by various metrics, including a newly proposed evaluation protocol, 2D-AP. We believe this large-scale dataset will benefit future research on action detection in the community.

Citations (220)

Summary

  • The paper introduces PKU-MMD, a comprehensive dataset containing 5.4M frames and nearly 20K action instances across 51 categories for continuous multi-modal 3D action understanding.
  • It leverages diverse modalities including RGB, depth, IR, and skeleton data captured from multiple viewpoints, enhancing experimental versatility.
  • It proposes novel evaluation metrics like 2D-AP and uses cross-subject and cross-view protocols to challenge and improve current detection methods.

Overview of "PKU-MMD: A Large Scale Benchmark for Continuous Multi-Modal Human Action Understanding"

The paper introduces the PKU-MMD dataset, a large-scale benchmark specifically designed for continuous multi-modality 3D human action recognition and detection. This dataset represents a significant effort to address existing limitations in the field of human action understanding, particularly for action detection tasks. It offers extensive data for deep learning-based methods that require large-scale and diverse datasets. The PKU-MMD dataset stands out due to its comprehensive collection of modalities and the volume of data provided.

Key Features of PKU-MMD

PKU-MMD comprises 1076 long video sequences covering 51 action categories, performed by 66 subjects and captured from three camera viewpoints. The dataset spans approximately 3000 minutes and contains nearly 20,000 action instances and 5.4 million frames in total. The variety of modalities it provides, including RGB, depth, infrared (IR), and skeleton data, enhances its utility for diverse analytical tasks.

The structure of PKU-MMD facilitates extensive experimentation by allowing researchers to work with different modalities either independently or in combination. This feature is critical for developing and evaluating algorithms that need to leverage various sensory inputs for robust human action understanding.
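
As a rough illustration of that modality selection, the sketch below loads a single sequence from a PKU-MMD-style benchmark with any combination of modalities. The directory layout, file names, and skeleton text format here are assumptions made for the example, not the dataset's documented structure.

```python
# Illustrative loader for a PKU-MMD-style multi-modal sequence. The
# directory layout, file names, and skeleton file format used here are
# assumptions for the sake of the example, not the dataset's documented
# structure.
from pathlib import Path

MODALITIES = ("rgb", "depth", "ir", "skeleton")

def load_sequence(root, seq_id, modalities=("skeleton",)):
    """Return a dict mapping each requested modality to its data or path."""
    root = Path(root)
    sample = {}
    for m in modalities:
        if m not in MODALITIES:
            raise ValueError(f"unknown modality: {m}")
        if m == "skeleton":
            # Hypothetical format: one frame per line, whitespace-separated
            # joint coordinates.
            path = root / "skeleton" / f"{seq_id}.txt"
            with open(path) as f:
                sample[m] = [[float(v) for v in line.split()] for line in f]
        else:
            # Video-like modalities are kept as file paths; decoding is left
            # to whatever video/image pipeline the experiment uses.
            sample[m] = root / m / f"{seq_id}.avi"
    return sample

# Example: combine skeleton and RGB for one (hypothetical) sequence id.
# seq = load_sequence("/data/pku_mmd", "0002-L", modalities=("skeleton", "rgb"))
```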

Experimental Protocols and Results

The paper introduces new evaluation metrics, including a 2D Average Precision (2D-AP) that jointly accounts for detection confidence and temporal overlap ratios, giving a more comprehensive evaluation criterion for action detection algorithms. Two partition settings, cross-subject and cross-view, are used to evaluate robustness to subject variation and viewpoint changes.
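
The paper's exact 2D-AP formulation is not reproduced here; the sketch below only illustrates the two quantities it combines, confidence-ranked precision and a minimum temporal overlap with ground truth, by computing average precision at several overlap thresholds and averaging the results. The thresholds and function names are illustrative.

```python
# Sketch: confidence-ranked average precision at several temporal-overlap
# thresholds. It illustrates the ingredients of an overlap/confidence based
# detection metric; it is not the paper's exact 2D-AP definition.

def temporal_iou(a, b):
    """Intersection-over-union of two (start, end) intervals, in frames."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = max(a[1], b[1]) - min(a[0], b[0])
    return inter / union if union > 0 else 0.0

def average_precision(detections, ground_truth, iou_thresh):
    """detections: list of (start, end, confidence); ground_truth: list of (start, end)."""
    if not ground_truth:
        return 0.0
    detections = sorted(detections, key=lambda d: d[2], reverse=True)
    matched = [False] * len(ground_truth)
    tp = fp = 0
    precisions_at_tp = []
    for start, end, _conf in detections:
        # Match against the ground-truth interval with the highest overlap.
        ious = [temporal_iou((start, end), gt) for gt in ground_truth]
        best_j = max(range(len(ground_truth)), key=lambda j: ious[j])
        if ious[best_j] >= iou_thresh and not matched[best_j]:
            matched[best_j] = True
            tp += 1
            precisions_at_tp.append(tp / (tp + fp))
        else:
            fp += 1
    return sum(precisions_at_tp) / len(ground_truth)

def mean_ap_over_thresholds(detections, ground_truth, thresholds=(0.1, 0.3, 0.5)):
    """Average AP over several overlap thresholds (an illustrative stand-in)."""
    return sum(average_precision(detections, ground_truth, t) for t in thresholds) / len(thresholds)
```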

Multiple detection frameworks were tested on the PKU-MMD dataset, involving sliding-window approaches and feature extraction from raw skeletal, RGB, and optical-flow data. The experimental results show that the dataset is challenging for existing methods and expose the complexity of continuous multi-modal 3D action detection. Although some conventional deep learning approaches achieved moderate success, the results highlight the need for more advanced models to reach higher accuracy in such a comprehensive setting.
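
For context, a generic sliding-window detector over a frame sequence might look like the following sketch: windows of several lengths are scored by a per-window classifier (passed in as `score_fn`, a stand-in for any trained model), and overlapping detections of the same class are suppressed. This is an illustration of the general approach, not a reimplementation of any specific baseline evaluated in the paper.

```python
# Generic sliding-window action detection sketch (not a specific baseline
# from the paper). `score_fn(features, start, end)` should return an
# (action_label, confidence) pair for the window [start, end).

def _tiou(a, b):
    """Temporal intersection-over-union of two (start, end) windows."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = max(a[1], b[1]) - min(a[0], b[0])
    return inter / union if union > 0 else 0.0

def slide_windows(num_frames, window_lengths=(30, 60, 120), stride=15):
    """Yield (start, end) windows of several lengths over the sequence."""
    for length in window_lengths:
        for start in range(0, num_frames - length + 1, stride):
            yield start, start + length

def detect(features, num_frames, score_fn, conf_thresh=0.5, iou_thresh=0.4):
    """Score every window, keep confident ones, and apply temporal NMS."""
    candidates = []
    for start, end in slide_windows(num_frames):
        label, conf = score_fn(features, start, end)
        if conf >= conf_thresh:
            candidates.append((start, end, label, conf))
    # Greedy non-maximum suppression: keep the most confident window and
    # drop same-class windows that overlap it too much.
    candidates.sort(key=lambda c: c[3], reverse=True)
    kept = []
    for cand in candidates:
        if not any(k[2] == cand[2] and _tiou(cand[:2], k[:2]) > iou_thresh for k in kept):
            kept.append(cand)
    return kept
```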

Implications and Future Directions

The PKU-MMD dataset fills a crucial gap by providing a large-scale and richly annotated dataset for continuous action detection and recognition in 3D spaces. It serves as a valuable resource for developing algorithms capable of real-time understanding of complex human actions from varied sensory inputs.

As the field of multimodal action understanding evolves, future research might explore innovative neural architectures capable of better integrating multiple modalities. Furthermore, improved models that can handle intra-class variability and provide accurate temporal localization are imperative. Models trained on PKU-MMD may contribute significantly to advancements in human-computer interaction, surveillance, and assistive technologies.

In summary, this dataset is set to foster significant progress in continuous multi-modal human activity analysis and serve as a pivotal benchmark for future AI developments in this area.