Multi-Person Extreme Motion Prediction (2105.08825v7)

Published 18 May 2021 in cs.CV

Abstract: Human motion prediction aims to forecast future poses given a sequence of past 3D skeletons. While this problem has recently received increasing attention, it has mostly been tackled for single humans in isolation. In this paper, we explore this problem for humans performing collaborative tasks: we seek to predict the future motion of two interacting persons given two sequences of their past skeletons. We propose a novel cross interaction attention mechanism that exploits historical information of both persons and learns to predict cross dependencies between the two pose sequences. Since no dataset is available for training in such interactive situations, we collected ExPI (Extreme Pose Interaction), a new lab-based person interaction dataset of professional dancers performing Lindy-hop dancing actions, which contains 115 sequences with 30K frames annotated with 3D body poses and shapes. We thoroughly evaluate our cross interaction network on ExPI and show that, in both short- and long-term predictions, it consistently outperforms state-of-the-art methods for single-person motion prediction.

Summary

The paper presents a novel collaborative motion prediction framework that uses cross-interaction attention to simultaneously forecast the future motions of interacting individuals.
It introduces the ExPI dataset, comprising 115 sequences and 30,000 frames of professional dancer movements, to capture extreme and diverse interaction dynamics.
Experiments show improvements of 10-40% in short-term and 5-30% in long-term prediction over state-of-the-art single-person methods, most notably for limb joints.

Multi-Person Extreme Motion Prediction

The paper "Multi-Person Extreme Motion Prediction" extends human motion prediction to the setting where multiple individuals interact in coordinated tasks, with an emphasis on extreme motions such as those performed by professional dancers. Motion prediction has been studied extensively for isolated individuals, but single-person approaches do not account for the interpersonal dynamics present in real-world interactions.

Collaborative Human Motion Prediction

This study introduces a new problem formulation, collaborative motion prediction: given the sequences of past skeletal poses of two individuals engaged in close interaction, predict the future motion of both simultaneously. The authors propose a cross-interaction attention mechanism that draws on the motion history of both individuals to learn the cross-dependencies between their pose sequences.

Dataset Collection and Analysis

The authors address the scarcity of datasets catering to interactive human motion prediction by introducing the ExPI (Extreme Pose Interaction) dataset. It includes 115 sequences totaling 30,000 frames of professional Lindy Hop dancers, annotated with 3D body poses and shapes. Dancers perform various aerial movements requiring significant synchrony and cooperation, thus providing extreme cases ideal for studying collaborative interactions. Standard deviation metrics and diversity analyses demonstrate that ExPI poses are notably diverse and extreme compared to the existing Human3.6M dataset.
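The kind of standard-deviation comparison mentioned above can be illustrated with a minimal sketch. The exact statistic the authors compute is not specified here, so the per-joint measure below (standard deviation of each joint's position across frames, averaged over the x, y, z axes) is an assumption, as are the function name `per_joint_std`, the joint count, and the synthetic data standing in for real poses.

```python
import numpy as np

def per_joint_std(poses):
    """Per-joint spread of a pose collection.

    poses: array of shape (N, J, 3), N frames of J joints in 3D.
    Returns an array of shape (J,): the standard deviation of each joint's
    position across frames, averaged over the three coordinate axes.
    This is an illustrative diversity measure, not the paper's exact metric.
    """
    return poses.std(axis=0).mean(axis=-1)

# Synthetic stand-ins (hypothetical): a low-variation and a high-variation set.
rng = np.random.default_rng(0)
n_frames, n_joints = 1000, 18  # hypothetical sizes
calm = rng.normal(scale=0.1, size=(n_frames, n_joints, 3))
extreme = rng.normal(scale=0.5, size=(n_frames, n_joints, 3))

calm_std = per_joint_std(calm)
extreme_std = per_joint_std(extreme)
print("mean per-joint std (calm):   ", calm_std.mean())
print("mean per-joint std (extreme):", extreme_std.mean())
```

In practice the poses would typically be expressed relative to a root joint before computing such a statistic, so that global translation does not dominate the spread; that preprocessing step is omitted here.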

Methodology: Cross-Interaction Attention

The proposed method incorporates a Cross-Interaction Attention (XIA) module, in contrast to standard motion prediction models, which typically treat each pose sequence independently. XIA uses the motion history of both individuals at once, applying attention to refine each person's spatio-temporal features with information from their partner before prediction. This collaborative approach allows the model to learn more accurate representations of highly dynamic interactions, leading to improved motion predictions.
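The general idea of attending across two people's motion histories can be sketched with plain scaled dot-product cross-attention. This is a generic illustration, not the authors' XIA module: the single attention head, the shared projection matrices, the random weights, and all names and sizes (`cross_attention`, `leader`, `follower`, `T`, `J`, `d`) are assumptions made for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_seq, context_seq, w_q, w_k, w_v):
    """Refine one person's sequence using the other person's sequence.

    query_seq:   (T_q, D) pose features of the person being refined.
    context_seq: (T_c, D) pose features of the partner.
    Returns the attended features (T_q, d) and the weights (T_q, T_c).
    """
    q = query_seq @ w_q                       # queries from the first person
    k = context_seq @ w_k                     # keys from the partner
    v = context_seq @ w_v                     # values from the partner
    scores = q @ k.T / np.sqrt(q.shape[-1])   # similarity of each frame pair
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
T, J, d = 50, 18, 32           # frames, joints, attention width (hypothetical)
D = J * 3                      # flattened 3D joint coordinates per frame
leader = rng.normal(size=(T, D))
follower = rng.normal(size=(T, D))
w_q, w_k, w_v = (rng.normal(scale=0.1, size=(D, d)) for _ in range(3))

# Apply attention in both directions so each person is informed by the other.
leader_refined, attn_lf = cross_attention(leader, follower, w_q, w_k, w_v)
follower_refined, attn_fl = cross_attention(follower, leader, w_q, w_k, w_v)
print(leader_refined.shape, follower_refined.shape)
```

Running the attention in both directions mirrors the idea that each dancer's prediction should depend on the partner's history; a trained model would learn the projection matrices rather than sample them at random.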

Experimental Protocol and Results

The evaluation uses three splits: common action, single action, and unseen actions. These splits probe how performance changes as the test actions become less familiar to the trained model. The model consistently outperformed state-of-the-art single-person methods in both short-term and long-term prediction, with accuracy improvements of 10-40% and 5-30%, respectively. The improvement was most pronounced for limb joints, which are crucial for interaction, underscoring the value of modeling interpersonal dynamics.
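To make the percentage figures concrete, the sketch below shows one common way such numbers are derived: compute a joint position error for a baseline and for the model, then report the relative reduction. The summary does not state which error metric the paper uses, so the mean per-joint Euclidean distance here is an assumption, and the function names and example values are hypothetical.

```python
import numpy as np

def mean_joint_error(pred, target):
    """Mean Euclidean distance between predicted and true joint positions.

    pred, target: arrays of shape (T, J, 3).
    An assumed, commonly used metric; not necessarily the paper's.
    """
    return np.linalg.norm(pred - target, axis=-1).mean()

def relative_improvement(baseline_err, model_err):
    """Percentage by which the model reduces the baseline error."""
    return 100.0 * (baseline_err - model_err) / baseline_err

# Toy check: every predicted joint is offset by (3, 4, 0), so the error is 5.
target = np.zeros((10, 18, 3))
pred = target + np.array([3.0, 4.0, 0.0])
err = mean_joint_error(pred, target)
print(err)

# Hypothetical errors: a baseline at 100.0 and a model at 80.0.
gain = relative_improvement(100.0, 80.0)
print(gain)
```

Reporting the error separately for short and long horizons, and for subsets of joints such as the limbs, would give the kind of breakdown the results above describe.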

Applications and Future Directions

This research has practical applications in areas such as sports analytics, virtual reality, and human-robot interaction, where understanding and predicting human motion in real time is critical. Its theoretical contribution lies in a better understanding of motion dynamics in complex multi-agent systems. Future research could extend the dataset and refine the models to handle longer prediction horizons, which remain challenging, especially for high-speed motion sequences.

In conclusion, the paper establishes a new benchmark for multi-person motion prediction by addressing collaborative interactions in extreme motion scenarios, thereby contributing significantly to the field and creating pathways for further exploration in AI-driven motion dynamics.