- The paper introduces a novel deep network that models temporal dependencies in trajectory space using DCT and spatial dependencies through learnable GCNs.
- It overcomes RNN limitations such as error accumulation and discontinuities, enabling smoother and more accurate human motion predictions.
- Experimental results on benchmarks like Human3.6M and 3DPW demonstrate state-of-the-art performance with significant implications for robotics and real-time interaction.
An Expert Overview of "Learning Trajectory Dependencies for Human Motion Prediction"
The paper "Learning Trajectory Dependencies for Human Motion Prediction" addresses the problem of predicting future human body poses from a sequence of observed poses. Recurrent Neural Networks (RNNs) have traditionally been used for this task, but the paper highlights their limitations, such as error accumulation over time and discontinuities between the last observed and first predicted frames. The authors instead propose a feed-forward deep network that explicitly models the temporal smoothness and spatial dependencies of human body joints. The approach rests on two key innovations: encoding temporal information in trajectory space via the Discrete Cosine Transform (DCT), and encoding spatial information with graph convolutional networks (GCNs) whose connectivity is learned.
Temporal Encoding via Trajectory Space
Temporal information in human motion is typically encoded by feeding past pose sequences to the model directly. This paper shifts the paradigm by encoding temporal data in trajectory space using the Discrete Cosine Transform (DCT): each joint coordinate's trajectory over time is represented by its DCT coefficients. Rather than relying on manually chosen convolutional filter sizes to capture temporal dependencies, the model operates on these coefficients directly. Because human motion trajectories are smooth, a small number of low-frequency coefficients captures most of a trajectory, making the representation compact and avoiding the pitfalls of frame-by-frame prediction.
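The trajectory-space idea can be illustrated with a minimal numpy sketch. The orthonormal DCT-II matrix, function names, and the synthetic trajectory below are illustrative choices, not the paper's exact implementation; the point is that truncating a smooth trajectory to its low-frequency DCT coefficients loses little information.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix; rows are cosine basis functions."""
    k = np.arange(n)[:, None]          # frequency index
    t = np.arange(n)[None, :]          # time index
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * (t + 0.5) * k / n)
    mat[0] /= np.sqrt(2.0)             # scale first row for orthonormality
    return mat

def encode(traj, num_coeffs):
    """Project a 1-D joint-coordinate trajectory onto its first DCT coefficients."""
    return dct_matrix(len(traj))[:num_coeffs] @ traj

def decode(coeffs, length):
    """Reconstruct a length-T trajectory from truncated DCT coefficients."""
    return dct_matrix(length)[:len(coeffs)].T @ coeffs

# Hypothetical smooth trajectory: 25 frames of one joint coordinate.
t = np.linspace(0.0, 1.0, 25)
traj = 0.5 * np.sin(2.0 * np.pi * t) + 0.1 * t

coeffs = encode(traj, num_coeffs=10)      # 10 coefficients instead of 25 frames
recon = decode(coeffs, length=25)         # reconstruction error stays small
```

A model predicting in this space outputs a handful of coefficients per joint coordinate rather than one value per future frame, which is what yields the smoothness the paper emphasizes.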
Spatial Encoding with Learnable Graph Convolutional Networks
The paper also departs from conventional approaches to modeling spatial dependencies, which typically rely on a pre-defined skeletal kinematic tree. Instead, it treats the human pose as a fully connected graph in which each joint is a node linked to every other joint. A novel GCN architecture learns the graph connectivity automatically, allowing the model to capture dependencies beyond the anatomical structure, such as the coordinated swing of arms and legs during walking.
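A single layer of such a network can be sketched as follows. This is a simplified illustration, not the paper's exact architecture: the adjacency matrix `A` is a free parameter rather than a fixed kinematic tree, and the class name, shapes, initialization, and tanh activation are assumptions for the sketch.

```python
import numpy as np

class LearnableGraphConv:
    """Graph-convolution layer with a fully learnable adjacency matrix.

    Instead of hard-wiring the skeletal kinematic tree, the K x K adjacency
    A is trained alongside the weights W, so the layer can discover
    dependencies between any pair of joints, adjacent or not.
    """
    def __init__(self, num_joints, in_features, out_features, rng=None):
        rng = np.random.default_rng(0) if rng is None else rng
        # Learnable joint-to-joint connectivity (illustrative initialization).
        self.A = rng.normal(scale=1.0 / num_joints,
                            size=(num_joints, num_joints))
        # Learnable per-node feature transform.
        self.W = rng.normal(scale=1.0 / np.sqrt(in_features),
                            size=(in_features, out_features))

    def forward(self, H):
        # H: (num_joints, in_features) -> (num_joints, out_features).
        # Each joint's output mixes features from all joints, weighted by A.
        return np.tanh(self.A @ H @ self.W)

# Hypothetical usage: 22 joints, each carrying 10 DCT coefficients.
layer = LearnableGraphConv(num_joints=22, in_features=10, out_features=64)
features = np.zeros((22, 10))
out = layer.forward(features)             # shape (22, 64)
```

Stacking several such layers, with `A` and `W` updated by backpropagation, lets the network refine which joint pairs influence each other for a given motion.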
The authors validate their approach on recognized benchmark datasets, including Human3.6M, CMU motion capture, and 3DPW, demonstrating state-of-the-art performance in human motion prediction. The results emphasize the model's capability to accurately predict motion sequences for both angle-based and position-based pose representations, overcoming the limitations of RNN-based models in capturing long-range and fine-grained dependencies.
Notably, the paper provides a comprehensive quantitative analysis showing that the introduced method consistently outperforms previous approaches across various metrics and tasks. This performance edge follows directly from the method's robust handling of both temporal and spatial dependencies.
Implications and Future Directions
The implications of this research are substantial for applications requiring precise human motion prediction, such as robotics, autonomous navigation, and human-computer interaction. The proposed method's ability to learn and adapt its underlying model structure offers promising avenues for more generalized motion prediction frameworks that could handle a broader range of activities and environmental conditions.
For future developments, this approach may be extended by integrating contextual information or utilizing more sophisticated temporal interpolations. Additionally, exploring different graph learning techniques could further enhance the understanding of spatial dependencies. The potential to expand this methodology to real-time systems could significantly impact various domains relying on dynamic human-robot interactions.
In conclusion, "Learning Trajectory Dependencies for Human Motion Prediction" makes a notable contribution to the field by providing a more precise, adaptable approach to modeling human motion. By focusing on trajectory-based representation and flexible spatial dependency modeling, this paper opens new opportunities for advancing human motion prediction beyond the constraints of traditional methods.