Human Action Recognition and Prediction: A Survey

Published 28 Jun 2018 in cs.CV | (1806.11230v3)

Abstract: Derived from rapid advances in computer vision and machine learning, video analysis tasks have been moving from inferring the present state to predicting the future state. Vision-based action recognition and prediction from videos are such tasks, where action recognition is to infer human actions (present state) based upon complete action executions, and action prediction is to predict human actions (future state) based upon incomplete action executions. These two tasks have become particularly prevalent topics recently because of their rapidly emerging real-world applications, such as visual surveillance, autonomous driving, entertainment, and video retrieval. Many attempts have been made in the last few decades to build a robust and effective framework for action recognition and prediction. In this paper, we survey the complete state-of-the-art techniques in action recognition and prediction. Existing models, popular algorithms, technical difficulties, popular action databases, evaluation protocols, and promising future directions are also provided with systematic discussions.

Citations (523)

Summary

  • The paper discusses Temporal Segment Networks (TSNs), which efficiently capture long-range temporal dynamics from video data for improved human action recognition.
  • It employs random temporal sampling and a two-stream architecture with pre-trained deep CNNs to reduce redundancy and enhance feature extraction.
  • The survey highlights adaptive sampling and advanced consensus functions, paving the way for real-time applications and further research.

In the survey titled "Human Action Recognition and Prediction," Yu Kong and Yun Fu present a comprehensive analysis of various methodologies in the domain of human action recognition, with a focus on Temporal Segment Networks (TSNs). The paper outlines the challenges and advancements in extracting long-range temporal dynamics from video data, which is crucial for accurately recognizing complex human actions.

Temporal Segment Networks (TSNs)

The core method discussed involves TSNs, which utilize temporal sampling techniques to mitigate redundancy in video frames. Traditional approaches, often reliant on dense sampling, struggle with excessive computational demands and redundant data. In contrast, TSNs capitalize on randomly sampled frames from different segments of a video. This enables them to capture essential temporal dynamics efficiently, thereby addressing both short-term and long-term action characteristics.
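The segment-based sampling described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `sample_segment_indices` and its parameters are hypothetical, and it assumes the video is split into equal-length segments with one frame drawn uniformly at random from each.

```python
import random


def sample_segment_indices(num_frames, num_segments, rng=None):
    """Pick one random frame index from each of `num_segments` segments.

    Assumption: segments are contiguous and (almost) equal in length.
    """
    if num_segments <= 0:
        raise ValueError("num_segments must be positive")
    if num_frames < num_segments:
        raise ValueError("need at least one frame per segment")
    if rng is None:
        rng = random.Random()

    indices = []
    for k in range(num_segments):
        # Segment k covers frames [start, end).
        start = (k * num_frames) // num_segments
        end = ((k + 1) * num_frames) // num_segments
        indices.append(rng.randrange(start, end))
    return indices


# Example: a 300-frame video reduced to 3 sampled frames.
print(sample_segment_indices(300, 3, random.Random(0)))
```

Because the number of sampled frames depends only on `num_segments`, the cost of processing a video stays fixed regardless of its length, while the samples still span the whole duration.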

In the original framework, TSNs employ a two-stream architecture derived from earlier work and use pre-trained deep CNNs for feature extraction. The consensus function, initially a simple pooling operation, summarizes the predictions from individual segments into a single video-level prediction.
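A pooling-style consensus can be sketched in a few lines. The sketch below assumes average pooling over per-segment class scores and a weighted sum to combine the two streams; the stream names (RGB and optical flow), the `flow_weight` parameter, and the example scores are illustrative assumptions rather than details given in the summary.

```python
import math


def average_consensus(segment_scores):
    """Average per-segment class scores into one video-level score vector."""
    num_segments = len(segment_scores)
    num_classes = len(segment_scores[0])
    return [
        sum(seg[c] for seg in segment_scores) / num_segments
        for c in range(num_classes)
    ]


def fuse_streams(rgb_scores, flow_scores, flow_weight=1.0):
    """Combine the two streams with a weighted sum (weight is hypothetical)."""
    return [r + flow_weight * f for r, f in zip(rgb_scores, flow_scores)]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for 3 segments and 2 action classes, per stream.
rgb = average_consensus([[1.0, 2.0], [0.5, 2.5], [1.5, 1.5]])
flow = average_consensus([[0.2, 1.0], [0.1, 1.2], [0.3, 0.8]])
probs = softmax(fuse_streams(rgb, flow))
print(probs.index(max(probs)))  # index of the predicted class
```

Averaging treats the segments as an unordered set, which keeps the consensus simple but discards the order in which the segments occur.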

Subsequent enhancements to the TSN framework, as noted in the paper, include adaptive sampling at varied temporal scales and the replacement of pooling-based consensus with fully connected networks. These changes are reported to improve the encoding of frame sequences and recognition accuracy, and they allow TSNs to be integrated into broader action recognition frameworks.
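One way to picture a fully connected consensus is to concatenate the per-segment features in temporal order and pass them through a single linear layer. The sketch below is an assumption-laden illustration: `fc_consensus`, the hand-picked weights, and the toy features are hypothetical, and a real model would learn the weights during training.

```python
def fc_consensus(segment_features, weights, bias):
    """Apply one linear layer to the time-ordered concatenation of segments.

    `weights` has one row per output class; each row spans all segments.
    """
    flat = [x for seg in segment_features for x in seg]
    if any(len(row) != len(flat) for row in weights):
        raise ValueError("each weight row must match the concatenated length")
    return [
        sum(w * x for w, x in zip(row, flat)) + b
        for row, b in zip(weights, bias)
    ]


# Two segments with two features each; weights are chosen by hand here.
weights = [[1.0, 0.0, 0.0, 0.0],
           [0.0, 0.0, 0.0, 1.0]]
bias = [0.0, 0.0]

forward = [[1.0, 0.0], [0.0, 1.0]]
reverse = list(reversed(forward))

print(fc_consensus(forward, weights, bias))
print(fc_consensus(reverse, weights, bias))
```

The two calls give different outputs for the same segments in a different order, which is the property a plain average lacks: a learned layer over ordered segments can distinguish sequences that pooling would treat as identical.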

Implications and Future Directions

This survey delineates the significant strides made in human action recognition through TSNs, underscoring their capability in handling complex, temporally extended actions. The discussion points toward several implications and avenues for further research:

  1. Practical Implications: The refined video analysis techniques promise significant improvements in performance for real-time applications, including surveillance, human-computer interaction, and video understanding.
  2. Theoretical Implications: Understanding and optimizing temporal dynamics fosters advances in sequential data modeling and neural network architectures.
  3. Future Research: The evolution of TSNs invites exploration into more sophisticated consensus functions and sampling methodologies. Additionally, integrating TSNs with emerging AI paradigms could yield considerable advancements in predictive accuracy and computational efficiency.

In conclusion, the survey by Kong and Fu provides an insightful overview of the current state and future potential of TSNs in human action recognition, reflecting both the depth and breadth of ongoing research in this dynamic field.
