
Deep Analysis of CNN-based Spatio-temporal Representations for Action Recognition (2010.11757v4)

Published 22 Oct 2020 in cs.CV

Abstract: In recent years, a number of approaches based on 2D or 3D convolutional neural networks (CNNs) have emerged for video action recognition, achieving state-of-the-art results on several large-scale benchmark datasets. In this paper, we carry out an in-depth comparative analysis to better understand the differences between these approaches and the progress they have made. To this end, we develop a unified framework for both 2D-CNN and 3D-CNN action models, which enables us to remove bells and whistles and provides common ground for fair comparison. We then conduct a large-scale analysis involving over 300 action recognition models. Our comprehensive analysis reveals that a) a significant leap has been made in efficiency for action recognition, but not in accuracy; b) 2D-CNN and 3D-CNN models behave similarly in terms of spatio-temporal representation abilities and transferability. Our code is available at https://github.com/IBM/action-recognition-pytorch.

Authors (7)
  1. Chun-Fu Chen (28 papers)
  2. Rameswar Panda (79 papers)
  3. Kandan Ramakrishnan (8 papers)
  4. Rogerio Feris (105 papers)
  5. John Cohn (4 papers)
  6. Aude Oliva (42 papers)
  7. Quanfu Fan (22 papers)
Citations (91)
