
Hierarchical Attention Network for Action Recognition in Videos (1607.06416v1)

Published 21 Jul 2016 in cs.CV

Abstract: Understanding human actions in wild videos is an important task with a broad range of applications. In this paper we propose a novel approach named Hierarchical Attention Network (HAN), which incorporates static spatial information, short-term motion information, and long-term video temporal structures for complex human action understanding. Compared to recent convolutional neural network based approaches, HAN has the following advantages: (1) HAN can efficiently capture video temporal structures over a longer range; (2) HAN can reveal temporal transitions between frame chunks at different time steps, i.e., it explicitly models the temporal transitions between frames as well as between video segments; and (3) with a multi-step spatial-temporal attention mechanism, HAN automatically learns important regions in video frames and important temporal segments in the video. The proposed model is trained and evaluated on the standard video action benchmarks, UCF-101 and HMDB-51, where it significantly outperforms the state of the art.
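The hierarchical temporal attention described in the abstract can be sketched as a two-level weighted pooling: attend over frames within each chunk, then attend over the resulting chunk summaries. The sketch below is a minimal illustration under assumed interfaces (the function names, the chunking scheme, and the single scoring vector per level are simplifications, not the paper's actual architecture, which also includes spatial attention and recurrent temporal modeling).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(features, w):
    # features: (n, d) feature vectors; w: (d,) hypothetical scoring vector.
    scores = features @ w              # (n,) relevance scores
    alpha = softmax(scores)            # attention weights, sum to 1
    return alpha @ features            # (d,) attention-weighted summary

def hierarchical_attention(frames, chunk_size, w_frame, w_chunk):
    # frames: (T, d) per-frame features. First attend within each chunk
    # of consecutive frames, then attend across the chunk summaries to
    # produce a single video-level representation.
    chunks = [frames[i:i + chunk_size]
              for i in range(0, len(frames), chunk_size)]
    chunk_vecs = np.stack([attend(c, w_frame) for c in chunks])
    return attend(chunk_vecs, w_chunk)  # (d,)
```

Because each level is a convex combination of its inputs, the video-level vector stays within the per-dimension range of the frame features; in the full model the attention weights would be learned jointly with the classifier rather than fixed.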

Authors (6)
  1. Yilin Wang (156 papers)
  2. Suhang Wang (118 papers)
  3. Jiliang Tang (204 papers)
  4. Neil O'Hare (2 papers)
  5. Yi Chang (150 papers)
  6. Baoxin Li (44 papers)
Citations (82)
