Temporal Modeling Approaches for Large-scale Youtube-8M Video Understanding (1707.04555v1)

Published 14 Jul 2017 in cs.CV

Abstract: This paper describes our solution for the video recognition task of the Google Cloud and YouTube-8M Video Understanding Challenge, which ranked 3rd. Because the challenge provides pre-extracted visual and audio features instead of raw videos, we mainly investigate temporal modeling approaches that aggregate frame-level features for multi-label video recognition. Our system contains three major components: a two-stream sequence model, a fast-forward sequence model, and temporal residual neural networks. Experimental results on the challenging YouTube-8M dataset demonstrate that our proposed temporal modeling approaches significantly improve on existing temporal modeling approaches in large-scale video recognition tasks. Notably, our fast-forward LSTM with a depth of 7 layers achieves 82.75% in terms of GAP@20 on the Kaggle Public test set.
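
The abstract reports its result as GAP@20 (Global Average Precision over the top 20 predictions per video). The sketch below shows one way this metric could be computed. It assumes the commonly described challenge definition: keep the top k scores per video, pool them across all videos, sort by confidence, and sum precision times the change in recall. The function name `gap_at_k` and the array layout are illustrative assumptions, not taken from the paper or the official evaluation code.

```python
import numpy as np


def gap_at_k(scores, labels, k=20):
    """Sketch of Global Average Precision at k (hypothetical helper).

    scores: (num_videos, num_classes) predicted confidences.
    labels: (num_videos, num_classes) binary ground truth.
    Assumption: recall is normalized by all ground-truth positives.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    k = min(k, scores.shape[1])

    # Keep only the k highest-scoring classes for each video.
    top_idx = np.argsort(-scores, axis=1)[:, :k]
    top_scores = np.take_along_axis(scores, top_idx, axis=1).ravel()
    top_labels = np.take_along_axis(labels, top_idx, axis=1).ravel()

    # Pool predictions from all videos and rank them by confidence.
    order = np.argsort(-top_scores, kind="stable")
    hits = top_labels[order]

    total_positives = labels.sum()
    if total_positives == 0:
        return 0.0

    # Precision at each rank; recall only increases at ranks that are hits.
    precision = np.cumsum(hits) / np.arange(1, hits.size + 1)
    return float((precision * hits).sum() / total_positives)
```

With scores of shape `(num_videos, num_classes)`, a call such as `gap_at_k(scores, labels, k=20)` returns a value between 0 and 1; the 82.75% quoted above corresponds to 0.8275 on that scale.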

Authors (9)
  1. Fu Li (86 papers)
  2. Chuang Gan (196 papers)
  3. Xiao Liu (402 papers)
  4. Yunlong Bian (2 papers)
  5. Xiang Long (29 papers)
  6. Yandong Li (38 papers)
  7. Zhichao Li (31 papers)
  8. Jie Zhou (688 papers)
  9. Shilei Wen (42 papers)
Citations (60)
