Temporal Modeling Approaches for Large-scale Youtube-8M Video Understanding (1707.04555v1)

Published 14 Jul 2017 in cs.CV

Abstract: This paper describes our solution for the video recognition task of the Google Cloud and YouTube-8M Video Understanding Challenge, which ranked 3rd. Because the challenge provides pre-extracted visual and audio features instead of raw videos, we mainly investigate temporal modeling approaches that aggregate frame-level features for multi-label video recognition. Our system has three major components: a two-stream sequence model, a fast-forward sequence model, and temporal residual neural networks. Experimental results on the challenging YouTube-8M dataset demonstrate that the proposed approaches significantly improve on existing temporal modeling approaches for large-scale video recognition. Notably, our fast-forward LSTM with a depth of 7 layers achieves 82.75% in terms of GAP@20 on the Kaggle Public test set.
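
For readers unfamiliar with the metric quoted in the abstract, the following is a minimal sketch of how GAP@20 (Global Average Precision over the top 20 predictions per video) is commonly computed. It is not taken from the paper: the function name `gap_at_k`, the list-of-lists input layout, and the choice to normalize by the total number of ground-truth labels are assumptions made for illustration.

```python
def gap_at_k(scores, labels, k=20):
    """Illustrative GAP@k sketch (hypothetical helper, not the authors' code).

    scores: one list of per-class confidences per video.
    labels: one list of 0/1 ground-truth flags per video, same shape as scores.
    """
    pooled = []            # (confidence, is_correct) pairs from all videos
    total_positives = 0    # assumption: recall is measured against all true labels

    for video_scores, video_labels in zip(scores, labels):
        total_positives += sum(video_labels)
        # Keep only the k most confident classes for this video.
        top = sorted(range(len(video_scores)),
                     key=lambda c: video_scores[c], reverse=True)[:k]
        pooled.extend((video_scores[c], video_labels[c]) for c in top)

    if total_positives == 0:
        return 0.0

    # Rank every kept prediction globally by confidence.
    pooled.sort(key=lambda pair: pair[0], reverse=True)

    hits = 0
    precision_sum = 0.0
    for rank, (_, is_correct) in enumerate(pooled, start=1):
        if is_correct:
            hits += 1
            # Precision at this rank; each hit raises recall by 1/total_positives.
            precision_sum += hits / rank
    return precision_sum / total_positives


# Toy example with two videos, three classes, and k=2.
toy_scores = [[0.9, 0.1, 0.8], [0.2, 0.7, 0.3]]
toy_labels = [[1, 0, 0], [0, 1, 1]]
toy_gap = gap_at_k(toy_scores, toy_labels, k=2)
```

Because predictions from all videos are pooled into one ranked list before precision is accumulated, the metric rewards confidence scores that are calibrated across videos, not just a correct ordering within each video.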

Authors (9)

Fu Li (86 papers)
Chuang Gan (196 papers)
Xiao Liu (402 papers)
Yunlong Bian (2 papers)
Xiang Long (29 papers)
Yandong Li (38 papers)
Zhichao Li (31 papers)
Jie Zhou (688 papers)
Shilei Wen (42 papers)

Citations (60)

