Enhancing Self-supervised Video Representation Learning via Multi-level Feature Optimization (2108.02183v2)

Published 4 Aug 2021 in cs.CV

Abstract: The crux of self-supervised video representation learning is to build general features from unlabeled videos. However, most recent works have mainly focused on high-level semantics and neglected lower-level representations and their temporal relationships, which are crucial for general video understanding. To address these challenges, this paper proposes a multi-level feature optimization framework to improve the generalization and temporal modeling ability of learned video representations. Concretely, high-level features obtained from naive and prototypical contrastive learning are utilized to build distribution graphs, guiding the process of low-level and mid-level feature learning. We also devise a simple temporal modeling module from multi-level features to enhance motion pattern learning. Experiments demonstrate that multi-level feature optimization with the graph constraint and temporal modeling can greatly improve the representation ability in video understanding. Code is available at https://github.com/shvdiwnkozbw/Video-Representation-via-Multi-level-Optimization.
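
The graph constraint mentioned in the abstract can be illustrated with a small, hedged sketch. This is not the authors' implementation (see the linked repository for that). It assumes that a "distribution graph" is a row-wise softmax over pairwise cosine similarities between clips in a batch, and that lower-level features are guided by a KL divergence toward the graph built from high-level features. The function names, the temperature value, and the KL objective are all illustrative assumptions.

```python
import math
import random


def l2_normalize(v):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def similarity_graph(features, temperature=0.1):
    """Assumed form of a distribution graph: row i is a softmax over the
    cosine similarities between clip i and every other clip in the batch."""
    z = [l2_normalize(f) for f in features]
    graph = []
    for i, zi in enumerate(z):
        logits = [
            sum(a * b for a, b in zip(zi, zj)) / temperature
            for j, zj in enumerate(z)
            if j != i
        ]
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(l - m) for l in logits]
        total = sum(exps)
        row = [e / total for e in exps]
        row.insert(i, 0.0)  # no self-edge
        graph.append(row)
    return graph


def graph_kl(target, pred, eps=1e-8):
    """Mean row-wise KL(target || pred); an assumed stand-in for the
    graph constraint, not the loss used in the paper."""
    total = 0.0
    for t_row, p_row in zip(target, pred):
        total += sum(
            t * (math.log(t + eps) - math.log(p + eps))
            for t, p in zip(t_row, p_row)
        )
    return total / len(target)


# Toy batch of 4 clips: 16-d "high-level" and 8-d "low-level" features.
rng = random.Random(0)
high = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(4)]
low = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(4)]

target = similarity_graph(high)  # treated as fixed guidance
loss = graph_kl(target, similarity_graph(low))
```

In a real training loop the target graph would be held fixed (no gradient) and only the lower-level branch would be updated to reduce `loss`. The sketch uses plain Python lists so the arithmetic is easy to follow.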

Authors (8)

Rui Qian (50 papers)
Yuxi Li (45 papers)
Huabin Liu (14 papers)
John See (28 papers)
Shuangrui Ding (22 papers)
Xian Liu (37 papers)
Dian Li (28 papers)
Weiyao Lin (87 papers)

Citations (40)

GitHub

GitHub - shvdiwnkozbw/Video-Representation-via-Multi-level-Optimization: Code for Enhancing Self-supervised Video Representation Learning via Multi-level Feature Optimization. (10 stars)
