Enhancing Self-supervised Video Representation Learning via Multi-level Feature Optimization (2108.02183v2)

Published 4 Aug 2021 in cs.CV

Abstract: The crux of self-supervised video representation learning is to build general features from unlabeled videos. However, most recent works focus mainly on high-level semantics and neglect lower-level representations and their temporal relationships, which are crucial for general video understanding. To address these challenges, this paper proposes a multi-level feature optimization framework to improve the generalization and temporal modeling ability of learned video representations. Concretely, high-level features obtained from naive and prototypical contrastive learning are used to build distribution graphs that guide low-level and mid-level feature learning. We also devise a simple temporal modeling module built on multi-level features to enhance motion pattern learning. Experiments demonstrate that multi-level feature optimization with the graph constraint and temporal modeling can greatly improve the representation ability in video understanding. Code is available at https://github.com/shvdiwnkozbw/Video-Representation-via-Multi-level-Optimization.
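The abstract says high-level features are used to build distribution graphs that guide low- and mid-level feature learning. The paper and its repository define the exact construction. What follows is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a distribution graph is a row-wise softmax over pairwise cosine similarities within a batch (self-pairs excluded), and that the graph constraint is a KL divergence pulling a lower-level graph toward the high-level one. The function names `distribution_graph` and `graph_kl_loss` and the temperature of 0.1 are hypothetical choices for illustration. The code is written in dependency-free Python rather than a deep-learning framework.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def distribution_graph(features, temperature=0.1):
    """Hypothetical distribution graph: row i is a softmax over the cosine
    similarities between sample i and every other sample in the batch.
    The diagonal (self-similarity) is masked out, so the batch needs >= 2 samples."""
    feats = [l2_normalize(f) for f in features]
    n = len(feats)
    graph = []
    for i in range(n):
        # Cosine similarity of unit vectors is their dot product; -inf masks self-pairs.
        logits = [
            sum(a * b for a, b in zip(feats[i], feats[j])) / temperature if j != i else -math.inf
            for j in range(n)
        ]
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(l - m) for l in logits]  # exp(-inf) evaluates to 0.0
        total = sum(exps)
        graph.append([e / total for e in exps])
    return graph


def graph_kl_loss(target_graph, student_graph, eps=1e-12):
    """Mean row-wise KL(target || student). In a real training loop the
    high-level target graph would be detached (stop-gradient) so that only
    the lower-level features receive gradients; plain Python has no autograd."""
    total = 0.0
    for p_row, q_row in zip(target_graph, student_graph):
        total += sum(p * math.log((p + eps) / (q + eps)) for p, q in zip(p_row, q_row) if p > 0)
    return total / len(target_graph)


# Toy batch of four samples (made-up vectors, not data from the paper).
high_level = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
low_level = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]

g_high = distribution_graph(high_level)
g_low = distribution_graph(low_level)
loss = graph_kl_loss(g_high, g_low)
```

In this reading, the loss is zero only when the lower-level features induce the same neighbour distribution as the high-level ones. It grows as their relational structure diverges, which is one way to let high-level semantics supervise lower layers without forcing their features to be identical.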

Authors (8)
  1. Rui Qian (50 papers)
  2. Yuxi Li (45 papers)
  3. Huabin Liu (14 papers)
  4. John See (28 papers)
  5. Shuangrui Ding (22 papers)
  6. Xian Liu (37 papers)
  7. Dian Li (28 papers)
  8. Weiyao Lin (87 papers)
Citations (40)