
Video Cloze Procedure for Self-Supervised Spatio-Temporal Learning (2001.00294v1)

Published 2 Jan 2020 in cs.CV

Abstract: We propose a novel self-supervised method, referred to as Video Cloze Procedure (VCP), to learn rich spatio-temporal representations. VCP first generates "blanks" by withholding video clips and then creates "options" by applying spatio-temporal operations to the withheld clips. Finally, it fills the blanks with "options" and learns representations by predicting the categories of operations applied to the clips. VCP can act as either a proxy task or a target task in self-supervised learning. As a proxy task, it converts rich self-supervised representations into video clip operations (options), which enhances the flexibility and reduces the complexity of representation learning. As a target task, it can assess learned representation models in a uniform and interpretable manner. With VCP, we train spatio-temporal representation models (3D-CNNs) and apply them to action recognition and video retrieval tasks. Experiments on commonly used benchmarks show that the trained models outperform state-of-the-art self-supervised models by significant margins.
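The cloze construction described in the abstract (withhold a clip, transform it into an "option", fill the blank, predict which operation was applied) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the operation set (`identity`, `rotate_180`, `reverse_time`), the function name `make_cloze_sample`, and the `(frames, height, width, channels)` layout are all assumptions chosen for the example, since the abstract does not specify them.

```python
import numpy as np

# Hypothetical operation set. The abstract only says "spatio-temporal
# operations"; these three are stand-ins chosen for illustration.
def identity(clip):
    return clip

def rotate_180(clip):
    # Spatial operation: rotate every frame by 180 degrees in the H-W plane.
    return np.rot90(clip, k=2, axes=(1, 2))

def reverse_time(clip):
    # Temporal operation: play the clip backwards.
    return clip[::-1]

OPERATIONS = [identity, rotate_180, reverse_time]

def make_cloze_sample(video, clip_len, blank_index, rng):
    """Build one cloze sample from a video of shape (T, H, W, C).

    The video is cut into consecutive clips, the clip at `blank_index`
    is withheld, a randomly chosen operation turns it into an "option",
    and that option is placed back into the blank. The returned label is
    the index of the applied operation, i.e. the classification target.
    """
    n_clips = video.shape[0] // clip_len
    clips = [video[i * clip_len:(i + 1) * clip_len] for i in range(n_clips)]
    label = int(rng.integers(len(OPERATIONS)))
    clips[blank_index] = OPERATIONS[label](clips[blank_index])
    return np.concatenate(clips, axis=0), label

# Example: a dummy 12-frame video split into three 4-frame clips.
video = np.arange(12 * 4 * 4 * 3, dtype=np.float32).reshape(12, 4, 4, 3)
rng = np.random.default_rng(0)
sample, label = make_cloze_sample(video, clip_len=4, blank_index=1, rng=rng)
```

In a training setup, `sample` would be fed to a 3D-CNN and `label` used as the target of a classification loss; that part is omitted here because the abstract gives no architectural details.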

Authors (7)
  1. Dezhao Luo (10 papers)
  2. Chang Liu (864 papers)
  3. Yu Zhou (335 papers)
  4. Dongbao Yang (16 papers)
  5. Can Ma (21 papers)
  6. Qixiang Ye (110 papers)
  7. Weiping Wang (123 papers)
Citations (156)
