
Contextualized Spatio-Temporal Contrastive Learning with Self-Supervision (2112.05181v2)

Published 9 Dec 2021 in cs.CV

Abstract: Modern self-supervised learning algorithms typically enforce persistency of instance representations across views. While being very effective on learning holistic image and video representations, such an objective becomes sub-optimal for learning spatio-temporally fine-grained features in videos, where scenes and instances evolve through space and time. In this paper, we present Contextualized Spatio-Temporal Contrastive Learning (ConST-CL) to effectively learn spatio-temporally fine-grained video representations via self-supervision. We first design a region-based pretext task which requires the model to transform instance representations from one view to another, guided by context features. Further, we introduce a simple network design that successfully reconciles the simultaneous learning process of both holistic and local representations. We evaluate our learned representations on a variety of downstream tasks and show that ConST-CL achieves competitive results on 6 datasets, including Kinetics, UCF, HMDB, AVA-Kinetics, AVA and OTB.
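To make the cross-view objective in the abstract concrete, below is a minimal, hedged sketch of the core idea: region features from one view are transformed (here via a hypothetical cross-attention module) conditioned on context features from the other view, and the predictions are matched to the target view's regions with an InfoNCE-style contrastive loss. All module names, dimensions, and design choices are illustrative assumptions, not the authors' released implementation.

```python
# Illustrative sketch only: a context-guided cross-view region contrastive loss
# in the spirit of ConST-CL. Names and dimensions are assumptions for demonstration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContextGuidedRegionPredictor(nn.Module):
    """Predicts region features of view B from region features of view A,
    conditioned on context features of view B (hypothetical design)."""
    def __init__(self, dim=256, num_heads=4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.proj = nn.Linear(dim, dim)

    def forward(self, regions_a, context_b):
        # regions_a: (B, R, D) region features from view A
        # context_b: (B, T, D) context features from view B
        attended, _ = self.cross_attn(query=regions_a, key=context_b, value=context_b)
        return self.proj(attended)  # predicted region features for view B

def region_contrastive_loss(pred_b, regions_b, temperature=0.1):
    """InfoNCE over regions: each predicted region is pulled toward its
    corresponding region in view B; all other regions act as negatives."""
    B, R, D = pred_b.shape
    pred = F.normalize(pred_b.reshape(B * R, D), dim=-1)
    target = F.normalize(regions_b.reshape(B * R, D), dim=-1)
    logits = pred @ target.t() / temperature          # (B*R, B*R) similarities
    labels = torch.arange(B * R, device=pred.device)  # positives on the diagonal
    return F.cross_entropy(logits, labels)

# Toy usage with random features.
predictor = ContextGuidedRegionPredictor(dim=256)
regions_a = torch.randn(2, 8, 256)   # 2 clips, 8 regions each
regions_b = torch.randn(2, 8, 256)
context_b = torch.randn(2, 16, 256)  # 16 context tokens from view B
loss = region_contrastive_loss(predictor(regions_a, context_b), regions_b)
```

This toy loss only conveys the contrastive matching structure; how regions are sampled and how the holistic and local objectives are balanced is described in the paper itself.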

Authors (8)
  1. Liangzhe Yuan (19 papers)
  2. Rui Qian (50 papers)
  3. Yin Cui (45 papers)
  4. Boqing Gong (100 papers)
  5. Florian Schroff (21 papers)
  6. Ming-Hsuan Yang (377 papers)
  7. Hartwig Adam (49 papers)
  8. Ting Liu (329 papers)
Citations (14)
