
STC: Spatio-Temporal Contrastive Learning for Video Instance Segmentation (2202.03747v2)

Published 8 Feb 2022 in cs.CV

Abstract: Video Instance Segmentation (VIS) is a task that simultaneously requires classification, segmentation, and instance association in a video. Recent VIS approaches rely on sophisticated pipelines to achieve this goal, including RoI-related operations or 3D convolutions. In contrast, we present a simple and efficient single-stage VIS framework based on the instance segmentation method CondInst by adding an extra tracking head. To improve instance association accuracy, a novel bi-directional spatio-temporal contrastive learning strategy for tracking embedding across frames is proposed. Moreover, an instance-wise temporal consistency scheme is utilized to produce temporally coherent results. Experiments conducted on the YouTube-VIS-2019, YouTube-VIS-2021, and OVIS-2021 datasets validate the effectiveness and efficiency of the proposed method. We hope the proposed framework can serve as a simple and strong alternative for many other instance-level video association tasks.
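
As a rough illustration of what a bi-directional contrastive objective on tracking embeddings could look like, the sketch below computes a generic symmetric InfoNCE-style loss between instance embeddings from two frames. The abstract does not specify the loss, so this is not the paper's exact formulation: the function name `bidirectional_contrastive_loss`, the cosine similarity, and the `temperature` value are all assumptions made for illustration. It assumes `emb_t[i]` and `emb_t1[i]` describe the same instance in consecutive frames.

```python
import math


def _normalize(v):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def _log_softmax_row(row):
    """Numerically stable log-softmax over one row of similarity scores."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def bidirectional_contrastive_loss(emb_t, emb_t1, temperature=0.1):
    """Hypothetical symmetric InfoNCE loss between two frames.

    emb_t[i] and emb_t1[i] are assumed to be embeddings of the same
    instance; every other pairing is treated as a negative.
    """
    a = [_normalize(v) for v in emb_t]
    b = [_normalize(v) for v in emb_t1]
    # Cosine similarity between every instance pair, scaled by temperature.
    sim = [[sum(x * y for x, y in zip(u, v)) / temperature for v in b] for u in a]
    sim_t = [list(col) for col in zip(*sim)]
    n = len(a)
    # Forward direction: match frame t to frame t+1.
    fwd = -sum(_log_softmax_row(sim[i])[i] for i in range(n)) / n
    # Backward direction: match frame t+1 to frame t.
    bwd = -sum(_log_softmax_row(sim_t[i])[i] for i in range(n)) / n
    return 0.5 * (fwd + bwd)


# Toy data: two instances with orthogonal embeddings.
frame_a = [[1.0, 0.0], [0.0, 1.0]]
loss_matched = bidirectional_contrastive_loss(frame_a, frame_a)
loss_swapped = bidirectional_contrastive_loss(frame_a, [[0.0, 1.0], [1.0, 0.0]])
```

With these toy inputs, correctly associated instances yield a much smaller loss than swapped ones, which is the property a tracking embedding would be trained for under this assumed objective.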

Authors (9)
  1. Zhengkai Jiang (42 papers)
  2. Zhangxuan Gu (17 papers)
  3. Jinlong Peng (34 papers)
  4. Hang Zhou (166 papers)
  5. Liang Liu (237 papers)
  6. Yabiao Wang (93 papers)
  7. Ying Tai (88 papers)
  8. Chengjie Wang (178 papers)
  9. Liqing Zhang (80 papers)
Citations (11)