Contrastive Transformation for Self-supervised Correspondence Learning (2012.05057v1)

Published 9 Dec 2020 in cs.CV

Abstract: In this paper, we focus on the self-supervised learning of visual correspondence using unlabeled videos in the wild. Our method simultaneously considers intra- and inter-video representation associations for reliable correspondence estimation. The intra-video learning transforms the image contents across frames within a single video via the frame pair-wise affinity. To obtain a discriminative representation for instance-level separation, we go beyond the intra-video analysis and construct the inter-video affinity to facilitate the contrastive transformation across different videos. By enforcing transformation consistency between the intra- and inter-video levels, the fine-grained correspondence associations are well preserved and the instance-level feature discrimination is effectively reinforced. Our simple framework outperforms recent self-supervised correspondence methods on a range of visual tasks, including video object tracking (VOT), video object segmentation (VOS), and pose keypoint tracking. Our method also surpasses the fully-supervised affinity representation (e.g., ResNet) and performs competitively against recent fully-supervised algorithms designed for specific tasks (e.g., VOT and VOS).
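The affinity-based transformation described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (`affinity`, `transform`), the dot-product similarity, the softmax normalization, the `temperature` parameter, and the toy feature values are all assumptions chosen for illustration. The sketch builds an intra-video affinity (target frame against a reference frame of the same video) and an inter-video affinity (the same target against references pooled from the same video and another video), then measures how far the two transformed outputs differ.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def affinity(target_feats, reference_feats, temperature=1.0):
    """Row i is a softmax over dot-product similarities between
    target location i and every reference location (assumed form)."""
    rows = []
    for t in target_feats:
        sims = [sum(a * b for a, b in zip(t, r)) / temperature
                for r in reference_feats]
        rows.append(softmax(sims))
    return rows

def transform(aff, reference_content):
    """Reconstruct target content as an affinity-weighted average
    of reference content (e.g. colors or labels)."""
    dim = len(reference_content[0])
    return [[sum(w * c[d] for w, c in zip(row, reference_content))
             for d in range(dim)]
            for row in aff]

# Toy data (hypothetical): two locations per frame, 2-D features.
target_feats = [[4.0, 0.0], [0.0, 4.0]]
same_video_feats = [[0.0, 4.0], [4.0, 0.0]]   # same content, swapped positions
same_video_content = [[1.0], [2.0]]
other_video_feats = [[-4.0, -4.0]]            # dissimilar instance
other_video_content = [[9.0]]

# Intra-video: transform reference content of the same video.
intra_aff = affinity(target_feats, same_video_feats)
intra_out = transform(intra_aff, same_video_content)

# Inter-video: references pooled from the same video and another video.
inter_aff = affinity(target_feats, same_video_feats + other_video_feats)
inter_out = transform(inter_aff, same_video_content + other_video_content)

# Consistency gap: largest absolute difference between the two outputs.
consistency_gap = max(abs(a[0] - b[0]) for a, b in zip(intra_out, inter_out))
```

With discriminative features, the other video receives almost no affinity weight, so the inter-video output stays close to the intra-video one. A training objective in this spirit would penalize the gap when it is large.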


Authors (3)

Ning Wang
Wengang Zhou
Houqiang Li

Citations (33)

