- The paper proposes a 3D-aware pseudo-labeling technique that refines pre-trained model features and achieves significant benchmark improvements.
- It employs cyclic consistency and spherical prototype mapping to filter and verify pseudo-labels obtained from zero-shot semantic matching.
- The approach reduces reliance on extensive annotations and generalizes across diverse datasets, demonstrating over 4% gain on SPair-71k.
Learning Semantic Correspondence from Pseudo-Labels
Semantic correspondence estimation in computer vision involves identifying matching points between semantically similar parts across different images or object instances. Despite advances driven by pre-trained vision models, symmetric objects and repeated object parts remain challenging because the extracted features are inherently ambiguous. The paper "Do It Yourself: Learning Semantic Correspondence from Pseudo-Labels" addresses these challenges with a 3D-aware pseudo-labeling method that refines foundation-model features, reducing the need for dataset-specific annotations.
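At its core, feature-based semantic matching pairs each source point with the target point whose descriptor is most similar. A minimal nearest-neighbor sketch is given below; the function name and array shapes are illustrative assumptions, not the paper's code.

```python
import numpy as np

def match_features(feats_a, feats_b):
    """Match each source descriptor to its nearest target descriptor.

    feats_a: (N, D) source features; feats_b: (M, D) target features.
    Returns, for each source point, the best target index and its
    cosine similarity. (Illustrative sketch, not the paper's pipeline.)
    """
    a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
    sim = a @ b.T                              # (N, M) cosine similarities
    idx = sim.argmax(axis=1)                   # best target per source point
    return idx, sim[np.arange(len(idx)), idx]
```

Ambiguity arises exactly here: for symmetric objects, two different target points can have nearly identical similarity scores, which is what the paper's filtering steps are designed to catch.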
Methodology Overview
- Generating Pseudo-Labels: Pre-trained foundation models such as DINOv2 and Stable Diffusion (SD) produce initial pseudo-labels through zero-shot semantic matching. Image pairs with moderate viewpoint changes are selected so that the resulting pseudo-labels are reliable enough for training.
- Improving Pseudo-Label Quality: Recognizing the need to enhance the accuracy of the initial pseudo-labels, the methodology incorporates various filtering and consistency checks:
  - Chaining and Cyclic Consistency: Matches are verified using relaxed cyclic consistency constraints across image pairs to minimize spurious label generation.
  - Spherical Prototype Mapping: By leveraging a 3D spherical prototype, the model imposes constraints to reject incorrect matches, thereby enhancing the accuracy of the semantic correspondences.
- Training the Adapter: A lightweight adapter is trained on the filtered pseudo-labels to refine the pre-trained model features, improving semantic matching across varied datasets while requiring only weak supervision.
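The cyclic-consistency filter above can be sketched as a forward-backward round trip: a match from image A to image B is kept only if matching back from B lands near the starting point. The function name, index-array convention, and pixel tolerance `tau` are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def cyclic_consistent(fwd, bwd, coords, tau=2.0):
    """Keep matches whose forward-backward round trip returns near the start.

    fwd[i] : index in image B matched from point i in image A
    bwd[j] : index in image A matched from point j in image B
    coords : (N, 2) pixel coordinates of the points in image A
    tau    : pixel tolerance for the relaxed cycle check
    Returns a boolean keep-mask over the N source points.
    """
    round_trip = bwd[fwd]                                 # A -> B -> A
    dist = np.linalg.norm(coords - coords[round_trip], axis=1)
    return dist <= tau
```

A strict check (`tau = 0`) would discard many correct matches affected by small localization noise, which is why a relaxed tolerance is used.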
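The spherical-prototype rejection step can likewise be sketched: if each keypoint is assigned a location on a shared unit sphere representing the object category, a candidate match can be rejected when the two mapped locations disagree. The function name, the angular threshold, and the assumption that sphere coordinates are given as unit vectors are all illustrative, not the paper's exact formulation.

```python
import numpy as np

def sphere_filter(proto_a, proto_b, fwd, max_angle_deg=30.0):
    """Reject matches whose points land far apart on a shared spherical prototype.

    proto_a: (N, 3) unit vectors, sphere coordinates for points in image A
    proto_b: (M, 3) unit vectors, sphere coordinates for points in image B
    fwd    : match index into proto_b for each point in proto_a
    Returns a boolean keep-mask over the N source points.
    """
    cos = np.sum(proto_a * proto_b[fwd], axis=1).clip(-1.0, 1.0)
    angle = np.degrees(np.arccos(cos))        # angular distance on the sphere
    return angle <= max_angle_deg
```

This is what makes the pipeline 3D-aware: a left-right symmetric pair of keypoints that is indistinguishable in 2D feature space maps to opposite sides of the sphere, so the geometric check can separate them.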
Results and Implications
The experimental results show significant improvements over existing state-of-the-art methods in semantic correspondence estimation, particularly on the SPair-71k benchmark: an absolute gain of over 4% overall, and of over 7% compared to methods trained with a similar level of supervision.
The success of this approach has notable implications:
- Reduced Annotation Requirements: The use of pseudo-labels mitigates the need for extensive manual annotations, making the model more scalable across larger datasets.
- Generalization to Diverse Data: The simplicity and efficacy of the method promise easier extensions to other datasets and domains without relying on cumbersome data-specific labeling conventions.
Future Directions
The research paves the way for further exploration of pseudo-label-based training, potentially extending its applicability to broader AI domains and to tasks beyond semantic correspondence, such as 3D object recognition and style transfer.
Moreover, extending the refinement process, including the spherical prototype mapping, to more complex object geometries could offer additional improvements. Investigations on datasets such as ImageNet-3D underscore the method's scalability, offering insight into what can be achieved with minimal supervision.
In summary, the paper offers a promising advancement in semantic correspondence estimation through innovative pseudo-labeling approaches, effectively addressing challenges of ambiguity in feature extraction and setting new standards in performance across the field.