- The paper introduces a geometric cycle consistency loss that enforces that pixels mapped from an image to a canonical 3D surface reproject back to their original locations.
- It achieves dense correspondences with a self-supervised framework using only foreground masks, reducing the need for manual annotations.
- Experimental results demonstrate robust keypoint transfer across diverse object categories, outperforming several baselines.
Canonical Surface Mapping via Geometric Cycle Consistency
This paper introduces an approach for predicting canonical surface mappings (CSM) in images using geometric cycle consistency. The primary objective is to learn a per-pixel mapping from an image to a 3D surface model of an object category, achieving a dense understanding of the object's geometry without relying on manual annotations such as keypoints or poses. The authors propose a self-supervised learning framework that leverages geometric cycle consistency to train a CSM model using only foreground masks as supervision.
Key Contributions
- Geometric Cycle Consistency Loss: The authors utilize a geometric cycle consistency loss to train the CSM predictor. This loss ensures that a pixel in an image, when mapped to a 3D point on the canonical surface and projected back using a camera model, returns to its original location; the composition of the pixel-to-surface mapping and the camera projection should approximate the identity on foreground pixels. This enforces consistency and leverages the underlying geometric structure of images.
- Relaxed Supervision Requirements: Unlike traditional methods that rely on keypoint annotations or large amounts of synthetic data, this approach substantially reduces the required supervision. By using only foreground masks, the authors demonstrate the feasibility of predicting dense correspondences for diverse categories.
- Application to Dense Correspondences: The CSM predictor learned using geometric consistency provides a robust framework to infer dense correspondences between two images. By mapping image pixels to a canonical 3D model, the method can match pixels across different images of the same category, thus finding semantic correspondence without exhaustive dataset annotations.
- Scalability Across Categories: The method scales effectively across various categories, including birds, cars, horses, and zebras, evidenced by quantitative evaluations using datasets like CUB-200-2011 and PASCAL3D+. Notably, it extends to unannotated image collections, such as those from ImageNet, showcasing its adaptability.
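The cycle described above can be sketched concretely. The following is a minimal NumPy illustration of the loss idea, not the authors' implementation: the spherical template parameterization, the weak-perspective camera model, and all function names here are illustrative assumptions.

```python
import numpy as np

def uv_to_sphere(uv):
    """Map predicted (u, v) in [0, 1]^2 to 3D points on a unit-sphere
    template (hypothetical spherical parameterization: u -> azimuth,
    v -> polar angle)."""
    phi = 2.0 * np.pi * uv[..., 0]
    theta = np.pi * uv[..., 1]
    x = np.sin(theta) * np.cos(phi)
    y = np.sin(theta) * np.sin(phi)
    z = np.cos(theta)
    return np.stack([x, y, z], axis=-1)

def weak_perspective_project(points, scale, trans, rot):
    """Project 3D points with a weak-perspective camera: rotate,
    drop depth, then scale and translate in the image plane."""
    cam_pts = points @ rot.T
    return scale * cam_pts[..., :2] + trans

def cycle_consistency_loss(uv_pred, pixel_coords, mask, scale, trans, rot):
    """Mean squared reprojection error over foreground pixels: each pixel,
    mapped to the surface and projected back, should return to itself."""
    surface_pts = uv_to_sphere(uv_pred)
    reprojected = weak_perspective_project(surface_pts, scale, trans, rot)
    err = np.sum((reprojected - pixel_coords) ** 2, axis=-1)
    return np.sum(err * mask) / np.maximum(np.sum(mask), 1.0)
```

When the predicted surface coordinates and camera are mutually consistent, the reprojection lands back on the source pixels and the loss vanishes; any mismatch between the pixel-to-surface map and the camera is penalized, which is what lets mask-only supervision train both jointly.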
Experimental Results
The proposed framework is evaluated on the task of keypoint transfer, where it achieves higher accuracy in predicting correspondences than several baselines, particularly outperforming self-supervised methods and those trained on synthetic data. Key metrics include the Percentage of Correct Keypoints (PCK) and the Keypoint Transfer Average Precision (APK).
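To make the evaluation concrete, here is a hedged NumPy sketch of how keypoint transfer through a canonical mapping and the PCK metric might look. The nearest-neighbor matching in (u, v) space and the `alpha * max(H, W)` threshold are common conventions assumed here, not details quoted from the paper; `transfer_keypoints` and `pck` are hypothetical helper names.

```python
import numpy as np

def transfer_keypoints(src_kps, src_csm, tgt_csm):
    """Transfer keypoints via canonical coordinates: look up each source
    keypoint's predicted (u, v), then return the target pixel whose
    predicted (u, v) is nearest."""
    h, w, _ = tgt_csm.shape
    flat = tgt_csm.reshape(-1, 2)
    out = []
    for (r, c) in src_kps:
        uv = src_csm[r, c]
        idx = np.argmin(np.sum((flat - uv) ** 2, axis=-1))
        out.append((idx // w, idx % w))
    return np.array(out)

def pck(pred_kps, gt_kps, img_size, alpha=0.1):
    """Percentage of Correct Keypoints: a prediction counts as correct
    when its distance to ground truth is below alpha * max(H, W)."""
    thresh = alpha * max(img_size)
    dists = np.linalg.norm(pred_kps - gt_kps, axis=-1)
    return float(np.mean(dists < thresh))
```

Because both images are mapped into the same canonical surface, no pairwise training or annotation is needed at transfer time; matching reduces to a nearest-neighbor lookup in the shared (u, v) space.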
Implications and Future Directions
The implications of this research are noteworthy, both theoretically and practically. Practically, the reduction in supervision requirements lowers barriers to large-scale deployment across a multitude of object categories, allowing applications in fields like robotics, augmented reality, and computer vision-based modeling. Theoretically, it refines our understanding of geometric consistency as an independent supervisory signal.
Future work could explore the integration of time-consistent predictions across video frames, leveraging temporal information to refine the consistency and accuracy of CSM predictions further. Additionally, addressing challenges in categories with significant morphological variations or articulations remains a potential area for development.
In summary, this paper presents a compelling methodology for self-supervised learning of canonical surface mappings, setting a precedent for geometry-driven approaches to deep learning problems in 3D understanding and dense correspondence.