Correspondences of the Third Kind: Camera Pose Estimation from Object Reflection (2312.04527v2)

Published 7 Dec 2023 in cs.CV

Abstract: Computer vision has long relied on two kinds of correspondences: pixel correspondences in images and 3D correspondences on object surfaces. Is there another kind, and if there is, what can it do for us? In this paper, we introduce correspondences of the third kind, which we call reflection correspondences, and show that they can help estimate camera pose by just looking at objects without relying on the background. Reflection correspondences are point correspondences in the reflected world, i.e., the scene reflected by the object surface. The object geometry and reflectance alter the scene geometrically and radiometrically, respectively, causing incorrect pixel correspondences. Geometry recovered from each image is also hampered by distortions, namely the generalized bas-relief ambiguity, leading to erroneous 3D correspondences. We show that reflection correspondences can resolve the ambiguities arising from these distortions. We introduce a neural correspondence estimator and a RANSAC algorithm that fully leverages all three kinds of correspondences for robust and accurate joint camera pose and object shape estimation just from the object appearance. The method expands the horizon of numerous downstream tasks, including camera pose estimation for appearance modeling (e.g., NeRF) and motion estimation of reflective objects (e.g., cars on the road), to name a few, as it relieves the requirement of an overlapping background.

Summary

  • The paper introduces reflection correspondences, point matches in the world mirrored by the object surface, and combines them with a neural correspondence estimator and RANSAC to resolve camera pose ambiguities.
  • The paper employs reflectance maps to capture complex light interactions, enabling robust detection of both reflection and conventional correspondences.
  • The paper demonstrates enhanced camera calibration and object pose estimation, particularly for reflective, textureless objects where traditional methods falter.

Introduction

Computer vision methodologies commonly rely on two kinds of correspondences to reason about three-dimensional space from images: pixel correspondences across images and 3D correspondences on object surfaces. This paper explores a third kind. Termed reflection correspondences, they build on the observation that a reflective object's surface mirrors the surrounding scene, so matchable features can be found in the reflected world itself. The authors leverage this insight to estimate camera pose without relying on a textured background or a known lighting environment.

Camera Pose from Reflections

Reflection correspondences are points on an object's glossy or reflective surface that mirror the same part of the environment in different images. They violate the color-consistency assumption behind conventional pixel matching, because specular reflections shift across the surface as the viewpoint changes. Used alongside conventional pixel and 3D correspondences, they resolve the generalized bas-relief ambiguity that otherwise plagues pose and shape estimation from object appearance alone. Exploiting them requires modeling how light interacts with the object's surface, a challenge the authors address with neural networks and geometric algorithms.
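To make the underlying geometric constraint concrete, here is a minimal sketch, assuming a distant environment, known unit viewing directions and surface normals at the matched pixels, and a candidate relative rotation; the function names are illustrative, not from the paper. Two pixels form a reflection correspondence when their reflected rays agree after rotating one view's ray into the other's frame.

```python
import numpy as np

def reflect(d, n):
    """Mirror a unit direction d about a unit surface normal n."""
    return d - 2.0 * np.dot(d, n) * n

def reflection_residual(R, d1, n1, d2, n2):
    """Angular residual of a candidate reflection correspondence.

    d1, d2: unit viewing directions at the matched pixels.
    n1, n2: unit surface normals at those points.
    R:      candidate relative rotation (view 1 -> view 2).
    Under a distant environment, both pixels mirror the same scene
    direction, so the reflected rays must agree up to R.
    """
    r1, r2 = reflect(d1, n1), reflect(d2, n2)
    cos = np.clip(np.dot(R @ r1, r2), -1.0, 1.0)
    return np.arccos(cos)  # 0 radians for a perfect inlier
```

Note that the residual depends on the surface normals, which is why errors in the recovered geometry and errors in the estimated pose are entangled, and why the paper estimates the two jointly.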

Methodology

To exploit reflection correspondences for camera pose estimation, the researchers introduce two main components. A neural correspondence estimator detects both 3D and reflection correspondences and remains robust even when the per-view geometry is distorted by the generalized bas-relief ambiguity. A RANSAC-based framework then alternates between estimating camera poses and refining the object's geometry, drawing on all three kinds of correspondences.
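The following is an illustrative, self-contained sketch of how a RANSAC loop can recover a relative rotation from reflection correspondences alone, again assuming a distant environment so the reflected rays in two views differ only by that rotation. It is a simplification, not the paper's full solver, which also draws on pixel and 3D correspondences and alternates with shape refinement.

```python
import numpy as np

def fit_rotation(A, B):
    """Kabsch: least-squares rotation R with R @ A[i] ~= B[i] (rows)."""
    U, _, Vt = np.linalg.svd(A.T @ B)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    return Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

def ransac_rotation(r1, r2, iters=500, thresh_deg=2.0, seed=0):
    """RANSAC over reflected-ray pairs (r1[i] <-> r2[i]).

    r1, r2: (N, 3) unit reflected directions in views 1 and 2.
    Returns the rotation refit on the largest inlier set.
    """
    rng = np.random.default_rng(seed)
    thresh = np.deg2rad(thresh_deg)
    best_inl = np.zeros(len(r1), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(r1), size=2, replace=False)  # minimal sample
        R = fit_rotation(r1[idx], r2[idx])
        ang = np.arccos(np.clip(np.sum((r1 @ R.T) * r2, axis=1), -1, 1))
        inl = ang < thresh
        if inl.sum() > best_inl.sum():
            best_inl = inl
    return fit_rotation(r1[best_inl], r2[best_inl]), best_inl
```

In this simplified form the reflected rays constrain only rotation; in the paper's joint framework the alternation with shape estimation is what pins down the remaining degrees of freedom.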

A key part of this process is the construction of reflectance maps, which encode the surrounding environment as modified by the object's surface reflectance. These maps are the basis for detecting reflection correspondences, and the detected correspondences in turn resolve the ambiguities left by the other kinds of correspondences when each is used in isolation.
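As a rough intuition for what a reflectance map stores, the sketch below accumulates observed pixel colors into a grid indexed by surface normal, following the classical definition of a reflectance map as intensity versus orientation. It assumes per-pixel normals from a coarse shape estimate and is a deliberately simplified stand-in for the paper's representation.

```python
import numpy as np

def build_reflectance_map(normals, colors, res=64):
    """Average observed colors into a normal-indexed grid.

    normals: (N, 3) unit surface normals (e.g., from a coarse shape).
    colors:  (N, 3) corresponding pixel colors.
    Returns a (res, res, 3) map over the visible hemisphere, indexed
    by the x/y components of the normal; empty cells stay zero.
    """
    u = ((normals[:, 0] + 1) / 2 * (res - 1)).astype(int)
    v = ((normals[:, 1] + 1) / 2 * (res - 1)).astype(int)
    acc = np.zeros((res, res, 3))
    cnt = np.zeros((res, res, 1))
    np.add.at(acc, (v, u), colors)
    np.add.at(cnt, (v, u), 1.0)
    return np.divide(acc, cnt, out=np.zeros_like(acc), where=cnt > 0)
```

Matching features between two views' reflectance maps, rather than between the raw images, is what sidesteps the radiometric distortions that break ordinary pixel matching on shiny surfaces.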

Applications and Implications

The method relaxes long-standing restrictions on how images must be captured for camera pose estimation and broadens how object appearance can be exploited. It opens up potential advances beyond camera pose and shape recovery: camera calibration and object pose estimation, in particular, stand to benefit in settings where classical correspondences are inadequate.

Concluding Remarks

This research establishes reflection correspondences as a third kind of correspondence for relating images, camera motion, and object geometry. It addresses the challenge of estimating camera pose and object shape from reflective, textureless objects, and stands to simplify computer vision pipelines whose current requirements for overlapping backgrounds or diffuse surface textures are limiting.
