Correspondences of the Third Kind: Camera Pose Estimation from Object Reflection (2312.04527v2)
Abstract: Computer vision has long relied on two kinds of correspondences: pixel correspondences in images and 3D correspondences on object surfaces. Is there another kind, and if so, what can it do for us? In this paper, we introduce correspondences of the third kind, which we call reflection correspondences, and show that they can help estimate camera pose just by looking at objects, without relying on the background. Reflection correspondences are point correspondences in the reflected world, i.e., the scene reflected by the object surface. The object geometry and reflectance alter the scene geometrically and radiometrically, respectively, causing incorrect pixel correspondences. Geometry recovered from each image is also hampered by distortions, namely the generalized bas-relief ambiguity, leading to erroneous 3D correspondences. We show that reflection correspondences can resolve the ambiguities arising from these distortions. We introduce a neural correspondence estimator and a RANSAC algorithm that fully leverage all three kinds of correspondences for robust and accurate joint estimation of camera pose and object shape from object appearance alone. The method expands the horizon of numerous downstream tasks, including camera pose estimation for appearance modeling (e.g., NeRF) and motion estimation of reflective objects (e.g., cars on the road), as it removes the requirement of an overlapping background.
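To make the hypothesize-and-verify idea behind the abstract concrete, here is a minimal sketch of the classic RANSAC paradigm (Fischler and Bolles, 1981) applied to a toy problem: fitting a rotation to noisy 3D correspondences contaminated by outliers. This is only an illustration of the general framework, not the paper's three-correspondence algorithm; the function names and the Kabsch-based minimal solver are our own choices for the sketch.

```python
import numpy as np

def kabsch(P, Q):
    """Least-squares rotation R minimizing ||Q - P R^T|| (Kabsch algorithm)."""
    H = P.T @ Q
    U, _, Vt = np.linalg.svd(H)
    # Sign correction so the result is a proper rotation (det = +1)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    return Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

def ransac_rotation(P, Q, iters=200, thresh=0.05, rng=None):
    """RANSAC loop: sample minimal sets, fit, score by inlier count, refine."""
    rng = np.random.default_rng(rng)
    best_R, best_inliers = np.eye(3), np.zeros(len(P), dtype=bool)
    for _ in range(iters):
        # Hypothesis from a minimal sample (3 correspondences for a rotation)
        idx = rng.choice(len(P), size=3, replace=False)
        R = kabsch(P[idx], Q[idx])
        # Verification: count correspondences consistent with the hypothesis
        residuals = np.linalg.norm(Q - P @ R.T, axis=1)
        inliers = residuals < thresh
        if inliers.sum() > best_inliers.sum():
            best_R, best_inliers = R, inliers
    # Final refinement on the full inlier set
    if best_inliers.sum() >= 3:
        best_R = kabsch(P[best_inliers], Q[best_inliers])
    return best_R, best_inliers
```

In the paper's setting, the same hypothesize-and-verify structure would score hypotheses against all three kinds of correspondences (pixel, 3D, and reflection), which is what allows a contaminated, ambiguity-distorted correspondence pool to still yield a robust joint pose and shape estimate.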