FAR: Flexible, Accurate and Robust 6DoF Relative Camera Pose Estimation (2403.03221v1)
Abstract: Estimating relative camera poses between images has been a central problem in computer vision. Methods that find correspondences and solve for the fundamental matrix offer high precision in most cases. Conversely, methods predicting pose directly using neural networks are more robust to limited overlap and can infer absolute translation scale, but at the expense of reduced precision. We show how to combine the best of both methods; our approach yields results that are both precise and robust, while also accurately inferring translation scales. At the heart of our model lies a Transformer that (1) learns to balance between solved and learned pose estimations, and (2) provides a prior to guide a solver. A comprehensive analysis supports our design choices and demonstrates that our method adapts flexibly to various feature extractors and correspondence estimators, showing state-of-the-art performance in 6DoF pose estimation on Matterport3D, InteriorNet, StreetLearn, and Map-free Relocalization.
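To make the idea of "balancing between solved and learned pose estimations" concrete, here is a minimal sketch of one possible way to blend two relative poses with a scalar weight. It is an illustration under stated assumptions, not the paper's formulation: the function names (`project_to_so3`, `blend_poses`, `rot_z`), the weighted-sum-then-project rotation averaging, and the linear translation blend are all hypothetical, and the solver translation is assumed to already be expressed at metric scale. In the described model the weight would come from the Transformer; here it is passed in by hand.

```python
import numpy as np

def project_to_so3(M):
    """Return the rotation matrix closest to M in the Frobenius sense."""
    U, _, Vt = np.linalg.svd(M)
    # Flip the last singular direction if needed so that det(R) = +1.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
    return U @ D @ Vt

def blend_poses(R_solver, t_solver, R_learned, t_learned, w):
    """Blend two relative poses (assumed scheme, not the paper's).

    w = 1 trusts the correspondence-based solver entirely,
    w = 0 trusts the learned (regressed) pose entirely.
    """
    # Weighted sum of rotation matrices, projected back onto SO(3).
    R = project_to_so3(w * R_solver + (1.0 - w) * R_learned)
    # Linear blend of translations (both assumed to be in metric units).
    t = w * t_solver + (1.0 - w) * t_learned
    return R, t

def rot_z(deg):
    """Rotation about the z-axis by `deg` degrees (helper for the demo)."""
    a = np.deg2rad(deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Demo: a solver pose and a learned pose that disagree by 20 degrees.
R, t = blend_poses(rot_z(10), np.array([1.0, 0.0, 0.0]),
                   rot_z(30), np.array([3.0, 0.0, 0.0]), w=0.5)
```

With equal weighting, the blended rotation in this demo lies halfway between the two inputs, and the blended translation is their midpoint. A weight close to 1 would suit image pairs with many reliable correspondences, while a weight close to 0 would suit pairs with limited overlap, where the abstract notes that learned predictions are more robust.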
Authors: Chris Rockwell, Nilesh Kulkarni, Linyi Jin, Jeong Joon Park, Justin Johnson, David F. Fouhey