PRAGO: Differentiable Multi-View Pose Optimization From Objectness Detections (2403.08586v2)
Abstract: Robustly estimating camera poses from a set of images is a fundamental task that remains challenging for differentiable methods, especially for small and sparse camera pose graphs. To overcome this challenge, we propose Pose-refined Rotation Averaging Graph Optimization (PRAGO). From a set of objectness detections on unordered images, our method reconstructs the rotational pose and, in turn, the absolute pose in a differentiable manner, benefiting from the optimization of a sequence of geometrical tasks. We show that the objectness pose-refinement module in PRAGO reduces the inherent ambiguities in pairwise relative pose estimation without removing edges, thereby avoiding early decisions on the viability of graph edges. PRAGO then refines the absolute rotations through iterative graph construction, reweighting the graph edges to compute the final rotational pose, which can be converted into absolute poses using translation averaging. We show that PRAGO outperforms non-differentiable solvers on small and sparse scenes extracted from 7-Scenes, achieving a relative improvement of 21% for rotations with similar translation estimates.
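To make the idea of reweighting graph edges without removing them concrete, the following is a minimal NumPy sketch and not the PRAGO implementation. It assumes a toy three-camera graph, the convention R_ij = R_j R_i^T for relative rotations, and a hand-written Cauchy-style weight in place of the learned reweighting described in the abstract. All function names (`rot_z`, `geodesic_angle`, `edge_residuals`, `cauchy_weights`) and the `scale` parameter are hypothetical, introduced only for illustration.

```python
import numpy as np

def rot_z(theta):
    """Rotation about the z-axis by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def geodesic_angle(R):
    """Rotation angle of R in radians (geodesic distance to the identity)."""
    cos_angle = (np.trace(R) - 1.0) / 2.0
    return np.arccos(np.clip(cos_angle, -1.0, 1.0))

def edge_residuals(R_abs, edges, R_rel):
    """Angular disagreement between each measured relative rotation and
    the one implied by the current absolute rotations (assumed R_j R_i^T)."""
    return np.array([
        geodesic_angle(R_rel[k].T @ (R_abs[j] @ R_abs[i].T))
        for k, (i, j) in enumerate(edges)
    ])

def cauchy_weights(residuals, scale=0.1):
    """Soft edge weights: large residuals are down-weighted, never dropped.
    The Cauchy form and the scale value are assumptions for this sketch."""
    return 1.0 / (1.0 + (residuals / scale) ** 2)

# Toy pose graph: three cameras, fully connected.
R_abs = [rot_z(0.0), rot_z(0.3), rot_z(0.7)]
edges = [(0, 1), (1, 2), (0, 2)]

# Noise-free relative rotations, then corrupt edge (0, 2) by 0.5 rad.
R_rel = [R_abs[j] @ R_abs[i].T for i, j in edges]
R_rel[2] = rot_z(0.5) @ R_rel[2]

residuals = edge_residuals(R_abs, edges, R_rel)
weights = cauchy_weights(residuals)
print("residuals (rad):", residuals)
print("edge weights:   ", weights)
```

In this toy graph the corrupted edge keeps a small but nonzero weight, so it still takes part in any later averaging step instead of being pruned early. A learned, differentiable pipeline would replace the fixed Cauchy rule with weights predicted by a network.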