EAR-Net: Pursuing End-to-End Absolute Rotations from Multi-View Images (2310.10051v2)
Abstract: Absolute rotation estimation is an important topic in 3D computer vision. Existing works in the literature generally employ a multi-stage (at least two-stage) estimation strategy in which multiple independent operations (feature matching, two-view rotation estimation, and rotation averaging) are performed sequentially. However, such a multi-stage strategy inevitably accumulates the errors introduced by each operation and consequently degrades the final estimates of the global rotations. To address this problem, we propose an End-to-end method for estimating Absolute Rotations from multi-view images based on deep neural Networks, called EAR-Net. The proposed EAR-Net consists of an epipolar confidence graph construction module and a confidence-aware rotation averaging module. The epipolar confidence graph construction module simultaneously predicts the pairwise relative rotations among the input images and their corresponding confidences, yielding a weighted graph (called the epipolar confidence graph). Based on this graph, the differentiable confidence-aware rotation averaging module predicts the absolute rotations. Thanks to the confidences assigned to the relative rotations, EAR-Net can effectively handle outliers. Experimental results on three public datasets demonstrate that EAR-Net outperforms state-of-the-art methods by a large margin in terms of both accuracy and speed.
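The abstract describes the confidence-aware rotation averaging step only at a high level. To illustrate why per-edge confidences help against outliers, the following is a minimal, non-learned sketch of confidence-weighted rotation averaging on a small view graph. It is not EAR-Net's module. The unit-quaternion representation, the convention q_j ≈ q_ij q_i (i.e. R_ij = R_j R_i^T), the maximum-confidence spanning-tree initialization, the sign-aligned weighted quaternion mean used as an approximation to a chordal mean, and every function name (`average_rotations`, `qmul`, `rotation_angle`, etc.) are assumptions made for illustration. Node 0 is fixed to the identity to remove the global gauge ambiguity.

```python
import math
from collections import defaultdict

# Illustrative assumptions: unit quaternions are (w, x, y, z) tuples and relative
# rotations follow q_j ~= q_ij * q_i (R_ij = R_j R_i^T). Not the EAR-Net module.

def qmul(a, b):
    """Hamilton product a * b."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw * bw - ax * bx - ay * by - az * bz,
            aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw)

def qconj(q):
    """Conjugate, i.e. the inverse rotation for a unit quaternion."""
    return (q[0], -q[1], -q[2], -q[3])

def qnormalize(q):
    n = math.sqrt(sum(c * c for c in q))
    return tuple(c / n for c in q)

def axis_angle(axis, theta):
    """Unit quaternion for a rotation of theta radians about axis."""
    n = math.sqrt(sum(c * c for c in axis))
    s = math.sin(theta / 2) / n
    return (math.cos(theta / 2), axis[0] * s, axis[1] * s, axis[2] * s)

def rotation_angle(a, b):
    """Geodesic angle (radians) between two rotations given as unit quaternions."""
    d = qmul(qconj(a), b)
    return 2.0 * math.atan2(math.sqrt(d[1] ** 2 + d[2] ** 2 + d[3] ** 2), abs(d[0]))

def average_rotations(n, edges, iters=100):
    """Hypothetical confidence-weighted rotation averaging over a view graph.

    edges: list of (i, j, q_ij, w_ij) with q_j ~= q_ij * q_i and confidence w_ij >= 0.
    Node 0 is fixed to the identity (gauge freedom).
    """
    # nbrs[k] holds (m, r, w) meaning "q_k ~= r * q_m with confidence w".
    nbrs = defaultdict(list)
    for i, j, q_ij, w in edges:
        nbrs[j].append((i, q_ij, w))
        nbrs[i].append((j, qconj(q_ij), w))

    # Initialization: grow a maximum-confidence spanning tree from node 0 (Prim-style).
    q = {0: (1.0, 0.0, 0.0, 0.0)}
    while len(q) < n:
        best = None
        for k in range(n):
            if k in q:
                continue
            for m, r, w in nbrs[k]:
                if m in q and (best is None or w > best[0]):
                    best = (w, k, qmul(r, q[m]))
        if best is None:
            raise ValueError("view graph is not connected")
        q[best[1]] = best[2]

    # Refinement: Gauss-Seidel sweeps of a sign-aligned, confidence-weighted quaternion mean.
    for _ in range(iters):
        for k in range(1, n):
            acc = [0.0, 0.0, 0.0, 0.0]
            for m, r, w in nbrs[k]:
                est = qmul(r, q[m])
                if sum(x * y for x, y in zip(est, q[k])) < 0:
                    est = tuple(-c for c in est)  # q and -q encode the same rotation
                for c in range(4):
                    acc[c] += w * est[c]
            if any(acc):  # skip nodes whose incident edges all have zero confidence
                q[k] = qnormalize(acc)
    return [q[k] for k in range(n)]
```

With exact relative rotations on a fully connected four-view graph and one edge corrupted by a 90° rotation, assigning that edge zero confidence lets this sketch recover every view to numerical precision, whereas uniform weights spread the outlier's error across the graph. In EAR-Net the confidences are predicted by the network and the averaging module is differentiable; the sketch only mimics the weighting idea.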