KS-APR: Keyframe Selection for Robust Absolute Pose Regression (2308.05459v2)
Abstract: Markerless Mobile Augmented Reality (AR) aims to anchor digital content in the physical world without using specific 2D or 3D objects. Absolute Pose Regressors (APR) are end-to-end machine learning solutions that infer the device's pose from a single monocular image. Thanks to their low computation cost, they can be directly executed on the constrained hardware of mobile AR devices. However, APR methods tend to yield significant inaccuracies for input images that are too distant from the training set. This paper introduces KS-APR, a pipeline that assesses the reliability of an estimated pose with minimal overhead by combining the inference results of the APR and the prior images in the training set. Mobile AR systems tend to rely upon visual-inertial odometry to track the relative pose of the device during the experience. As such, KS-APR favours reliability over frequency, discarding unreliable poses. This pipeline can integrate most existing APR methods to improve accuracy by filtering out unreliable images along with their pose estimates. We implement the pipeline on three types of APR models on indoor and outdoor datasets. The median error on position and orientation is reduced for all models, and the proportion of large errors is minimized across datasets. Our method enables state-of-the-art APRs such as DFNet_dm to outperform single-image and sequential APR methods. These results demonstrate the scalability and effectiveness of KS-APR for visual localization tasks that do not require one-shot decisions.
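The filtering idea in the abstract (use the APR's estimate to look up a prior training image, then keep the estimate only if the query image agrees with it) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how images are compared, so the global `descriptor`, the cosine-similarity check, the `min_similarity` threshold, and all names below are hypothetical placeholders chosen for illustration.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Position = Tuple[float, float, float]


@dataclass
class TrainingFrame:
    """A prior image from the training set (fields are illustrative assumptions)."""
    position: Position              # ground-truth camera position
    descriptor: Tuple[float, ...]   # hypothetical global image descriptor


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def nearest_training_frame(pred: Position,
                           frames: Sequence[TrainingFrame]) -> TrainingFrame:
    """Pick the training frame whose position is closest to the predicted one."""
    return min(frames, key=lambda f: math.dist(pred, f.position))


def filter_pose(pred: Position,
                query_descriptor: Sequence[float],
                frames: Sequence[TrainingFrame],
                min_similarity: float = 0.8) -> Optional[Position]:
    """Return the predicted position if it looks reliable, otherwise None.

    Assumption: an estimate is treated as reliable when the query image
    resembles the training image nearest to the predicted position.
    """
    reference = nearest_training_frame(pred, frames)
    if cosine_similarity(query_descriptor, reference.descriptor) >= min_similarity:
        return pred
    return None  # discarded; the AR system keeps tracking with odometry


frames = [
    TrainingFrame(position=(0.0, 0.0, 0.0), descriptor=(1.0, 0.0)),
    TrainingFrame(position=(5.0, 0.0, 0.0), descriptor=(0.0, 1.0)),
]
kept = filter_pose((0.2, 0.0, 0.0), (0.9, 0.1), frames)
dropped = filter_pose((4.8, 0.0, 0.0), (0.9, 0.1), frames)
```

In this toy setup the first query agrees with the training frame nearest to its predicted position and is kept, while the second is predicted near a frame it does not resemble and is discarded. Returning `None` rather than a corrected pose mirrors the stated preference for reliability over frequency, since the device's relative pose can still be tracked by visual-inertial odometry between accepted estimates.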