Map-Relative Pose Regression for Visual Re-Localization (2404.09884v1)

Published 15 Apr 2024 in cs.CV and cs.LG

Abstract: Pose regression networks predict the camera pose of a query image relative to a known environment. Within this family of methods, absolute pose regression (APR) has recently shown promising accuracy in the range of a few centimeters in position error. APR networks encode the scene geometry implicitly in their weights. To achieve high accuracy, they require vast amounts of training data that, realistically, can only be created using novel view synthesis in a days-long process. This process has to be repeated for each new scene again and again. We present a new approach to pose regression, map-relative pose regression (marepo), that satisfies the data hunger of the pose regression network in a scene-agnostic fashion. We condition the pose regressor on a scene-specific map representation such that its pose predictions are relative to the scene map. This allows us to train the pose regressor across hundreds of scenes to learn the generic relation between a scene-specific map representation and the camera pose. Our map-relative pose regressor can be applied to new map representations immediately or after mere minutes of fine-tuning for the highest accuracy. Our approach outperforms previous pose regression methods by far on two public datasets, indoor and outdoor. Code is available: https://nianticlabs.github.io/marepo

Summary

  • The paper introduces map-relative pose regression (marepo), which conditions a pose regressor on scene-specific maps to achieve accurate camera pose estimation without extensive per-scene training data.
  • The approach integrates camera intrinsics into a transformer-based regressor via dynamic positional encoding, improving generalization across scenes.
  • Experiments on 7-Scenes and Wayspots show significant accuracy and efficiency gains over prior APR methods, with accuracy competitive with correspondence-based approaches.

Map-Relative Pose Regression for Visual Re-Localization

Introduction

The paper "Map-Relative Pose Regression for Visual Re-Localization" introduces a novel approach to camera pose estimation by leveraging map-relative pose regression, which significantly improves over traditional absolute pose regression (APR) methods. This new methodology allows for high accuracy pose predictions by conditioning the regressor on a scene-specific map representation, enabling it to be trained across various scenes and applied to new ones with minimal fine-tuning requirements.

Background and Motivation

Visual relocalization remains a challenge in computer vision, where methods typically fall into correspondence-based or pose regression categories. The former leverages image-to-scene correspondences for accuracy but requires extensive mapping. Pose regression approaches use neural networks for efficiency but traditionally lack the accuracy of correspondence-based methods. Absolute pose regression, in particular, suffers from needing vast training data per scene, making it challenging to scale. The proposed map-relative pose regression approach addresses these limitations by integrating scene-specific maps into the pose regression framework, achieving high accuracy with reduced training times.

Methodology

Architecture Overview

The architecture consists of a scene-specific geometry prediction module $\mathcal{G}$ that associates image pixels with 3D scene coordinates, and a scene-agnostic map-relative pose regressor $\mathcal{M}$ that predicts camera poses. The approach conditions pose predictions on scene-specific maps, enabling precise pose estimates without requiring large scene-specific training datasets (Figure 1).

Figure 1: Illustration of the network. A scene-specific geometry prediction module $\mathcal{G}_S$ processes a query image to predict a scene coordinate map $\hat{H}$.

Dynamic Positional Encoding

The model uses a dynamic positional encoding mechanism that integrates camera intrinsics into the transformer-based regressor. This is achieved via camera-aware 2D positional embeddings and 3D positional embeddings, which help the network generalize across scenes and improve pose estimation significantly.

Loss Function and Fine-Tuning

The architecture is trained with an L1 regression loss on rotation and translation. A brief scene-specific fine-tuning stage can further optimize the pre-trained map-relative regressor, yielding additional accuracy gains with only minutes of extra training.

Experimental Evaluation

The proposed method was tested against numerous state-of-the-art pose regression and correspondence-based methods on the 7-Scenes and Wayspots datasets (Figure 2).

Figure 2: Camera pose estimation performance vs. mapping time. The figure shows the median translation error of several pose regression relocalization methods on the 7-Scenes dataset.

Results on 7-Scenes

The approach achieved significant accuracy improvements over existing APR methods, requiring only minutes of scene-specific training. It demonstrated performance competitive with state-of-the-art structure-from-motion methods while maintaining superior efficiency.

Results on Wayspots

For challenging outdoor scenes, the method outperformed traditional APR approaches and exhibited accuracy comparable to advanced geometry-based methods, demonstrating its robustness and scalability across diverse environments.

Conclusion

Map-relative pose regression presents a scalable, efficient alternative to traditional absolute and relative pose regression methods, leveraging scene-specific geometry to achieve high-accuracy, real-time camera pose estimation. Its integration of 3D geometric knowledge and ability to generalize across scenes effectively addresses the limitations of prior approaches, offering significant potential for deployment in dynamic and large-scale applications.
