
F$^3$Loc: Fusion and Filtering for Floorplan Localization (2403.03370v1)

Published 5 Mar 2024 in cs.CV and cs.RO

Abstract: In this paper we propose an efficient data-driven solution to self-localization within a floorplan. Floorplan data is readily available, long-term persistent and inherently robust to changes in the visual appearance. Our method does not require retraining per map and location or demand a large database of images of the area of interest. We propose a novel probabilistic model consisting of an observation and a novel temporal filtering module. Operating internally with an efficient ray-based representation, the observation module consists of a single and a multiview module to predict horizontal depth from images and fuses their results to benefit from advantages offered by either methodology. Our method operates on conventional consumer hardware and overcomes a common limitation of competing methods that often demand upright images. Our full system meets real-time requirements, while outperforming the state-of-the-art by a significant margin.


Summary

  • The paper introduces a probabilistic model that fuses single and multi-view depth cues with a novel SE2 histogram filter for efficient indoor localization.
  • It leverages consumer hardware and a virtual roll-pitch augmentation technique to robustly handle non-upright images in diverse conditions.
  • Experimental results show higher accuracy than prior state-of-the-art floorplan localization methods while meeting real-time requirements, which makes the system relevant to indoor navigation and robotic applications.

Efficient Data-Driven Localization within Floorplans Using Fusion, Filtering, and Consumer Hardware

Introduction

Camera localization within known environments is a longstanding challenge in both the computer vision and robotics communities. Traditional approaches rely heavily on pre-existing image databases or 3D models, which are cumbersome to store and maintain. Because floorplans are widely available for indoor spaces, using them for camera localization is a promising, lightweight alternative. This paper introduces F$^3$Loc, a probabilistic model for efficient floorplan localization. It requires neither upright images nor heavyweight computing resources: F$^3$Loc combines single and multi-view imagery with a novel temporal filtering approach and runs on conventional consumer hardware.

The F$^3$Loc Framework

The proposed F$^3$Loc system consists of several key components designed to address the challenges of localizing within a floorplan. These include a data-driven observation model that integrates single and multi-view depth predictions, a selection network to fuse these cues based on their relative strengths, and an efficient SE2 histogram filter for temporal integration.

  1. Single Image Localization: Using a combination of ResNet and attention-based networks, F$^3$Loc extracts depth from single images, aligning them with the gravity direction to predict floorplan depth. This component helps tackle the scale ambiguity common in monocular depth estimation.
  2. Multiview Stereo Estimation: Taking advantage of multiple views, F$^3$Loc employs a variant of the MVS network to capture geometric cues for depth estimation. This approach excels where there is sufficient baseline and image overlap but struggles with small baselines and near-in-place motion.
  3. Complementary Cue Selection: Because single and multi-view cues have different strengths and weaknesses, F$^3$Loc incorporates a selection network that combines the two based on their relevance to the current situation. This lets the system draw on whichever method is stronger in a given case.
  4. Temporal Localization: To refine localization over time and resolve ambiguities, F$^3$Loc integrates single-frame predictions using a novel SE2 histogram filter. This efficient algorithm maintains a probability distribution over poses and uses known ego-motion to update these probabilities.
  5. Robustness to Non-Upright Images: To remove the practical limitation of requiring upright images, F$^3$Loc introduces a virtual roll-pitch augmentation technique in its training process. This makes the model more robust to varied camera orientations and brings it closer to practical usage scenarios.
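
To make the temporal localization step (item 4) more concrete, the following is a minimal sketch of a generic Bayes histogram filter over a discretized SE2 pose grid. It is not the paper's implementation: the function names, the nested-list belief layout `belief[angle][row][column]`, the `cell_size` parameter, and the noise-free motion model (each step is rounded to whole cells) are all assumptions made for illustration. The per-pose observation likelihood is taken as given; in a system like the one described it would come from comparing predicted floorplan depth against the map.

```python
import math

def make_uniform_belief(num_angles, height, width):
    """Uniform distribution over all (angle, row, column) pose cells."""
    p = 1.0 / (num_angles * height * width)
    return [[[p] * width for _ in range(height)] for _ in range(num_angles)]

def normalize(belief):
    """Rescale the belief so all cells sum to 1 (returned as-is if the sum is 0)."""
    total = sum(v for plane in belief for row in plane for v in row)
    if total == 0.0:
        return belief
    return [[[v / total for v in row] for row in plane] for plane in belief]

def motion_update(belief, dx, dy, dtheta, cell_size=1.0):
    """Shift the belief by an ego-motion (dx, dy, dtheta) given in the camera frame.

    Each orientation bin gets its own translation, because the same
    camera-frame step points in a different world direction per heading.
    Assumption: motion is noise-free and rounded to whole cells and bins.
    """
    num_angles = len(belief)
    height, width = len(belief[0]), len(belief[0][0])
    bin_size = 2.0 * math.pi / num_angles
    k_shift = round(dtheta / bin_size)
    moved = [[[0.0] * width for _ in range(height)] for _ in range(num_angles)]
    for k in range(num_angles):
        theta = k * bin_size
        # Rotate the camera-frame step into the world frame for this heading.
        wx = math.cos(theta) * dx - math.sin(theta) * dy
        wy = math.sin(theta) * dx + math.cos(theta) * dy
        sx, sy = round(wx / cell_size), round(wy / cell_size)
        k_new = (k + k_shift) % num_angles
        for i in range(height):
            for j in range(width):
                ni, nj = i + sy, j + sx
                if 0 <= ni < height and 0 <= nj < width:
                    moved[k_new][ni][nj] += belief[k][i][j]
    # Probability mass that left the map is dropped before renormalizing.
    return normalize(moved)

def observation_update(belief, likelihood):
    """Bayes update: multiply prior by per-pose likelihood, then renormalize."""
    posterior = [
        [[b * l for b, l in zip(b_row, l_row)]
         for b_row, l_row in zip(b_plane, l_plane)]
        for b_plane, l_plane in zip(belief, likelihood)
    ]
    return normalize(posterior)
```

A typical loop would alternate `motion_update` with the measured ego-motion and `observation_update` with the likelihood of the newest frame, then read the pose estimate from the highest-probability cell. A practical filter would also spread probability mass to neighbouring cells during the motion step to model odometry noise, which this sketch leaves out for brevity.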

Practical Implications and Future Outlook

F$^3$Loc sets a new standard for indoor localization against a floorplan, outperforming leading methods in speed and accuracy. Because it runs efficiently on consumer hardware and accommodates non-upright images, it is also better suited to real-world use.

This research not only contributes a robust solution to the floorplan localization challenge but also opens avenues for future work, particularly in the incorporation of semantic cues and the development of real-world datasets for further validation.

Considering the system's potential for broad application in augmented reality (AR), virtual reality (VR), and robotics, F$^3$Loc represents a significant step forward. Its methodology supports the vision of creating more intuitive indoor navigation systems and autonomous exploration and rescue robots capable of operating in complex environments reliably.

Looking ahead, the integration of semantic information and the improvement of dataset diversity and realism stand out as promising directions. As indoor localization technology continues to evolve, systems like F$^3$Loc point toward localization that works from nothing more than a floorplan and a consumer camera.

Conclusion

In summary, F$^3$Loc introduces an innovative, probabilistic model for efficient and accurate indoor localization within floorplans, leveraging fusion and filtering techniques to operate effectively on consumer hardware. This system's adaptability to non-upright images and its combination of single and multi-view depth cues for real-time localization mark a noteworthy advancement in the field, promising enhanced capabilities for AR/VR applications and autonomous indoor navigation.
