
Khronos: A Unified Approach for Spatio-Temporal Metric-Semantic SLAM in Dynamic Environments

Published 21 Feb 2024 in cs.RO | (2402.13817v2)

Abstract: Perceiving and understanding highly dynamic and changing environments is a crucial capability for robot autonomy. While large strides have been made towards developing dynamic SLAM approaches that estimate the robot pose accurately, less emphasis has been put on the construction of dense spatio-temporal representations of the robot environment. A detailed understanding of the scene and its evolution through time is crucial for long-term robot autonomy and essential to tasks that require long-term reasoning, such as operating effectively in environments that are shared with humans and other agents and are thus subject to short- and long-term dynamics. To address this challenge, this work defines the Spatio-temporal Metric-semantic SLAM (SMS) problem, and presents a framework to factorize and solve it efficiently. We show that the proposed factorization suggests a natural organization of a spatio-temporal perception system, where a fast process tracks short-term dynamics in an active temporal window, while a slower process reasons over long-term changes in the environment using a factor graph formulation. We provide an efficient implementation of the proposed spatio-temporal perception approach, which we call Khronos, and show that it unifies existing interpretations of short-term and long-term dynamics and is able to construct a dense spatio-temporal map in real-time. We provide simulated and real results, showing that the spatio-temporal maps built by Khronos are an accurate reflection of a 3D scene over time and that Khronos outperforms baselines across multiple metrics. We further validate our approach on two heterogeneous robots in challenging, large-scale real-world environments.


Summary

  • The paper introduces Khronos, a novel framework that factorizes local dynamic tracking from global map optimization to address spatio-temporal metric‐semantic SLAM in dynamic settings.
  • It employs a three-tiered approach with an active window for local estimation, deformation graphs for global optimization, and a reconciliation process to update historical beliefs.
  • Experimental evaluations show significant gains in precision, recall, and F1 scores, underscoring its potential for robust autonomous robotics in human-centric environments.

Overview of "Khronos: A Unified Approach for Spatio-Temporal Metric-Semantic SLAM in Dynamic Environments"

The paper presents a novel framework termed Khronos, designed for Spatio-Temporal Metric-Semantic Simultaneous Localization and Mapping (SLAM) in dynamic environments. This research addresses the complexities of enabling robotic systems to perceive and interact with both short-term dynamics and long-term changes in their surroundings.

Problem Formulation

The authors introduce the Spatio-temporal Metric-Semantic SLAM (SMS) problem, which involves building and maintaining a real-time, metric-semantic map of the world as observed by robots navigating through dynamic spaces. The SMS problem is tackled by a strategic factorization that separates local dynamic tracking from global environmental changes, thus facilitating efficient and scalable solutions.
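As a rough illustration of the "fast process" side of this factorization, the sketch below keeps short-lived object tracks in an active temporal window and hands a track to a slower process once it has gone unobserved for longer than the window. It is a minimal sketch under stated assumptions, not the Khronos implementation: the names `Track`, `ActiveWindow`, and `window_seconds` are hypothetical, and data association is assumed to be solved already (each observation arrives with a track ID).

```python
from dataclasses import dataclass, field


@dataclass
class Track:
    """Hypothetical record of one object observed inside the active window."""
    track_id: int
    first_seen: float
    last_seen: float
    positions: list = field(default_factory=list)  # (timestamp, position) pairs


class ActiveWindow:
    """Illustrative active temporal window (names and logic are assumptions)."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.tracks = {}

    def update(self, timestamp, observations):
        """Add observations {track_id: position}; return tracks leaving the window."""
        for track_id, position in observations.items():
            track = self.tracks.get(track_id)
            if track is None:
                track = Track(track_id, timestamp, timestamp)
                self.tracks[track_id] = track
            track.last_seen = timestamp
            track.positions.append((timestamp, position))

        # A track unobserved for longer than the window is considered finished
        # and is handed over to the slower, global process.
        finished = [
            t for t in self.tracks.values()
            if timestamp - t.last_seen > self.window_seconds
        ]
        for t in finished:
            del self.tracks[t.track_id]
        return finished


window = ActiveWindow(window_seconds=2.0)
window.update(0.0, {1: (0.0, 0.0, 0.0)})
window.update(1.0, {1: (0.1, 0.0, 0.0), 2: (5.0, 5.0, 0.0)})
done = window.update(3.5, {2: (5.0, 5.0, 0.0)})
print([t.track_id for t in done])
```

In this toy run, track 1 is last seen at t = 1.0, so at t = 3.5 it has been unobserved for more than the 2-second window and is returned for global processing, while track 2 stays active.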

Methodology

The proposed solution involves a three-tiered approach:

  1. Active Window for Local Estimation: The active window continuously estimates object fragments, capturing local temporal dynamics. It processes incoming sensor data to build a representation of immediate short-term dynamics, such as moving objects.
  2. Global Optimization: This component uses a deformation graph to maintain global spatial consistency. Robot poses, fragments, and background meshes are jointly optimized. The optimization uses Truncated Least Squares to handle erroneous measurements and enhance robustness.
  3. Reconciliation: This step assesses evidence from the map and updates historical beliefs, allowing the system to reconcile and update the understanding of scene changes over time.
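To make the role of Truncated Least Squares (TLS) in the global optimization step more concrete, the sketch below applies the TLS cost, min(r^2, c^2), to a one-dimensional toy problem: estimating a scalar from measurements that include an outlier. This is only an illustration under stated assumptions. Khronos optimizes robot poses, fragments, and meshes jointly, whereas here the unknown is a single number, the threshold `c` is a made-up value, and the alternating inlier-selection loop is a simple heuristic chosen for brevity rather than the solver used in the paper.

```python
def tls_cost(residual, c):
    """Truncated least squares cost: quadratic up to |r| = c, constant beyond."""
    return min(residual * residual, c * c)


def tls_mean(measurements, c, iterations=20):
    """Estimate a scalar under the TLS cost by alternating two steps:
    (1) mark measurements within c of the estimate as inliers,
    (2) re-estimate as the mean of the inliers.
    Starts from the median; a heuristic that may stop in a local minimum.
    """
    estimate = sorted(measurements)[len(measurements) // 2]
    for _ in range(iterations):
        inliers = [m for m in measurements if abs(m - estimate) <= c]
        if not inliers:
            break
        new_estimate = sum(inliers) / len(inliers)
        if new_estimate == estimate:
            break
        estimate = new_estimate
    return estimate


measurements = [1.0, 1.1, 0.9, 1.0, 10.0]  # made-up data; 10.0 is an outlier
robust = tls_mean(measurements, c=0.5)
naive = sum(measurements) / len(measurements)
print(robust, naive)
```

Because the cost of any residual larger than `c` is capped at `c * c`, the outlier at 10.0 stops influencing the estimate once it is classified as such, whereas the plain mean is pulled towards it.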

Experimental Validation

Numerical evaluations in both simulated and real-world settings demonstrate that Khronos outperforms existing methods at capturing short-term dynamics and long-term changes. The results indicate significant improvements in precision, recall, and F1 scores across diverse environments, validating the capability of the proposed method to maintain spatio-temporal consistency.
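For readers less familiar with these metrics, the sketch below shows how precision, recall, and F1 are computed from counts of true positives, false positives, and false negatives. The counts in the usage example are made up for illustration and are not results from the paper; how the paper matches detected objects or changes to ground truth is not described in this summary.

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Return (precision, recall, F1) from detection counts.

    Each ratio is defined as 0.0 when its denominator is zero.
    """
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Made-up counts: 8 correct detections, 2 spurious ones, 4 missed ones.
precision, recall, f1 = precision_recall_f1(8, 2, 4)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

F1 is the harmonic mean of precision and recall, so a method scores well only if it both avoids spurious detections and misses few true ones.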

Implications

The results have numerous practical implications in robotics, particularly for autonomous systems operating in human-centric environments, such as service robots and industrial automation. Theoretically, Khronos opens avenues for further exploration of integrated perception systems capable of handling real-time dynamics and changes comprehensively.

Future Directions

Future developments could aim at refining object pose estimation, enhancing scalability, and expanding capabilities to environments with sparse spatial features. Additionally, integration with advanced semantic segmentation techniques could further improve performance in various applications.

Conclusion

Khronos presents a significant advancement in the field of dynamic SLAM, offering a unified, efficient approach to managing complex environmental interactions in autonomous robotics. Its innovative factorization strategy and real-time capabilities pave the way for more robust and intelligent robotic systems capable of long-term autonomous operation in dynamic human-populated spaces.
