Khronos: A Unified Approach for Spatio-Temporal Metric-Semantic SLAM in Dynamic Environments
Abstract: Perceiving and understanding highly dynamic and changing environments is a crucial capability for robot autonomy. While large strides have been made towards developing dynamic SLAM approaches that estimate the robot pose accurately, less emphasis has been placed on the construction of dense spatio-temporal representations of the robot environment. A detailed understanding of the scene and its evolution through time is crucial for long-term robot autonomy and essential to tasks that require long-term reasoning, such as operating effectively in environments that are shared with humans and other agents and are thus subject to short-term and long-term dynamics. To address this challenge, this work defines the Spatio-temporal Metric-semantic SLAM (SMS) problem and presents a framework to factorize and solve it efficiently. We show that the proposed factorization suggests a natural organization of a spatio-temporal perception system, where a fast process tracks short-term dynamics in an active temporal window, while a slower process reasons over long-term changes in the environment using a factor graph formulation. We provide an efficient implementation of the proposed spatio-temporal perception approach, which we call Khronos, and show that it unifies existing interpretations of short-term and long-term dynamics and is able to construct a dense spatio-temporal map in real time. We provide simulated and real results, showing that the spatio-temporal maps built by Khronos are an accurate reflection of a 3D scene over time and that Khronos outperforms baselines across multiple metrics. We further validate our approach on two heterogeneous robots in challenging, large-scale real-world environments.
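The two-rate organization described in the abstract can be illustrated with a small toy sketch. This is not the Khronos implementation: the class names (`Track`, `ActiveWindow`, `SlowBackend`), the 1D positions, and the `horizon` parameter are all hypothetical, and the slow process here only archives finished tracks where the paper uses a factor graph formulation.

```python
from dataclasses import dataclass, field


@dataclass
class Track:
    """Hypothetical object track: an id plus timestamped 1D positions."""
    track_id: int
    stamps: list = field(default_factory=list)
    positions: list = field(default_factory=list)

    @property
    def last_seen(self):
        return self.stamps[-1]


class ActiveWindow:
    """Fast process (assumed design): keeps tracks seen within `horizon` seconds."""

    def __init__(self, horizon):
        self.horizon = horizon
        self.tracks = {}

    def update(self, stamp, detections):
        """Add detections {track_id: position}; return tracks that left the window."""
        for track_id, position in detections.items():
            track = self.tracks.setdefault(track_id, Track(track_id))
            track.stamps.append(stamp)
            track.positions.append(position)
        expired = [t for t in self.tracks.values()
                   if stamp - t.last_seen > self.horizon]
        for track in expired:
            del self.tracks[track.track_id]
        return expired


class SlowBackend:
    """Slow process (placeholder): archives finished tracks.

    A full system would reason over these jointly, e.g. in a factor graph;
    this sketch only stores them.
    """

    def __init__(self):
        self.archive = []

    def add(self, tracks):
        self.archive.extend(tracks)


def run(frames, horizon=1.0):
    """Feed (stamp, detections) frames through the fast and slow processes."""
    window, backend = ActiveWindow(horizon), SlowBackend()
    for stamp, detections in frames:
        backend.add(window.update(stamp, detections))
    return window, backend
```

For example, calling `run` on frames at times 0.0, 0.5, and 2.0 with a one-second horizon hands a track last seen at 0.5 to the backend at the third frame, while a track observed at 2.0 stays in the active window.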