MID-Fusion: Octree-based Object-Level Multi-Instance Dynamic SLAM
The paper "MID-Fusion: Octree-based Object-Level Multi-Instance Dynamic SLAM" presents an approach to Simultaneous Localization and Mapping (SLAM) in dynamic environments. The proposed system moves beyond the traditional static-environment assumption, handling dynamic elements within a scene through RGB-D camera tracking and object-level mapping.
Key Contributions
This work makes several notable contributions to the field:
- Integration of Dynamic Object Tracking: The system integrates a volumetric representation built on octree structures to keep memory use efficient. It handles multiple moving objects in dynamic indoor environments using a single, standard RGB-D camera, and remains computationally feasible on standard CPU hardware, running at 2-3 Hz when instance segmentation is not performed.
- Advanced Object Tracking Methodology: The researchers develop a tracking method that robustly weights measurement uncertainty and uses a re-parametrized formulation of object pose estimation, improving the resilience of camera tracking against non-static elements in the scene. Notably, the system distinguishes and tracks both static and moving objects, updating each object model with geometric, semantic, and object foreground-probability information from every frame.
- Probabilistic Fusion for Semantic and Geometric Information: By employing a fusion of semantic distribution and foreground object probability into octree-based object models, the system offers refined object recognition and tracking capabilities, effectively managing object identities over time despite spatial transformations or occlusions.
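To make the memory argument behind the octree representation concrete, here is a minimal sketch of an octree-backed TSDF volume. This illustrates the general technique rather than the paper's implementation: the node layout, the `integrate` update rule, and all names are assumptions, and the sketch omits truncation handling and the per-object volumes the system maintains.

```python
class OctreeNode:
    def __init__(self):
        self.children = None   # None until subdivided, so memory is allocated lazily
        self.tsdf = 0.0        # truncated signed distance value
        self.weight = 0.0      # accumulated fusion weight

class TSDFOctree:
    def __init__(self, size, max_depth):
        self.root = OctreeNode()
        self.size = size            # edge length of the cubic volume (metres)
        self.max_depth = max_depth  # leaf voxel size = size / 2**max_depth

    def _descend(self, x, y, z):
        """Walk to the leaf containing (x, y, z), subdividing along the way."""
        node, half = self.root, self.size / 2.0
        cx = cy = cz = self.size / 2.0  # centre of the current node
        for _ in range(self.max_depth):
            if node.children is None:
                node.children = [OctreeNode() for _ in range(8)]
            ix, iy, iz = int(x >= cx), int(y >= cy), int(z >= cz)
            node = node.children[ix + 2 * iy + 4 * iz]
            half /= 2.0
            cx += half if ix else -half
            cy += half if iy else -half
            cz += half if iz else -half
        return node

    def integrate(self, x, y, z, sdf, w=1.0):
        """Weighted running-average TSDF fusion, as in KinectFusion-style pipelines."""
        leaf = self._descend(x, y, z)
        leaf.tsdf = (leaf.weight * leaf.tsdf + w * sdf) / (leaf.weight + w)
        leaf.weight += w
```

Because children are allocated only along rays that actually observe surface, memory grows with the observed geometry rather than with the full volume, which is the property that makes multiple per-object volumes affordable.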
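The robust weighting idea can be pictured with a standard M-estimator weight function. The paper's uncertainty model is more detailed; the Huber function below is only one common choice for down-weighting large residuals, such as those produced by moving objects or outliers, during the camera-pose solve.

```python
import numpy as np

def huber_weights(residuals, delta):
    """Per-measurement robust weights (Huber M-estimator).

    Residuals smaller than `delta` keep full weight 1; larger residuals are
    scaled down as delta/|r|, so gross outliers contribute little to the
    weighted least-squares pose update.
    """
    r = np.abs(residuals)
    w = np.ones_like(r)
    mask = r > delta
    w[mask] = delta / r[mask]
    return w
```

In an iteratively re-weighted least-squares loop, these weights would multiply each residual's contribution to the normal equations at every iteration.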
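A simple way to picture the probabilistic fusion of per-frame semantic predictions into an object model is a running average of class distributions per voxel or per object. The `fuse_semantics` helper below is a hypothetical sketch of that rule, not the paper's exact update; a scalar foreground probability can be fused with the same averaging.

```python
import numpy as np

def fuse_semantics(stored_probs, stored_count, frame_probs):
    """Fuse a new per-frame class distribution into the stored one.

    stored_probs: current fused class distribution (sums to 1)
    stored_count: number of frames fused so far
    frame_probs:  this frame's predicted class distribution

    Returns the updated distribution and count. A running average keeps the
    estimate stable while still letting repeated observations correct early
    misclassifications.
    """
    fused = (stored_probs * stored_count + frame_probs) / (stored_count + 1)
    return fused / fused.sum(), stored_count + 1
```

Over many frames this accumulates evidence for the object's class, which helps keep object identities consistent despite occlusion or motion.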
Numerical Results
The paper presents quantitative results on the challenging TUM RGB-D benchmark that illustrate the system's effectiveness in dynamic environments. The proposed MID-Fusion system outperforms leading dense SLAM systems such as Co-Fusion and StaticFusion in these settings, achieving lower Absolute Trajectory Error (ATE) across various challenging sequences. Noteworthy is the comparison with the sparse, feature-based DynaSLAM, which serves as a formidable reference point.
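For reference, the ATE metric used in these comparisons is typically computed by rigidly aligning the estimated trajectory to the ground truth and taking the RMSE of the translational residuals. The sketch below uses Horn's closed-form alignment; this is the standard procedure behind the TUM benchmark's evaluation, written here from scratch rather than taken from the paper.

```python
import numpy as np

def ate_rmse(gt, est):
    """Root-mean-square Absolute Trajectory Error.

    gt, est: (N, 3) arrays of corresponding camera positions. The estimated
    trajectory is first rigidly aligned to the ground truth (rotation +
    translation, Horn's closed-form method), then the RMSE of the remaining
    translational differences is returned.
    """
    mu_g, mu_e = gt.mean(axis=0), est.mean(axis=0)
    # Cross-covariance between centred estimated and ground-truth points
    H = (est - mu_e).T @ (gt - mu_g)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T  # rotation: est frame -> gt frame
    t = mu_g - R @ mu_e
    residuals = gt - (est @ R.T + t)
    return float(np.sqrt((residuals ** 2).sum(axis=1).mean()))
```

Because the alignment absorbs any global rigid offset, ATE measures trajectory shape error rather than the arbitrary choice of world frame.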
Implications and Future Directions
The proposed system's ability to create detailed, dynamic maps with distinct objects lends itself directly to practical applications in robotic perception and interaction tasks in indoor environments. This capability is critical for mobile robots that must navigate and interact with ever-changing spaces. The research, by linking individual dynamic elements to volumetric models, also lays a foundation for richer environmental understanding and interaction.
Future developments may enhance real-time performance by exploiting parallel computation, for example on GPUs. A hybrid approach that incorporates feature-based methods could improve robustness in difficult conditions such as reflective environments or fast motion. Integrating advances in learned segmentation and object detection with the MID-Fusion framework could further improve SLAM's adaptability in complex real-world applications.
This research stands as a significant advancement in dynamic SLAM methodologies, offering both a practical solution today and a promising framework for future exploration.