Co-Fusion: Real-time Segmentation, Tracking and Fusion of Multiple Objects (1706.06629v1)

Published 20 Jun 2017 in cs.CV

Abstract: In this paper we introduce Co-Fusion, a dense SLAM system that takes a live stream of RGB-D images as input and segments the scene into different objects (using either motion or semantic cues) while simultaneously tracking and reconstructing their 3D shape in real time. We use a multiple model fitting approach where each object can move independently from the background and still be effectively tracked and its shape fused over time using only the information from pixels associated with that object label. Previous attempts to deal with dynamic scenes have typically considered moving regions as outliers, and consequently do not model their shape or track their motion over time. In contrast, we enable the robot to maintain 3D models for each of the segmented objects and to improve them over time through fusion. As a result, our system can enable a robot to maintain a scene description at the object level which has the potential to allow interactions with its working environment; even in the case of dynamic scenes.

Citations (203)

Summary

  • The paper introduces Co-Fusion, a dense SLAM system that segments, tracks, and fuses multiple moving objects in real time.
  • It segments objects using either motion or semantic cues and maintains each model with a surfel-based fusion algorithm, improving 3D reconstruction fidelity over time.
  • Experiments show low RMSE in trajectory estimation and high IoU scores, validating robust performance in both synthetic and real-world settings.

Overview of Co-Fusion: Real-time Segmentation, Tracking, and Fusion of Multiple Objects

The paper introduces Co-Fusion, a dense SLAM system that segments, tracks, and fuses the geometry of multiple objects in dynamic scenes in real time. Taking a live stream of RGB-D frames as input, it maintains 3D models of both independently moving objects and the static background, offering considerable potential for robotic applications that require interaction with dynamic environments.

Key Insights

Co-Fusion addresses a key limitation of typical SLAM systems, which treat moving objects as outliers and discard them. Instead, Co-Fusion maintains a 3D model for each individual object, tracking its motion and refining its geometry over time through iterative fusion. Segmentation relies on either motion or semantic cues, allowing the detection strategy to be chosen to suit the robotic scenario at hand; a minimal sketch of the motion-based assignment appears below.
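To make the motion-cue pathway concrete, the following is a minimal sketch of assigning pixels to whichever object model's rigid motion best explains their observed 3D points. This is not the paper's actual formulation (which regularizes such costs with a CRF); the function name, data layout, and residual threshold are illustrative assumptions.

```python
import numpy as np

# Sketch of motion-based label assignment (illustrative only):
# each pixel is assigned to the object whose estimated rigid motion
# best explains its observed 3D point. Co-Fusion itself smooths such
# unary costs with a CRF; that step is omitted here.

def assign_labels(points_prev, points_curr, poses, new_label_thresh=0.05):
    """points_prev/points_curr: (N, 3) corresponding 3D points.
    poses: dict {label: (R, t)} with R (3, 3), t (3,) per object.
    Returns an (N,) array of object labels; label -1 marks points
    explained by no current model (candidate new object)."""
    labels = list(poses.keys())
    residuals = np.stack([
        np.linalg.norm(points_curr - (points_prev @ R.T + t), axis=1)
        for R, t in (poses[l] for l in labels)
    ], axis=1)                      # (N, num_objects)
    best = residuals.argmin(axis=1)
    out = np.array(labels)[best]
    out[residuals.min(axis=1) > new_label_thresh] = -1
    return out
```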

The system follows a multi-threaded design built on a surfel-based fusion algorithm, updating in real time the 3D model associated with each independently moving object. This advances existing methods by efficiently coupling dense 3D reconstruction with continuous motion tracking.
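The fusion step can be pictured as a confidence-weighted running average in the style of surfel-map systems such as ElasticFusion, on which Co-Fusion builds. The data layout and update rule below are a simplified assumption, not the paper's exact implementation.

```python
import numpy as np

# Sketch of a confidence-weighted surfel update (simplified assumption):
# each surfel stores a position, normal, and confidence weight, and new
# measurements that associate to it are blended in proportionally.

def fuse_surfel(position, normal, weight, meas_pos, meas_normal, meas_weight=1.0):
    """Blend one new measurement into an existing surfel."""
    total = weight + meas_weight
    position = (weight * position + meas_weight * meas_pos) / total
    normal = weight * normal + meas_weight * meas_normal
    normal /= np.linalg.norm(normal)      # renormalize after blending
    return position, normal, total
```

Because only pixels carrying a given object label contribute measurements to that object's surfels, each model is refined independently of the background and of the other moving objects.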

Experimental Evaluation

The system was evaluated on both synthetic data and real-world scenarios with known ground truth. Pose estimation is accurate, with low root-mean-square errors (RMSE) in trajectory estimates on synthetic sequences, demonstrating the robustness of Co-Fusion's tracking even in dynamic environments. Quantitative evaluation of motion segmentation yields high intersection-over-union (IoU) scores, confirming the system's ability to label distinct moving entities.
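Both metrics are standard and straightforward to reproduce; the sketch below computes trajectory RMSE (assuming the trajectories are already time-associated and aligned) and segmentation IoU under those simplifying assumptions.

```python
import numpy as np

def trajectory_rmse(est, gt):
    """RMSE between estimated and ground-truth positions, both (N, 3).
    Assumes trajectories are already time-associated and aligned."""
    return np.sqrt(np.mean(np.sum((est - gt) ** 2, axis=1)))

def iou(pred_mask, gt_mask):
    """Intersection-over-union of two boolean segmentation masks."""
    inter = np.logical_and(pred_mask, gt_mask).sum()
    union = np.logical_or(pred_mask, gt_mask).sum()
    return inter / union if union else 1.0
```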

In real-world scenarios, 3D reconstruction errors for captured objects were small, suggesting that Co-Fusion's algorithms transfer reliably to practical applications. These results highlight its suitability for autonomous navigation and real-time interactive robotics, including object manipulation and environmental interaction.

Implications and Future Prospects

Co-Fusion's implications extend to autonomous and interactive robotics, showing that object-level scene understanding is achievable in real time, even amid background variability and dynamic object interaction. Its framework could inform object recognition modules in self-driving cars or improve the dexterity of robots in manufacturing and service tasks.

Future work could integrate more sophisticated semantic segmentation networks into the SLAM framework. Machine learning models tailored for real-time applications may extend the system's capabilities to more demanding scenarios involving high-speed dynamics or denser object populations, and collaboration across robotics disciplines could apply Co-Fusion to tasks requiring nuanced environmental interpretation and precise manipulation.