Overview of CubeSLAM: Monocular 3D Object SLAM
The paper presents CubeSLAM, a system that integrates single-image 3D cuboid object detection with multi-view object-based simultaneous localization and mapping (SLAM) using a monocular camera, in both static and dynamic environments. The central idea is that object detection and SLAM benefit each other: detected objects supply geometric and scale constraints to SLAM, while SLAM provides multi-view camera poses that refine the detections, yielding measurable improvements over existing approaches on both tasks.
Key Contributions
- Single Image 3D Object Detection: The paper introduces a method to generate high-quality 3D cuboid proposals from 2D bounding boxes. Proposals are constructed using vanishing points derived from the estimated object orientation and are scored by how well their projected edges align with line segments detected in the image (see the sketch after this list). Because the approach is purely geometric, it is efficient and requires no prior 3D object models.
- Object SLAM: In the proposed multi-view SLAM framework, object measurements are added to bundle adjustment, which jointly optimizes camera poses, object poses, and 3D points. Objects provide long-range geometric and scale constraints that improve camera pose estimation and reduce monocular scale drift, notably without relying on loop closure (a sketch of the camera-object residual follows the list).
- Handling of Dynamic Environments: Unlike traditional SLAM systems that treat dynamic regions as outliers and discard them, CubeSLAM represents moving objects explicitly and constrains them with a motion model, so observations of dynamic objects contribute to camera pose estimation rather than degrading it (see the motion-model sketch below).
- Experimental Validation: The system is evaluated on the SUN RGBD, KITTI, and TUM datasets, showing improved monocular camera pose estimation and 3D object detection accuracy.
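To make the detection step concrete, below is a minimal sketch of the vanishing-point geometry it relies on: each of the three orthogonal cuboid axes projects to a vanishing point K @ R[:, i], and a candidate cuboid is scored by how well its projected edges align with detected line segments. The intrinsics, angles, and the simple midpoint-plus-angle alignment cost are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def vanishing_points(K, R):
    """Project the three orthogonal cuboid axis directions to image space.

    Each column of R is an object-frame axis expressed in the camera frame;
    its vanishing point is K @ R[:, i] in homogeneous image coordinates.
    """
    vps = []
    for i in range(3):
        vp_h = K @ R[:, i]                  # homogeneous vanishing point
        vps.append(vp_h[:2] / vp_h[2])
    return np.array(vps)

def edge_alignment_error(cuboid_edges, detected_segments):
    """Score one cuboid proposal: average distance from each projected cuboid
    edge to the best-matching detected segment (smaller is better).

    Each edge/segment is a row (x1, y1, x2, y2); a midpoint-distance plus
    angle-difference cost stands in for the paper's alignment terms.
    """
    total = 0.0
    for e in cuboid_edges:
        mid = 0.5 * (e[:2] + e[2:])
        ang = np.arctan2(e[3] - e[1], e[2] - e[0])
        best = np.inf
        for s in detected_segments:
            s_mid = 0.5 * (s[:2] + s[2:])
            s_ang = np.arctan2(s[3] - s[1], s[2] - s[0])
            # angle difference wrapped into [-pi/2, pi/2)
            d_ang = abs((ang - s_ang + np.pi / 2) % np.pi - np.pi / 2)
            best = min(best, np.linalg.norm(mid - s_mid) + 10.0 * d_ang)
        total += best
    return total / len(cuboid_edges)

# Example with an assumed camera and object orientation (pitch + yaw),
# chosen so no cuboid axis is parallel to the image plane.
K = np.array([[700.0, 0.0, 320.0],
              [0.0, 700.0, 240.0],
              [0.0, 0.0, 1.0]])
pitch, yaw = np.deg2rad(10.0), np.deg2rad(30.0)
cp, sp, cy, sy = np.cos(pitch), np.sin(pitch), np.cos(yaw), np.sin(yaw)
Rx = np.array([[1.0, 0.0, 0.0], [0.0, cp, -sp], [0.0, sp, cp]])
Rz = np.array([[cy, -sy, 0.0], [sy, cy, 0.0], [0.0, 0.0, 1.0]])
R = Rx @ Rz
print("vanishing points:\n", vanishing_points(K, R))

edges = np.array([[100.0, 200.0, 180.0, 210.0]])      # one projected edge
segments = np.array([[102.0, 198.0, 178.0, 212.0]])   # one detected segment
print("alignment error:", edge_alignment_error(edges, segments))
```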
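For the object SLAM contribution, a camera-object measurement can be sketched as a pose-graph-style residual: the object pose predicted from the current camera and object estimates is compared against the pose reported by the single-image detector. The 4x4-matrix representation and the angle-plus-translation error below are simplifying assumptions; the paper formulates this as one term inside bundle adjustment alongside point reprojection errors.

```python
import numpy as np

def inv_se3(T):
    """Invert a 4x4 rigid-body transform without a general matrix inverse."""
    R, t = T[:3, :3], T[:3, 3]
    Ti = np.eye(4)
    Ti[:3, :3] = R.T
    Ti[:3, 3] = -R.T @ t
    return Ti

def camera_object_residual(T_wc, T_wo, T_co_meas):
    """Residual between predicted and measured object pose in the camera frame.

    T_wc: camera pose in world, T_wo: object pose in world,
    T_co_meas: object pose measured by the single-image detector.
    Returns a 4-vector: relative rotation angle plus translation error.
    """
    T_co_pred = inv_se3(T_wc) @ T_wo        # predicted object-in-camera pose
    dT = inv_se3(T_co_meas) @ T_co_pred     # identity if estimates agree
    cos_theta = np.clip((np.trace(dT[:3, :3]) - 1.0) / 2.0, -1.0, 1.0)
    return np.concatenate([[np.arccos(cos_theta)], dT[:3, 3]])

# Example: a camera at the origin observing an object 5 m ahead, with a
# slightly perturbed detector measurement (all values hypothetical).
T_wc = np.eye(4)
T_wo = np.eye(4); T_wo[:3, 3] = [0.0, 0.0, 5.0]
T_meas = np.eye(4); T_meas[:3, 3] = [0.05, 0.0, 5.1]
print("residual:", camera_object_residual(T_wc, T_wo, T_meas))
```

Because object dimensions carry metric scale, minimizing residuals like this over many views is what anchors the otherwise scale-ambiguous monocular trajectory.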
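For dynamic scenes, the motion-model constraint can be sketched as a constant-velocity residual linking an object's poses at consecutive keyframes; deviations from the predicted motion are penalized in the optimization, which keeps moving-object tracks physically plausible. The positions, velocity, and timing below are hypothetical.

```python
import numpy as np

def constant_velocity_residual(p_k, p_k1, v_k, dt):
    """Penalize deviation of an object track from constant-velocity motion.

    p_k, p_k1: object positions at consecutive keyframes (3-vectors)
    v_k: estimated object velocity at time k, dt: time between keyframes
    """
    return p_k1 - (p_k + v_k * dt)

# Example: a car moving ~10 m/s along the camera's z-axis, seen every 0.1 s.
p_k = np.array([0.0, 0.0, 20.0])
v_k = np.array([0.0, 0.0, 10.0])
p_k1 = np.array([0.02, 0.0, 21.05])   # slightly noisy next observation
print("motion residual:", constant_velocity_residual(p_k, p_k1, v_k, dt=0.1))
```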
Experimental Results
The experiments show that CubeSLAM outperforms existing methods in both 3D object detection and camera pose estimation. On the SUN RGBD and KITTI datasets, single-image 3D detection accuracy and robustness improve notably, and on the KITTI odometry benchmark and the authors' own collected sequences, the object SLAM achieves state-of-the-art monocular pose estimation.
Implications and Future Directions
From a theoretical perspective, CubeSLAM demonstrates that semantic information (objects) can be integrated with geometric mapping (camera poses and structure) in a single SLAM framework, producing scene representations richer than sparse point clouds. Practically, this integrated object SLAM capability opens new avenues for autonomous driving and augmented reality, where real-time environmental mapping and object detection are both crucial.
Looking ahead, CubeSLAM could be extended to more complex scene understanding tasks and made more robust to varying lighting and environmental conditions. Optimizing processing time for real-time operation across diverse platforms would also be a worthwhile endeavor.
In summary, CubeSLAM advances the fusion of object detection with SLAM, marking a significant step in using monocular cameras to map complex environments, both static and dynamic. The integration not only improves SLAM accuracy but also provides a more comprehensive framework for leveraging semantic information in robotic vision systems.