- The paper presents DSP-SLAM, a novel object-oriented SLAM framework that integrates deep shape priors to improve object reconstruction and reduce camera drift.
- It leverages ORB-SLAM2 with semantic instance segmentation and a novel second-order optimization for joint refinement of camera poses, object locations, and feature points.
- The system demonstrates robust performance across monocular, stereo, and stereo+LiDAR inputs, validated on challenging datasets like KITTI, Freiburg Cars, and Redwood-OS.
DSP-SLAM: Advancements in Object-Oriented SLAM with Deep Shape Priors
This paper presents DSP-SLAM, a novel approach to Simultaneous Localization and Mapping (SLAM) that integrates deep shape priors for enhanced object reconstruction. DSP-SLAM addresses a limitation of traditional SLAM systems by building a map that represents foreground objects as dense, complete shapes while keeping the background as sparse landmark points. This hybrid representation significantly enhances the semantic understanding of a scene, which is crucial for advanced robotic vision applications.
DSP-SLAM builds on ORB-SLAM2, a feature-based SLAM framework, and combines it with deep shape embeddings. This combination allows DSP-SLAM to produce an enriched semantic map that captures detailed object properties such as shape and pose, which geometry-only SLAM systems cannot provide.
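The key idea behind a deep shape prior is that a full object surface is encoded as a low-dimensional latent code decoded by a learned network (DSP-SLAM uses a DeepSDF-style decoder), so even a partial observation constrains the complete shape. The sketch below is a toy illustration of that principle, with an analytic one-dimensional "decoder" (a sphere radius) standing in for the trained MLP over a 64-dimensional code; the function and variable names are illustrative, not the paper's API:

```python
import numpy as np

def sdf_decoder(code, points):
    """Toy stand-in for a learned DeepSDF-style decoder: maps a latent
    shape code plus 3D query points to signed distances. Here the single
    code value is interpreted as a sphere radius; the real prior is a
    trained network conditioned on a higher-dimensional code."""
    radius = code[0]
    return np.linalg.norm(points, axis=1) - radius

# A partial observation (a few surface samples from one side) still
# constrains the whole shape: fit the code so the observed points lie
# on the decoder's zero level set.
observed = np.array([[0.7, 0.0, 0.0],
                     [0.0, 0.7, 0.0],
                     [0.0, 0.0, -0.7]])
radii = np.linalg.norm(observed, axis=1)
code = np.array([radii.mean()])  # least-squares fit of the 1-D code

# The recovered code predicts the signed distance of ANY query point,
# completing the unseen side of the object.
unseen = np.array([[-0.7, 0.0, 0.0]])
distance = sdf_decoder(code, unseen)
```

In DSP-SLAM the same principle yields watertight object meshes from sparse, one-sided depth or LiDAR measurements, because the prior fills in the unobserved surface.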
The system employs semantic instance segmentation for object detection, using category-specific deep shape embeddings to guide the estimation of object shapes and poses. Notably, this paper introduces a novel second-order optimization for this purpose. DSP-SLAM's object-aware bundle adjustment operates as a joint optimization framework refining camera poses, object locations, and feature points concurrently. Its flexibility is demonstrated through its ability to function across multiple input modalities, including monocular, stereo, and stereo+LiDAR.
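The "second-order optimization" mentioned above refers to Gauss-Newton-style refinement, where a stacked residual vector over camera poses, object variables, and feature points is linearized and solved jointly via the normal equations. The following minimal sketch shows that pattern on a toy joint problem; the specific residuals and the finite-difference Jacobian are illustrative assumptions, not the paper's actual cost terms:

```python
import numpy as np

def numerical_jacobian(residual_fn, x, eps=1e-6):
    """Finite-difference Jacobian of the stacked residual vector."""
    r0 = residual_fn(x)
    J = np.zeros((r0.size, x.size))
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = eps
        J[:, i] = (residual_fn(x + dx) - r0) / eps
    return J

def gauss_newton(residual_fn, x0, iters=20):
    """Second-order (Gauss-Newton) refinement: repeatedly linearize the
    residuals and solve the damped normal equations for an update."""
    x = x0.astype(float).copy()
    for _ in range(iters):
        r = residual_fn(x)
        J = numerical_jacobian(residual_fn, x)
        dx = np.linalg.solve(J.T @ J + 1e-9 * np.eye(x.size), -J.T @ r)
        x += dx
        if np.linalg.norm(dx) < 1e-10:
            break
    return x

# Toy joint state: camera translation (tx, ty), object location (ox),
# and a point depth (d), all refined together as in object-aware BA.
def residuals(x):
    tx, ty, ox, d = x
    return np.array([
        tx + ox - 3.0,   # object observed relative to the camera
        ty - 1.0,        # landmark reprojection constraint
        d * tx - 2.0,    # depth-scaled feature constraint
        ox - 2.0,        # surface-consistency term on the object
    ])

x_opt = gauss_newton(residuals, np.array([0.5, 0.0, 0.5, 0.5]))
```

Solving all unknowns in one system, rather than alternating between camera and object updates, is what lets object observations feed back into camera tracking and reduce drift.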
A critical strength of DSP-SLAM lies in its practical performance results, which were evaluated in stringent conditions such as the KITTI odometry dataset. The system shows substantial improvements in object pose and shape reconstruction when benchmarked against existing methods, indicating a noteworthy reduction in camera tracking drift.
The quantifiable benefits of DSP-SLAM include its ability to run at 10 frames per second while maintaining high accuracy. Additionally, its integration of sparse and dense reconstruction yields rich object reconstructions even from partial observations, ensuring a consistent global map. These advantages are demonstrated not only on stereo+LiDAR sequences but also in single-view scenarios with only monocular input, as shown on the Freiburg Cars and Redwood-OS datasets.
This research has broader implications and opens several avenues for development. DSP-SLAM's approach could inform future AI systems that require sophisticated semantic map representations, paving the way for more autonomous and intelligent robots in dynamic environments. Future work could extend such systems to mobile platforms operating in real time in complex settings, or adapt deep shape prior models that learn and generalize faster, potentially improving DSP-SLAM's efficacy across a wider range of object categories.
The paper also motivates further work on optimization algorithms within SLAM that more tightly fuse object-level and feature-based map elements, and on novel data association strategies for efficiently matching objects across frames. In this way, the paper is a pivotal contribution toward object-centric semantic SLAM.