Semantic 3D Occupancy Mapping through Efficient High Order CRFs (1707.07388v1)

Published 24 Jul 2017 in cs.CV and cs.RO

Abstract: Semantic 3D mapping can be used for many applications such as robot navigation and virtual interaction. In recent years, there has been great progress in semantic segmentation and geometric 3D mapping. However, it is still challenging to combine these two tasks for accurate and large-scale semantic mapping from images. In the paper, we propose an incremental and (near) real-time semantic mapping system. A 3D scrolling occupancy grid map is built to represent the world, which is memory and computationally efficient and bounded for large scale environments. We utilize the CNN segmentation as prior prediction and further optimize 3D grid labels through a novel CRF model. Superpixels are utilized to enforce smoothness and form robust $P^N$ high order potential. An efficient mean field inference is developed for the graph optimization. We evaluate our system on the KITTI dataset and improve the segmentation accuracy by 10% over existing systems.

Summary

The paper presents an incremental semantic mapping system using a scrolling occupancy grid that adapts to large-scale environments.
It leverages high order CRFs with superpixels and efficient mean field inference to enforce semantic consistency.
The system achieves a 10% improvement in segmentation accuracy on the KITTI dataset, enhancing detailed structure detection for navigation.

Semantic 3D Occupancy Mapping through Efficient High Order CRFs: A Review

The paper introduces a system for semantic 3D mapping that addresses the difficulty of combining semantic segmentation with geometric 3D mapping. The authors present an incremental, near real-time framework that constructs a semantic map of large-scale environments while remaining efficient in both memory and computation.

Key Contributions

Incremental Semantic Mapping System: The authors propose a system that incrementally builds a 3D semantic map using a scrolling occupancy grid. The grid's memory and computation stay bounded regardless of the size of the environment, which differentiates the system from existing offline or non-incremental approaches.
Efficient Use of High Order CRFs: A Conditional Random Field (CRF) model with higher order cliques is introduced. It leverages superpixels to enforce semantic consistency. The authors develop an efficient mean field inference method for this CRF, which optimizes the 3D grid labels based on initial predictions from a convolutional neural network (CNN).
Improved Segmentation Accuracy: On the KITTI dataset, the system improves segmentation accuracy by 10% over existing systems.
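To make the scrolling occupancy grid idea concrete, here is a minimal sketch, not the authors' implementation. It assumes a fixed-size NumPy array of log-odds occupancy values that is re-centered on the sensor, with cells that leave the window discarded so memory stays constant. The class name `ScrollingGrid`, the voxel size, and the log-odds increments `l_hit` and `l_miss` are all hypothetical choices for illustration; the paper's grid also stores color and label distributions, which are omitted here.

```python
import numpy as np


class ScrollingGrid:
    """Fixed-size 3D log-odds grid that follows the sensor (illustrative only)."""

    def __init__(self, shape=(8, 8, 4), voxel_size=0.5):
        self.shape = np.array(shape)
        self.voxel_size = voxel_size
        self.log_odds = np.zeros(shape)        # 0.0 means unknown (p = 0.5)
        self.origin = np.zeros(3, dtype=int)   # world voxel index of cell [0, 0, 0]

    def _to_local(self, point_xyz):
        """Convert a world point to a local cell index, or None if outside."""
        world_idx = np.floor(np.asarray(point_xyz) / self.voxel_size).astype(int)
        idx = world_idx - self.origin
        if np.any(idx < 0) or np.any(idx >= self.shape):
            return None
        return tuple(idx)

    def scroll_to(self, center_xyz):
        """Re-center the window on center_xyz and reset cells that left it."""
        center_idx = np.floor(np.asarray(center_xyz) / self.voxel_size).astype(int)
        new_origin = center_idx - self.shape // 2
        shift = new_origin - self.origin
        for axis in range(3):
            s, n = int(shift[axis]), int(self.shape[axis])
            if s == 0:
                continue
            # Move surviving cells to their new local positions.
            self.log_odds = np.roll(self.log_odds, -s, axis=axis)
            # The slab that wrapped around is new, unobserved space.
            idx = [slice(None)] * 3
            if abs(s) < n:
                idx[axis] = slice(n - s, None) if s > 0 else slice(0, -s)
            self.log_odds[tuple(idx)] = 0.0
        self.origin = new_origin

    def update(self, point_xyz, hit=True, l_hit=0.85, l_miss=-0.4):
        """Add a log-odds increment for one observation; False if out of window."""
        idx = self._to_local(point_xyz)
        if idx is None:
            return False
        self.log_odds[idx] += l_hit if hit else l_miss
        return True

    def probability(self, point_xyz):
        """Occupancy probability at a world point, or None if out of window."""
        idx = self._to_local(point_xyz)
        if idx is None:
            return None
        return 1.0 / (1.0 + np.exp(-self.log_odds[idx]))
```

In this sketch, calling `scroll_to` with the latest camera position before each batch of `update` calls keeps the array shape constant however far the sensor travels, which is the property the review describes as bounded memory.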

Technical Approach

Geometric Reconstruction: The system incorporates a 3D geometric reconstruction using stereo disparity estimation and camera pose information. Occupancy grids are employed to represent the environment, storing not only occupancy but also color and label distributions. This is a departure from conventional sparse mapping techniques.
Hierarchical CRF Model: The hierarchical CRF model is designed to enforce spatial consistency in segmentation. Its high order potentials, modeled as robust $P^N$ Potts terms formed from superpixels, encourage homogeneous labels within each region, which is crucial for regularizing labels in large-scale environments.
Inference Mechanism: The paper provides an efficient inference strategy that approximates the posterior distribution using mean field inference. This method allows for reasonable computation times even as the number of variables grows, making it scalable for large environments.
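As a rough illustration of mean field inference, the sketch below runs the standard parallel mean field update on a small CRF with unary and pairwise Potts terms only. It is a simplification under stated assumptions, not the paper's algorithm: the robust $P^N$ high order term is omitted, neighbors are given as an explicit edge list, and the function names, the `weight` parameter, and the iteration count are hypothetical. For a Potts model, the update for node $i$ and label $l$ is $Q_i(l) \propto \exp(-\psi_u(l) + w \sum_j Q_j(l))$, where the sum runs over the neighbors of $i$.

```python
import numpy as np


def softmax(scores):
    """Row-wise softmax with max subtraction for numerical stability."""
    scores = scores - scores.max(axis=1, keepdims=True)
    exp = np.exp(scores)
    return exp / exp.sum(axis=1, keepdims=True)


def mean_field_potts(unary, edges, weight=1.0, n_iters=10):
    """Approximate per-node label marginals of a pairwise Potts CRF.

    unary: (N, L) array of label costs, e.g. negative log CNN probabilities.
    edges: (E, 2) integer array of undirected neighbor pairs.
    Returns an (N, L) array whose rows sum to 1.
    """
    unary = np.asarray(unary, dtype=float)
    edges = np.asarray(edges, dtype=int).reshape(-1, 2)
    q = softmax(-unary)  # initialize from the unary term alone
    for _ in range(n_iters):
        # Accumulate the current beliefs of each node's neighbors.
        msg = np.zeros_like(q)
        np.add.at(msg, edges[:, 0], q[edges[:, 1]])
        np.add.at(msg, edges[:, 1], q[edges[:, 0]])
        # Potts penalty: labels that neighbors agree on become cheaper.
        q = softmax(-unary + weight * msg)
    return q


# Three voxels in a chain; the middle one has a weak, conflicting prior.
prior = np.array([[0.9, 0.1], [0.45, 0.55], [0.9, 0.1]])
chain_edges = np.array([[0, 1], [1, 2]])
marginals = mean_field_potts(-np.log(prior), chain_edges, weight=1.0)
labels = marginals.argmax(axis=1)
```

In this toy chain, the middle voxel's weak preference for label 1 is overridden by its two confident neighbors, so `labels` ends up all zeros. Each iteration costs time linear in the number of edges and labels, which gives a sense of why mean field scales to large grids.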

Numerical Results and Implications

The evaluation on the KITTI dataset shows the system outperforming existing systems. Notably, the gains in fence and pole segmentation accuracy indicate that the system captures fine structures, which benefits applications that require detailed environmental understanding, such as autonomous navigation.

Theoretical and Practical Implications

Combining convolutional networks with CRFs in a single semantic mapping system improves both mapping accuracy and efficiency. The results suggest a viable approach to near real-time mapping in support of navigation for autonomous vehicles and other robots. Practically, the work paves the way for advances in robotics and augmented reality, where precise and fast environmental mapping is critical.

Future Prospects

Future research could focus on enhancing computational efficiency further, possibly through GPU acceleration or optimized grid configurations. There is also potential in extending this system to incorporate high-level abstract representations of the environment, which could further transform autonomous systems' perception abilities.

In conclusion, the presented work is a considerable step towards real-time, large-scale semantic mapping, with compelling advancements in both methodological innovation and practical application.
