Overview of SqueezeSeg: Convolutional Neural Nets with Recurrent CRF for Real-Time Road-Object Segmentation from 3D LiDAR Point Cloud
"SqueezeSeg: Convolutional Neural Nets with Recurrent CRF for Real-Time Road-Object Segmentation from 3D LiDAR Point Cloud" is a significant contribution to the domain of semantic segmentation for autonomous driving applications. The authors, Bichen Wu, Alvin Wan, Xiangyu Yue, and Kurt Keutzer, address the challenge of segmenting and classifying road objects such as cars, pedestrians, and cyclists in real-time using 3D LiDAR point clouds.
The crux of the research is a novel end-to-end pipeline combining a convolutional neural network (CNN) with a conditional random field (CRF). The CNN ingests transformed LiDAR point clouds and produces a point-wise label map, which is then refined by a CRF formulated as a recurrent layer. This design sidesteps the limitations of traditional segmentation pipelines, which can suffer from compounded errors and instability due to their reliance on hand-crafted features and iterative algorithms.
Methodology
Point Cloud Transformation
The primary challenge of applying CNNs to LiDAR data is the sparse, irregular distribution of the points. The authors propose projecting the 3D point cloud onto a dense 2D grid via spherical projection, a transformation that compactly represents the data while preserving the neighborhood structure of the native LiDAR scan pattern. The resulting grid can then be processed efficiently with standard 2D convolutional techniques with little loss of information.
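The projection described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' code: the vertical field-of-view bounds (`fov_up`, `fov_down`) are assumed values roughly matching a Velodyne HDL-64E, and the 5 channels follow the paper's input tensor (x, y, z, intensity, range).

```python
import numpy as np

def spherical_project(points, H=64, W=512, fov_up=3.0, fov_down=-25.0):
    """Project an (N, 4) array of [x, y, z, intensity] points onto an
    H x W x 5 grid. FOV bounds are assumed, not taken from the paper."""
    x, y, z, intensity = points.T
    r = np.sqrt(x**2 + y**2 + z**2)                 # range to each point
    azimuth = np.arctan2(y, x)                      # horizontal angle
    zenith = np.arcsin(z / np.maximum(r, 1e-8))     # vertical angle
    fov_up_r, fov_down_r = np.radians(fov_up), np.radians(fov_down)
    # map angles to integer grid coordinates (row = vertical, col = horizontal)
    u = ((fov_up_r - zenith) / (fov_up_r - fov_down_r) * H).astype(int)
    v = (0.5 * (1.0 - azimuth / np.pi) * W).astype(int)
    u = np.clip(u, 0, H - 1)
    v = np.clip(v, 0, W - 1)
    grid = np.zeros((H, W, 5), dtype=np.float32)
    # channels: x, y, z, intensity, range (last write wins on collisions)
    grid[u, v] = np.stack([x, y, z, intensity, r], axis=1)
    return grid
```

When several points fall into the same cell, this sketch simply keeps the last one written; a production implementation would typically keep the nearest point.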
Network Architecture
SqueezeSeg leverages the lightweight and efficient SqueezeNet architecture for feature extraction. The model processes an input tensor of size 64×512×5 and focuses on reducing memory requirements and computational complexity in order to achieve real-time inference. The architecture employs FireModules for downsampling and FireDeconv modules for upsampling, keeping the model lightweight while remaining capable of detailed feature extraction.
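To make the FireModule concrete, here is a minimal NumPy sketch of its structure: a 1×1 "squeeze" convolution reduces the channel count, then parallel 1×1 and 3×3 "expand" convolutions are concatenated. Weights are random placeholders and the channel sizes in the test are illustrative, not the paper's configuration.

```python
import numpy as np

def conv2d(x, w):
    """Naive 'same'-padded, stride-1 2D convolution.
    x: (H, W, Cin); w: (kh, kw, Cin, Cout)."""
    kh, kw, cin, cout = w.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw), (0, 0)))
    H, W = x.shape[:2]
    out = np.zeros((H, W, cout), dtype=x.dtype)
    for i in range(kh):
        for j in range(kw):
            # accumulate contribution of each kernel tap across channels
            out += np.einsum('hwc,cd->hwd', xp[i:i + H, j:j + W], w[i, j])
    return out

def fire_module(x, s1, e1, e3, rng):
    """Fire module: squeeze to s1 channels, expand to e1 + e3 channels."""
    cin = x.shape[-1]
    # squeeze: 1x1 conv shrinks the channel dimension, then ReLU
    sq = np.maximum(conv2d(x, rng.standard_normal((1, 1, cin, s1))), 0)
    # expand: parallel 1x1 and 3x3 convs, concatenated along channels
    e1x1 = np.maximum(conv2d(sq, rng.standard_normal((1, 1, s1, e1))), 0)
    e3x3 = np.maximum(conv2d(sq, rng.standard_normal((3, 3, s1, e3))), 0)
    return np.concatenate([e1x1, e3x3], axis=-1)
```

The efficiency comes from the squeeze step: the expensive 3×3 filters operate on the reduced channel count `s1` rather than the full input width.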
CRF Implementation
To address the issue of label map blurriness often seen with CNN-generated outputs, the authors integrate a CRF layer. This layer refines the label map by leveraging the boundary and contextual information around each point. Recurrent mean-field approximation, implemented as an RNN layer, allows the CRF to be trained end-to-end with the CNN, enhancing the precision of point-wise label assignments.
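The recurrent mean-field update can be sketched as below. This is a simplified stand-in, not the paper's implementation: a fixed 3×3 box blur replaces the learned Gaussian filters used for message passing, and a Potts-style compatibility matrix is assumed.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mean_field_crf(unary, n_iters=3, compat=None):
    """Iterative mean-field refinement of a CNN label map.
    unary: (H, W, L) per-point class scores from the CNN."""
    H, W, L = unary.shape
    if compat is None:
        compat = 1.0 - np.eye(L)  # Potts model: penalize differing labels
    Q = softmax(unary)            # initial marginals from the unary scores
    for _ in range(n_iters):
        # message passing: average Q over a 3x3 neighborhood
        # (a stand-in for the paper's learned Gaussian filters)
        Qp = np.pad(Q, ((1, 1), (1, 1), (0, 0)), mode='edge')
        msg = sum(Qp[i:i + H, j:j + W] for i in range(3) for j in range(3)) / 9.0
        # compatibility transform: cost of disagreeing with neighbors
        pairwise = msg @ compat
        # local update: combine unary and pairwise terms, renormalize
        Q = softmax(unary - pairwise)
    return Q
```

Because each iteration is differentiable, unrolling a fixed number of iterations as an RNN layer lets gradients flow back into the CNN, which is what enables the end-to-end training described above.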
Experimental Results
The model was trained using LiDAR data from the KITTI dataset, and to augment training data, a LiDAR simulator was constructed within the GTA-V gaming environment. This approach provided an extensive dataset that significantly improved the model's accuracy on real-world data.
Quantitative results demonstrate that SqueezeSeg achieves high accuracy (IoU) and real-time processing speed (8.7 ms per frame without CRF, 13.5 ms per frame with CRF). The model shows a particularly high recall for car detection (>90%), a critical feature for reducing the risk of missed detections in autonomous driving.
A notable experimentation point is the incorporation of synthetic data from GTA-V, which, when combined with real-world data, further enhanced model performance. This indicates the valuable role synthetic datasets can play in training models for real-world applications.
Implications and Future Work
The successful integration of deep learning techniques with 3D LiDAR data signifies a leap forward in the development of robust, reliable perception systems for autonomous vehicles. The efficiency and accuracy achieved by SqueezeSeg suggest a feasible pathway for deploying such systems in real-time environments.
Future developments could explore enhancing the model's performance for pedestrian and cyclist detection, optimizing CRF hyperparameters for these categories, and further refining synthetic data generation methods to better mimic real-world conditions. Additionally, exploring more efficient clustering algorithms to pair with this pipeline could further stabilize and speed up instance-level segmentation.
In conclusion, the methods and results presented in this paper offer substantial advancements in the field of real-time road-object segmentation, showcasing promising applications for enhanced autonomous driving technologies.