
CENet: Toward Concise and Efficient LiDAR Semantic Segmentation for Autonomous Driving (2207.12691v1)

Published 26 Jul 2022 in cs.CV

Abstract: Accurate and fast scene understanding is one of the challenging task for autonomous driving, which requires to take full advantage of LiDAR point clouds for semantic segmentation. In this paper, we present a **concise** and **efficient** image-based semantic segmentation network, named **CENet**. In order to improve the descriptive power of learned features and reduce the computational as well as time complexity, our CENet integrates the convolution with larger kernel size instead of MLP, carefully-selected activation functions, and multiple auxiliary segmentation heads with corresponding loss functions into architecture. Quantitative and qualitative experiments conducted on publicly available benchmarks, SemanticKITTI and SemanticPOSS, demonstrate that our pipeline achieves much better mIoU and inference performance compared with state-of-the-art models. The code will be available at https://github.com/huixiancheng/CENet.

Citations (49)

Summary

  • The paper presents CENet, a novel range image-based LiDAR segmentation network that enhances efficiency and accuracy using larger convolutional kernels and advanced activations.
  • The network achieves impressive results with a 64.7% mIoU and 37.8 FPS on benchmark datasets by integrating auxiliary segmentation heads and a composite loss function.
  • The innovative methodology reduces the complexity of 3D point cloud segmentation by converting it into a 2D image problem, crucial for real-time autonomous driving perception.

Overview of "CENet: Toward Concise and Efficient LiDAR Semantic Segmentation for Autonomous Driving"

The paper "CENet: Toward Concise and Efficient LiDAR Semantic Segmentation for Autonomous Driving" introduces CENet, a semantic segmentation network designed to address the complexities of leveraging LiDAR point clouds for autonomous driving tasks. The need for precise and rapid scene understanding in autonomous driving places a premium on efficient processing of LiDAR data, which captures rich geometric information essential for environment perception.

Key Contributions and Methodology

The authors present a novel network architecture, CENet, which is optimized for efficient computation without compromising on performance. The architecture adopts a range image-based approach leveraging the projection of 3D LiDAR point clouds into 2D images using a spherical projection method. This step transforms the semantic segmentation problem into a manageable image segmentation task, thus enabling the application of well-established 2D convolutional neural networks (CNNs).
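The spherical projection described above can be sketched as follows. This is an illustrative NumPy implementation, not the paper's exact code; the field-of-view values (roughly matching a 64-beam Velodyne sensor) and the 64 × 2048 output size are assumptions for the sketch.

```python
import numpy as np

def spherical_projection(points, H=64, W=2048, fov_up_deg=3.0, fov_down_deg=-25.0):
    """Project 3D LiDAR points of shape (N, 3) onto an H x W range image.

    fov_up_deg / fov_down_deg are illustrative values approximating a
    64-beam sensor; the paper's exact parameters may differ.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)            # range of each point

    yaw = np.arctan2(y, x)                        # horizontal angle
    pitch = np.arcsin(z / r)                      # vertical angle

    fov_up = np.radians(fov_up_deg)
    fov_down = np.radians(fov_down_deg)
    fov = fov_up - fov_down

    # Normalize angles to [0, 1], then scale to pixel coordinates.
    u = 0.5 * (1.0 - yaw / np.pi) * W             # column index
    v = (1.0 - (pitch - fov_down) / fov) * H      # row index

    u = np.clip(np.floor(u), 0, W - 1).astype(np.int32)
    v = np.clip(np.floor(v), 0, H - 1).astype(np.int32)

    # Unfilled pixels are marked with -1; collisions keep the last write.
    range_image = np.full((H, W), -1.0, dtype=np.float32)
    range_image[v, u] = r
    return range_image, u, v
```

In practice, additional channels (x, y, z, intensity) are usually stacked alongside the range channel before being fed to the 2D network.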

Significant features of CENet include:

  • Architecture Design: CENet replaces typical Multi-layer Perceptrons (MLPs) with convolutional layers of larger kernel size, specifically favoring 3×3 over 1×1 kernels. This decision exploits the structural properties of range images and takes advantage of modern computational library optimizations, enhancing both performance and computational efficiency.
  • Activation Functions: The network employs advanced activation functions such as SiLU and Hardswish, which exhibit improved non-linear capabilities pivotal for enhancing the descriptive power of the network.
  • Auxiliary Segmentation Heads: Multiple auxiliary segmentation heads are incorporated to improve the network's learning capacity. These heads enhance feature refinement and are removed during inference, thereby not affecting inference time or computational cost.
  • Loss Function: CENet utilizes a composite loss function that includes weighted cross-entropy, Lovász-Softmax for directly optimizing IoU, and a boundary loss to address the blurring of segmentation boundaries.
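The two activation functions mentioned above have simple closed forms; the following NumPy sketch shows them for reference (the paper uses their standard definitions, e.g. as found in PyTorch):

```python
import numpy as np

def silu(x):
    """SiLU (a.k.a. Swish): x * sigmoid(x). Smooth and non-monotonic."""
    return x / (1.0 + np.exp(-x))

def hardswish(x):
    """Hardswish: x * ReLU6(x + 3) / 6, a cheap piecewise
    approximation of SiLU suited to efficient inference."""
    return x * np.clip(x + 3.0, 0.0, 6.0) / 6.0
```

Both pass positive activations nearly unchanged for large inputs while smoothly gating small and negative values, which is the behavior credited with improving the network's descriptive power over plain ReLU.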

Experimental Results

Quantitative evaluation on renowned datasets such as SemanticKITTI and SemanticPOSS demonstrates that CENet achieves superior performance in terms of mean Intersection over Union (mIoU) compared to existing state-of-the-art methods. For instance, with an input size of 64 × 2048, CENet attains an mIoU of 64.7% with real-time inference at 37.8 FPS, outperforming various point-based, voxel-based, and image-based competitors.

Moreover, the inclusion of auxiliary loss modules and improved activation functions significantly enhances model performance across different input resolutions, attesting to the modularity and flexibility of the proposed enhancements.

Implications and Future Directions

CENet presents valuable insights into optimizing real-time LiDAR-based semantic segmentation, highlighting the potential for reducing the computational burden while achieving competitive results. The architecture’s adaptability in incorporating different backbone networks without substantial trade-offs further underlines its robustness.

The practical implications for autonomous driving are profound, where real-time, accurate perception is crucial. Future research could explore the scalability of CENet on larger datasets and under diverse environmental conditions. Additionally, the potential integration with other sensor modalities could extend its applicability and enhance its robustness in various operational scenarios.

In conclusion, CENet represents a substantive advancement in LiDAR semantic segmentation, demonstrating significant improvements in inference speed and segmentation accuracy, key metrics in the domain of autonomous driving.
