SalsaNext: Fast, Uncertainty-aware Semantic Segmentation of LiDAR Point Clouds for Autonomous Driving (2003.03653v4)

Published 7 Mar 2020 in cs.CV and cs.LG

Abstract: In this paper, we introduce SalsaNext for the uncertainty-aware semantic segmentation of a full 3D LiDAR point cloud in real-time. SalsaNext is the next version of SalsaNet [1] which has an encoder-decoder architecture where the encoder unit has a set of ResNet blocks and the decoder part combines upsampled features from the residual blocks. In contrast to SalsaNet, we introduce a new context module, replace the ResNet encoder blocks with a new residual dilated convolution stack with gradually increasing receptive fields and add the pixel-shuffle layer in the decoder. Additionally, we switch from stride convolution to average pooling and also apply central dropout treatment. To directly optimize the Jaccard index, we further combine the weighted cross-entropy loss with Lovasz-Softmax loss [2]. We finally inject a Bayesian treatment to compute the epistemic and aleatoric uncertainties for each point in the cloud. We provide a thorough quantitative evaluation on the Semantic-KITTI dataset [3], which demonstrates that the proposed SalsaNext outperforms other state-of-the-art semantic segmentation networks and ranks first on the Semantic-KITTI leaderboard. We also release our source code https://github.com/TiagoCortinhal/SalsaNext.

Summary

- The paper presents SalsaNext, which integrates context-aware dilated convolutions, pixel-shuffle upsampling, and Bayesian uncertainty estimation for LiDAR semantic segmentation.
- It achieves a 59.5% mean IoU on Semantic-KITTI, over 14% better than its predecessor SalsaNet, while processing point clouds at 24 Hz.
- The loss function combines weighted cross-entropy, which counters class imbalance, with the Lovász-Softmax loss, which directly optimizes the Jaccard index (IoU).

SalsaNext: Fast, Uncertainty-aware Semantic Segmentation of LiDAR Point Clouds for Autonomous Driving

The paper introduces SalsaNext, a neural network for real-time, uncertainty-aware semantic segmentation of full 3D LiDAR point clouds in autonomous driving. Building on its predecessor, SalsaNet, the authors change the architecture, the loss function, and the treatment of uncertainty to improve both accuracy and practical applicability.

Key Methodological Enhancements

Architecture Improvements:
- Context Module: A new context module, built as a residual dilated convolution stack, captures global context from the full 360-degree LiDAR scan through a broader receptive field.
- Dilated Convolutions: In place of the ResNet encoder blocks, the network uses a residual stack of dilated convolutions with gradually increasing effective receptive fields (3, 5, and 7) to improve spatial feature extraction.
- Pixel-Shuffle Layer: Pixel-shuffle layers replace transposed convolutions in the decoder, upsampling efficiently by rearranging feature channels into spatial positions.
- Central Dropout: Dropout is applied only to the central layers of the network, leaving the outermost layers untouched so that the basic structural features they extract stay intact.
- Average Pooling: Switching from strided convolution to average pooling for downsampling in the encoder reduces the parameter count while maintaining accuracy.
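
To make the pixel-shuffle step concrete, here is a minimal pure-Python sketch of the rearrangement. It is not the authors' code: the function name `pixel_shuffle`, the nested-list tensor layout (channels, height, width), and the channel ordering (the one used by common deep learning frameworks) are assumptions made for illustration.

```python
def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) nested list into (C, H*r, W*r).

    Hypothetical helper for illustration; no weights are learned here,
    the operation only moves values from channels to spatial positions.
    """
    channels_in, height, width = len(x), len(x[0]), len(x[0][0])
    if channels_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    channels_out = channels_in // (r * r)
    out = [[[0] * (width * r) for _ in range(height * r)]
           for _ in range(channels_out)]
    for c in range(channels_out):
        for i in range(r):          # row offset inside each r x r output cell
            for j in range(r):      # column offset inside each r x r output cell
                src = x[c * r * r + i * r + j]
                for h in range(height):
                    for w in range(width):
                        out[c][h * r + i][w * r + j] = src[h][w]
    return out


# Four 1x2 channels become one 2x4 channel (upscale factor r = 2).
x = [[[0, 1]], [[10, 11]], [[20, 21]], [[30, 31]]]
y = pixel_shuffle(x, 2)
print(y)  # [[[0, 10, 1, 11], [20, 30, 21, 31]]]
```

Because the operation is a fixed permutation, the upsampling itself adds no parameters; any learning happens in the convolutions around it.
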
Uncertainty Estimation:
- Epistemic and Aleatoric Uncertainty: The paper extends the SalsaNet framework by integrating Bayesian treatments. This allows for the calculation of epistemic (model) and aleatoric (data) uncertainty, essential for autonomous systems seeking to make reliable, safe decisions under uncertainty.
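
As a rough illustration of how the two uncertainties can be separated, the sketch below assumes the network is run T times with stochastic behaviour (for example, dropout left active) and that each pass returns a predicted value together with a propagated data-noise variance for one point. The function name `decompose_uncertainty` and this input format are assumptions, and the decomposition shown is a common simplified one rather than a confirmed reproduction of the paper's procedure.

```python
from statistics import fmean


def decompose_uncertainty(pass_means, pass_variances):
    """Split predictive uncertainty for one point (illustrative sketch).

    pass_means:     prediction from each of T stochastic forward passes
    pass_variances: data-noise variance reported by each of those passes
    """
    mean_pred = fmean(pass_means)
    # Epistemic (model) uncertainty: spread of predictions across passes.
    epistemic = fmean([(m - mean_pred) ** 2 for m in pass_means])
    # Aleatoric (data) uncertainty: average of the per-pass noise variances.
    aleatoric = fmean(pass_variances)
    return mean_pred, epistemic, aleatoric


mean_pred, epistemic, aleatoric = decompose_uncertainty(
    [0.6, 0.8, 0.7, 0.7],       # made-up outputs of T = 4 passes
    [0.01, 0.02, 0.01, 0.02],   # made-up per-pass variances
)
print(mean_pred, epistemic, aleatoric)
```

A point where the passes disagree strongly has high epistemic uncertainty, which more training data could reduce; a point with high aleatoric uncertainty is noisy in the measurement itself.
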
Loss Function Optimization:
- Combined Loss: The weighted cross-entropy loss counters class imbalance, while the Lovász-Softmax loss directly optimizes the Jaccard index (IoU); training on their combination improves segmentation performance.
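
The combined loss can be sketched in plain Python as follows. This is an illustrative implementation under stated assumptions, not the released training code: the function names, the list-based inputs (`probs` holds per-point softmax outputs), the plain mean over points, and the unweighted sum of the two terms are all choices made for this example.

```python
import math


def weighted_cross_entropy(probs, labels, class_weights):
    """Mean of -w[y] * log p[y] over all points."""
    losses = [-class_weights[y] * math.log(p[y]) for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)


def lovasz_grad(gt_sorted):
    """Gradient of the Lovász extension of the Jaccard loss,
    for ground-truth flags sorted by decreasing prediction error."""
    total = sum(gt_sorted)
    grad, prev, cum_fg, cum_bg = [], 0.0, 0, 0
    for g in gt_sorted:
        cum_fg += g
        cum_bg += 1 - g
        jaccard = 1.0 - (total - cum_fg) / (total + cum_bg)
        grad.append(jaccard - prev)
        prev = jaccard
    return grad


def lovasz_softmax(probs, labels, num_classes):
    """Average the per-class Lovász losses over classes present in labels."""
    per_class = []
    for c in range(num_classes):
        fg = [1 if y == c else 0 for y in labels]
        if sum(fg) == 0:
            continue  # skip classes absent from this batch
        errors = [abs(f - p[c]) for f, p in zip(fg, probs)]
        order = sorted(range(len(errors)), key=lambda i: errors[i], reverse=True)
        grad = lovasz_grad([fg[i] for i in order])
        per_class.append(sum(errors[i] * g for i, g in zip(order, grad)))
    return sum(per_class) / len(per_class)


def combined_loss(probs, labels, class_weights):
    """Sum of both terms (equal weighting is an assumption of this sketch)."""
    return (weighted_cross_entropy(probs, labels, class_weights)
            + lovasz_softmax(probs, labels, len(class_weights)))


probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]   # made-up softmax outputs
labels = [0, 1, 1]
print(combined_loss(probs, labels, class_weights=[1.0, 2.0]))
```

The class weights would normally be larger for rare classes, so that frequent classes such as road do not dominate the cross-entropy term.
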

Quantitative Evaluation

The model was evaluated on the Semantic-KITTI dataset [3], a large collection of annotated LiDAR point clouds from driving scenes. SalsaNext achieved the highest mean IoU on the leaderboard at the time, 59.5%, an improvement of over 14% compared to SalsaNet.
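
For readers unfamiliar with the metric, mean IoU averages the per-class Jaccard index TP / (TP + FP + FN). The following is a minimal sketch; the function name `mean_iou`, the flat label lists, and the choice to skip classes that never occur are assumptions of this example, not the benchmark's official evaluation script.

```python
def mean_iou(preds, labels, num_classes):
    """Average per-class IoU = TP / (TP + FP + FN) over classes that occur."""
    ious = []
    for c in range(num_classes):
        tp = sum(1 for p, y in zip(preds, labels) if p == c and y == c)
        fp = sum(1 for p, y in zip(preds, labels) if p == c and y != c)
        fn = sum(1 for p, y in zip(preds, labels) if p != c and y == c)
        denom = tp + fp + fn
        if denom:  # ignore classes absent from both predictions and labels
            ious.append(tp / denom)
    return sum(ious) / len(ious)


# Class 0: IoU = 1/2, class 1: IoU = 2/3, so the mean is 7/12.
score = mean_iou(preds=[0, 0, 1, 1], labels=[0, 1, 1, 1], num_classes=2)
print(round(score, 3))  # 0.583
```

Because every class contributes equally to the average, rare classes such as pedestrians or bicyclists weigh as much as dominant ones such as road.
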

Computational Efficiency

SalsaNext processes point clouds at 24 Hz, fast enough to keep pace with typical LiDAR scan rates. This real-time capability is crucial for integration into autonomous vehicle systems. The model achieves it with a modest computational load: it has 6.73 million parameters and performs well under constrained resources.

Implications and Future Work

SalsaNext presents a robust framework for real-time, uncertainty-aware semantic segmentation in autonomous driving. The estimation of uncertainty is particularly valuable, enabling the integration of semantic segmentation outputs into broader decision-making algorithms, enhancing safety by allowing the vehicle to acknowledge and act on ambiguous perceptions.

Future work could explore:

- Enhancing Uncertainty Modeling: More sophisticated Bayesian techniques could refine uncertainty estimates further.
- Testing Across Diverse Conditions: Extending evaluations to include adverse weather or lighting conditions to validate robustness.
- Cross-Sensor Fusion: Integrating data from multiple sensors (e.g., RGB cameras) could enhance object detection and classification reliability.

In summary, SalsaNext advances semantic segmentation for autonomous vehicles by pairing an efficient network design that runs in real time with per-point uncertainty estimates.

GitHub

GitHub - TiagoCortinhal/SalsaNext: Uncertainty-aware Semantic Segmentation of LiDAR Point Clouds for Autonomous Driving (408 stars)