SalsaNet: Fast Road and Vehicle Segmentation in LiDAR Point Clouds for Autonomous Driving (1909.08291v1)

Published 18 Sep 2019 in cs.CV and cs.RO

Abstract: In this paper, we introduce a deep encoder-decoder network, named SalsaNet, for efficient semantic segmentation of 3D LiDAR point clouds. SalsaNet segments the road, i.e. drivable free-space, and vehicles in the scene by employing the Bird-Eye-View (BEV) image projection of the point cloud. To overcome the lack of annotated point cloud data, in particular for the road segments, we introduce an auto-labeling process which transfers automatically generated labels from the camera to LiDAR. We also explore the role of image-like projection of LiDAR data in semantic segmentation by comparing BEV with spherical-front-view projection and show that SalsaNet is projection-agnostic. We perform quantitative and qualitative evaluations on the KITTI dataset, which demonstrate that the proposed SalsaNet outperforms other state-of-the-art semantic segmentation networks in terms of accuracy and computation time. Our code and data are publicly available at https://gitlab.com/aksoyeren/salsanet.git.

Authors (3)
  1. Eren Erdal Aksoy (21 papers)
  2. Saimir Baci (2 papers)
  3. Selcuk Cavdar (2 papers)
Citations (129)

Summary

  • The paper presents SalsaNet, a deep encoder-decoder that achieves real-time LiDAR segmentation for rapid road and vehicle detection.
  • It employs a bird’s-eye-view projection and an innovative auto-labeling process to structure sparse LiDAR data and expand training samples.
  • Empirical results on KITTI demonstrate that SalsaNet outperforms existing models with 160 FPS inference and robust accuracy in small object segmentation.

SalsaNet: Efficient Semantic Segmentation for Autonomous Driving

The paper introduces SalsaNet, a deep encoder-decoder neural network designed for fast and effective semantic segmentation of LiDAR point clouds in the autonomous driving context. By segmenting both the drivable road space and the vehicles in a scene, SalsaNet supports scene understanding for self-driving vehicles and thereby contributes to safer driving decisions.

Methodology Overview

SalsaNet employs a bird's-eye-view (BEV) image projection of the point cloud, which accommodates the challenges posed by the sparse nature of LiDAR data. This top-view rasterization produces structured, image-like data conducive to convolutional processing. To mitigate the annotation bottleneck, SalsaNet incorporates an auto-labeling process that transfers labels from camera images to LiDAR point clouds, leveraging state-of-the-art camera-based segmentation methods. Notably, this auto-labeling substantially expands the training data, enabling robust model development; a sketch of the BEV rasterization step follows.
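To make the projection step concrete, below is a minimal NumPy sketch of one common way to rasterize a point cloud into a BEV feature image with mean-height, maximum-height, mean-intensity, and point-density channels. The grid extents, cell resolution, and density normalization are illustrative choices, not the paper's exact configuration.

```python
import numpy as np

def lidar_to_bev(points, intensities, x_range=(0.0, 50.0), y_range=(-12.0, 12.0),
                 resolution=0.2):
    """Rasterize a LiDAR point cloud into a bird's-eye-view feature image.

    points      : (N, 3) array of x, y, z coordinates in the sensor frame
    intensities : (N,) array of per-point reflectance values
    Returns an (H, W, 4) image with mean-height, max-height,
    mean-intensity, and point-density channels.
    """
    # Keep only points inside the region of interest.
    mask = ((points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1]) &
            (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1]))
    pts, inten = points[mask], intensities[mask]

    # Map metric coordinates to integer grid cells.
    h = int((x_range[1] - x_range[0]) / resolution)
    w = int((y_range[1] - y_range[0]) / resolution)
    rows = ((pts[:, 0] - x_range[0]) / resolution).astype(np.int32)
    cols = ((pts[:, 1] - y_range[0]) / resolution).astype(np.int32)

    bev = np.zeros((h, w, 4), dtype=np.float32)
    counts = np.zeros((h, w), dtype=np.float32)

    # Accumulate per-cell statistics (empty cells stay zero).
    np.add.at(counts, (rows, cols), 1.0)
    np.add.at(bev[:, :, 0], (rows, cols), pts[:, 2])       # sum of heights
    np.maximum.at(bev[:, :, 1], (rows, cols), pts[:, 2])   # max height
    np.add.at(bev[:, :, 2], (rows, cols), inten)           # sum of intensity

    occupied = counts > 0
    bev[occupied, 0] /= counts[occupied]                    # mean height
    bev[occupied, 2] /= counts[occupied]                    # mean intensity
    bev[:, :, 3] = np.minimum(1.0, np.log1p(counts) / np.log(64.0))  # density
    return bev
```

The resulting multi-channel image can then be fed to an ordinary 2D convolutional encoder-decoder, which is what makes the BEV representation attractive for fast inference.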

The paper also contrasts the effectiveness of the BEV and spherical-front-view (SFV) projections. This comparison underscores SalsaNet's projection-agnostic nature, demonstrating that it performs consistently regardless of which projection is used.
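For reference, an SFV projection maps each point to a range-image pixel via its azimuth and elevation angles. The sketch below assumes a 64-beam sensor layout and illustrative field-of-view bounds; it is a generic range-image construction, not necessarily the paper's exact preprocessing.

```python
import numpy as np

def lidar_to_sfv(points, intensities, h=64, w=512,
                 fov_up_deg=3.0, fov_down_deg=-25.0):
    """Project a LiDAR point cloud onto a spherical-front-view range image.

    Each pixel stores (x, y, z, range, intensity); pixels with no
    return remain zero. The vertical field of view roughly matches a
    64-beam sensor and is illustrative only.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1) + 1e-6

    yaw = np.arctan2(y, x)            # azimuth angle
    pitch = np.arcsin(z / r)          # elevation angle

    fov_up = np.deg2rad(fov_up_deg)
    fov_down = np.deg2rad(fov_down_deg)
    fov = fov_up - fov_down

    # Normalize angles to [0, 1] and scale to image coordinates.
    u = 0.5 * (1.0 - yaw / np.pi) * w             # column from azimuth
    v = (1.0 - (pitch - fov_down) / fov) * h      # row from elevation
    u = np.clip(np.floor(u), 0, w - 1).astype(np.int32)
    v = np.clip(np.floor(v), 0, h - 1).astype(np.int32)

    sfv = np.zeros((h, w, 5), dtype=np.float32)
    # Write far points first so nearer returns overwrite them.
    order = np.argsort(-r)
    sfv[v[order], u[order], :3] = points[order]
    sfv[v[order], u[order], 3] = r[order]
    sfv[v[order], u[order], 4] = intensities[order]
    return sfv
```

Because both projections yield ordinary 2D images, the same network backbone can consume either one, which is what makes the projection-agnostic comparison possible.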

Structurally, SalsaNet comprises an encoder built on ResNet blocks and a decoder that upsamples features and integrates them with features from early layers using skip connections. This architecture, complemented by a class-balanced loss function, ensures effective feature extraction and classification even for less frequently occurring classes in the dataset.
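The class-balancing idea can be illustrated with a standard frequency-based weighting applied to a cross-entropy loss. The w_c = 1 / ln(eps + f_c) form, the 1.02 smoothing constant, and the three-class (background, road, vehicle) setup below are illustrative assumptions, and the helper and variable names are hypothetical.

```python
import numpy as np
import torch
import torch.nn as nn

def class_balanced_weights(label_maps, num_classes=3, eps=1.02):
    """Compute per-class weights inversely related to class frequency.

    Uses w_c = 1 / ln(eps + f_c), where f_c is the fraction of pixels
    belonging to class c; rare classes receive larger weights.
    """
    counts = np.bincount(np.concatenate([m.ravel() for m in label_maps]),
                         minlength=num_classes).astype(np.float64)
    freqs = counts / counts.sum()
    return torch.tensor(1.0 / np.log(eps + freqs), dtype=torch.float32)

# Plug the weights into a weighted cross-entropy loss so that rare
# classes (e.g. vehicles) contribute more per pixel.
# weights = class_balanced_weights(training_label_maps)  # hypothetical label list
# criterion = nn.CrossEntropyLoss(weight=weights)
```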

Empirical Evaluation

The empirical analysis conducted using the KITTI dataset illustrates SalsaNet’s competitive edge over existing state-of-the-art segmentation networks, including SqueezeSegV1, SqueezeSegV2, and U-Net. The proposed network not only improves pixel-wise segmentation accuracy but also achieves notable computational efficiency. Specifically, SalsaNet achieves an inference speed of up to 160 frames per second, demonstrating its suitability for real-time applications in autonomous driving.

The quantitative results indicate SalsaNet's superior handling of small object segmentation, such as vehicles, particularly when leveraging the BEV projection. This performance consistency, irrespective of projection method, corroborates the network's robustness in diverse operational conditions.

Implications and Future Prospects

SalsaNet holds significant promise for augmenting the perception capabilities of autonomous vehicles. Its real-time efficiency, combined with high accuracy in segmenting crucial driving components, suggests practical utility for on-board vehicle perception systems. The release of SalsaNet's annotated dataset and code further fosters research into more nuanced scenarios in autonomous driving.

Theoretically, SalsaNet underscores the potential of auto-labeling to alleviate data-scarcity issues, hinting at broader applicability in other domains requiring semantic segmentation. Future work could explore integrating SalsaNet with additional perception tasks such as object detection and tracking, potentially paving the way for an integrated autonomous driving perception suite.

In conclusion, SalsaNet represents a substantial advancement in the semantic segmentation of LiDAR data, offering a scalable and efficient solution to the complex demands of autonomous vehicle navigation and safety systems.
