SuperFusion: Multilevel LiDAR-Camera Fusion for Long-Range HD Map Generation (2211.15656v4)

Published 28 Nov 2022 in cs.CV and cs.RO

Abstract: High-definition (HD) semantic map generation of the environment is an essential component of autonomous driving. Existing methods have achieved good performance in this task by fusing different sensor modalities, such as LiDAR and camera. However, current works are based on raw data or network feature-level fusion and only consider short-range HD map generation, limiting their deployment to realistic autonomous driving applications. In this paper, we focus on the task of building the HD maps in both short ranges, i.e., within 30 m, and also predicting long-range HD maps up to 90 m, which is required by downstream path planning and control tasks to improve the smoothness and safety of autonomous driving. To this end, we propose a novel network named SuperFusion, exploiting the fusion of LiDAR and camera data at multiple levels. We use LiDAR depth to improve image depth estimation and use image features to guide long-range LiDAR feature prediction. We benchmark our SuperFusion on the nuScenes dataset and a self-recorded dataset and show that it outperforms the state-of-the-art baseline methods with large margins on all intervals. Additionally, we apply the generated HD map to a downstream path planning task, demonstrating that the long-range HD maps predicted by our method can lead to better path planning for autonomous vehicles. Our code has been released at https://github.com/haomo-ai/SuperFusion.

Summary

  • The paper introduces SuperFusion, outlining a novel multilevel fusion strategy that combines LiDAR and camera data for improved long-range HD mapping.
  • The method outperforms state-of-the-art techniques with substantial mAP gains across all evaluated intervals, including the long-range 60-90 m band, on both nuScenes and a self-collected dataset.
  • The release of code and dataset fosters transparency and supports further research in sensor fusion for autonomous driving.

An Overview of "SuperFusion: Multilevel LiDAR-Camera Fusion for Long-Range HD Map Generation"

The paper "SuperFusion: Multilevel LiDAR-Camera Fusion for Long-Range HD Map Generation" proposes a novel method for generating the high-definition (HD) semantic maps required for autonomous driving. The authors introduce SuperFusion, a multilevel fusion approach that combines LiDAR and camera data effectively, improving short-range map generation (within 30 m) and extending prediction out to 90 m. This extended range is crucial for path planning and control in autonomous vehicles, offering significant potential benefits for real-world driving applications.

Key Contributions

  1. Multilevel Fusion Strategy: SuperFusion uniquely integrates LiDAR and camera data across multiple levels, which includes data-level, feature-level, and BEV-level fusion. This approach leverages the strengths of both data modalities, utilizing LiDAR depth information to refine image depth estimation and employing image features to enhance the prediction of LiDAR data at longer ranges.
  2. Comprehensive Evaluation: The method is rigorously evaluated using the nuScenes and a self-collected dataset, demonstrating that SuperFusion surpasses existing state-of-the-art methods with substantial margins across all distance intervals. The HD maps generated using this approach have been shown to significantly improve downstream tasks, such as path planning for autonomous vehicles.
  3. Code and Dataset Release: In support of transparency and future development, the authors have released their code (at the repository linked in the abstract) along with a newly collected dataset, which serve as valuable resources for further research in HD map generation.
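The three fusion levels listed above can be sketched end to end with toy numpy tensors. This is an illustrative simplification, not the paper's implementation: the shapes are arbitrary, the depth "completion" is a simple where-LiDAR-hits override rather than a learned depth network, and the image-guided LiDAR enhancement uses plain dot-product softmax attention as a stand-in for the paper's cross-attention module.

```python
import numpy as np

rng = np.random.default_rng(0)

H, W, C = 8, 8, 16    # toy image feature-map size and channel count (assumed)
BH, BW = 10, 10       # toy BEV grid size (assumed)

# --- Data-level fusion: sparse LiDAR depth corrects the camera depth map.
cam_depth = rng.uniform(1.0, 90.0, size=(H, W))        # predicted per-pixel depth
lidar_depth = np.zeros((H, W))
lidar_depth[::2, ::2] = rng.uniform(1.0, 90.0, size=(H // 2, W // 2))  # sparse returns
mask = lidar_depth > 0
fused_depth = np.where(mask, lidar_depth, cam_depth)   # trust LiDAR where it hits

# --- Feature-level fusion: image features guide (long-range) LiDAR BEV features.
# A residual softmax-attention stand-in for the paper's cross-attention module.
lidar_feat = rng.standard_normal((BH * BW, C))         # flattened LiDAR BEV features
img_feat = rng.standard_normal((H * W, C))             # flattened image features
attn = lidar_feat @ img_feat.T / np.sqrt(C)
attn = np.exp(attn - attn.max(axis=1, keepdims=True))
attn /= attn.sum(axis=1, keepdims=True)                # rows are attention weights
lidar_feat_enh = lidar_feat + attn @ img_feat          # image-guided enhancement

# --- BEV-level fusion: align both modalities in BEV, fuse channel-wise.
cam_bev = rng.standard_normal((BH, BW, C))             # depth-lifted camera BEV (toy)
lidar_bev = lidar_feat_enh.reshape(BH, BW, C)
fused_bev = np.concatenate([cam_bev, lidar_bev], axis=-1)  # 2C channels for a decoder
```

In the actual network each stage is learned; the point here is only the data flow: depth completion at the input, attention-based guidance at the feature level, and channel concatenation once both modalities share the BEV frame.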

Strong Numerical Results

SuperFusion exhibits strong numerical performance, particularly in extending accurate HD map generation to long range. In the long-range 60-90 m interval, it achieves higher mean Average Precision (mAP) than contemporary approaches. The cross-attention mechanism and the depth-aware camera-to-BEV transformation contribute notably to this accuracy, even in challenging environments.
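The depth-aware camera-to-BEV transformation mentioned above can be illustrated with a minimal "lift" step in the style of Lift-Splat-Shoot: each image feature is distributed along discrete depth bins, weighted by a predicted depth distribution. This is a toy 1-D sketch under assumed shapes, not the paper's module; a real pipeline would then splat these frustum features onto a metric BEV grid using camera intrinsics and extrinsics.

```python
import numpy as np

rng = np.random.default_rng(1)
W, C, D = 6, 4, 5   # image columns, feature channels, depth bins (toy, assumed)

feat = rng.standard_normal((W, C))                     # per-column image features
logits = rng.standard_normal((W, D))
depth = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)  # depth softmax

# Lift: outer product places each column's feature at every depth bin,
# weighted by that bin's predicted probability.
frustum = depth[:, :, None] * feat[:, None, :]         # shape (W, D, C)

# Toy "splat": collapse columns so each depth bin accumulates its features,
# giving a 1-D stand-in for a BEV ray.
bev = frustum.sum(axis=0)                              # shape (D, C)
```

Because the depth weights sum to one per column, summing the frustum over depth recovers the original feature, which is what makes the lifting step information-preserving; LiDAR depth supervision (the data-level fusion) sharpens these distributions so features land in the right BEV cells.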

Implications and Future Directions

The implications of this research are notable. Enhanced long-range map generation allows autonomous vehicles to perform more precise and timely path planning, reducing the likelihood of abrupt maneuvers that compromise safety and passenger comfort. The use of comprehensive sensor fusion stands to improve the reliability of autonomous systems in real-world scenarios with variable environmental conditions.

Given the promising results, future research could explore further optimizations in fusion techniques, possibly integrating additional sensor modalities such as radar or leveraging advancements in real-time processing capabilities to minimize latency. Moreover, as practical deployment continues, studies could focus on the robustness of such systems in diverse operational conditions, potentially extending their applicability to a wider range of autonomous systems.

In sum, the introduction of SuperFusion represents a significant advancement in the field of autonomous driving, particularly concerning long-range HD map generation. Through meticulous design and empirical validation, this approach sets a benchmark for future developments in multilevel sensor fusion technologies.
