UniDrive: Towards Universal Driving Perception Across Camera Configurations (2410.13864v2)

Published 17 Oct 2024 in cs.CV

Abstract: Vision-centric autonomous driving has demonstrated excellent performance with economical sensors. As the fundamental step, 3D perception aims to infer 3D information from 2D images based on 3D-2D projection. This makes driving perception models susceptible to sensor configuration (e.g., camera intrinsics and extrinsics) variations. However, generalizing across camera configurations is important for deploying autonomous driving models on different car models. In this paper, we present UniDrive, a novel framework for vision-centric autonomous driving to achieve universal perception across camera configurations. We deploy a set of unified virtual cameras and propose a ground-aware projection method to effectively transform the original images into these unified virtual views. We further propose a virtual configuration optimization method by minimizing the expected projection error between original and virtual cameras. The proposed virtual camera projection can be applied to existing 3D perception methods as a plug-and-play module to mitigate the challenges posed by camera parameter variability, resulting in more adaptable and reliable driving perception models. To evaluate the effectiveness of our framework, we collect a dataset on CARLA by driving the same routes while only modifying the camera configurations. Experimental results demonstrate that our method trained on one specific camera configuration can generalize to varying configurations with minor performance degradation.

Summary

The paper introduces UniDrive, a framework that transforms diverse camera inputs into a unified virtual space using a plug-and-play module to achieve consistent 3D driving perception.
Experimental results in the CARLA simulator demonstrate that UniDrive significantly enhances model robustness and maintains high detection performance across different camera setups compared to baselines.
UniDrive's adaptability reduces the need for retraining on new platforms, offering a promising step towards more resilient and general-purpose autonomous driving perception systems.

An Insight into "UniDrive: Towards Universal Driving Perception Across Camera Configurations"

The paper "UniDrive: Towards Universal Driving Perception Across Camera Configurations" introduces a framework for handling variation in camera configurations in vision-centric autonomous driving. UniDrive aims to keep 3D perception consistent across camera setups, which matters for scaling and adapting autonomous driving models to different vehicle platforms.

Overview of UniDrive Framework

UniDrive tackles variation in camera intrinsics and extrinsics by transforming the input images into a unified virtual camera space through a plug-and-play module. The transformation uses a ground-aware projection method to map images from the original cameras into the virtual views, mitigating the detrimental effects of camera parameter variation. A central part of the framework is the virtual configuration optimization method, which minimizes the expected projection error between the original and virtual cameras. The optimization is data-driven and based on the Covariance Matrix Adaptation Evolution Strategy (CMA-ES).
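The idea of mapping a virtual view back to an original camera through the ground can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a pinhole model, a flat ground plane at z = 0 in a world frame with z pointing up, and camera axes of x right, y down, z forward. The names `Camera`, `pixel_to_ground`, `project`, `virtual_to_original`, and `FORWARD_FACING` are hypothetical, and pixels whose rays never reach the ground are simply reported as unmapped.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]

# Camera-to-world rotation for a camera looking along world +x
# (camera x = world -y, camera y = world -z, camera z = world +x).
FORWARD_FACING = [[0.0, 0.0, 1.0],
                  [-1.0, 0.0, 0.0],
                  [0.0, -1.0, 0.0]]


@dataclass
class Camera:
    fx: float
    fy: float
    cx: float
    cy: float
    R: List[List[float]]  # camera-to-world rotation (3x3)
    C: Vec3               # camera centre in world coordinates, C[2] = height


def pixel_to_ground(cam: Camera, u: float, v: float) -> Optional[Vec3]:
    """Intersect the viewing ray of pixel (u, v) with the plane z = 0."""
    d_cam = ((u - cam.cx) / cam.fx, (v - cam.cy) / cam.fy, 1.0)
    d_world = tuple(sum(cam.R[i][j] * d_cam[j] for j in range(3)) for i in range(3))
    if d_world[2] >= 0.0:
        return None  # ray points at or above the horizon, never hits the ground
    s = -cam.C[2] / d_world[2]
    return tuple(cam.C[i] + s * d_world[i] for i in range(3))


def project(cam: Camera, p: Vec3) -> Optional[Tuple[float, float]]:
    """Project a world point into the camera image (pinhole model)."""
    rel = tuple(p[i] - cam.C[i] for i in range(3))
    # Multiply by R transposed to go from world to camera coordinates.
    x, y, z = (sum(cam.R[j][i] * rel[j] for j in range(3)) for i in range(3))
    if z <= 0.0:
        return None  # point is behind the camera
    return (cam.fx * x / z + cam.cx, cam.fy * y / z + cam.cy)


def virtual_to_original(virtual: Camera, original: Camera,
                        u: float, v: float) -> Optional[Tuple[float, float]]:
    """For a virtual-view pixel, find where to sample the original image,
    assuming the pixel shows a point on the ground plane."""
    ground = pixel_to_ground(virtual, u, v)
    if ground is None:
        return None
    return project(original, ground)
```

Looping `virtual_to_original` over every virtual pixel and sampling the original image at the returned coordinates would give an inverse warp for the ground region. How non-ground pixels are treated is left open in this sketch.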

Experimental Setup and Results

The empirical evaluation used a dataset generated in the CARLA simulator and designed to mimic a range of real-world camera configurations. The configurations vary the number of cameras from four to eight, as well as camera placement and field of view. The benchmark covers several such configurations, inspired by industry practice, to test how well UniDrive-enhanced models generalize.
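To see why a change in field of view alters the 3D-2D projection, it helps to write out the pinhole intrinsics implied by an image size and a horizontal field of view. The sketch below assumes square pixels and a principal point at the image centre; the function name `intrinsics_from_fov` and the example resolutions are illustrative and not taken from the paper.

```python
import math
from typing import List


def intrinsics_from_fov(width: int, height: int, hfov_deg: float) -> List[List[float]]:
    """Build a 3x3 pinhole intrinsic matrix from image size and horizontal FOV.

    Assumes square pixels (fx == fy) and a principal point at the image centre.
    """
    focal = width / (2.0 * math.tan(math.radians(hfov_deg) / 2.0))
    return [[focal, 0.0, width / 2.0],
            [0.0, focal, height / 2.0],
            [0.0, 0.0, 1.0]]


# Two cameras with the same resolution but different fields of view
# end up with different focal lengths, hence different projections.
wide = intrinsics_from_fov(1600, 900, 90.0)
narrow = intrinsics_from_fov(1600, 900, 60.0)
```

A narrower field of view gives a larger focal length in pixels, so the same 3D point lands farther from the image centre. This is one way a model trained on one configuration can be thrown off by another.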

The experiments indicate that UniDrive makes perception models considerably more robust to changes in camera configuration. Baseline methods such as BEVFusion-C lose substantial performance when applied to camera setups that differ from the training setup, whereas UniDrive remains stable. In cases with considerable variation in camera intrinsics or placement, UniDrive maintains high detection performance with only a minor decrease in accuracy, which suggests that the virtual camera space helps models generalize.

Implications and Future Work

The implications of this research are substantial for the deployment of autonomous driving systems. By providing an adaptable solution for perception across various camera configurations, UniDrive reduces the need for exhaustive retraining and fine-tuning when transferring models across different vehicle platforms. This adaptability not only conserves computational resources but also facilitates more seamless integration of autonomous driving technologies in the automotive industry.

The findings open several avenues for future research and development. The scalability of the UniDrive framework could be further examined across even more diverse and dynamic environments, including real-world scenarios beyond simulations. Additionally, expanding the framework to integrate other sensor modalities, such as LiDAR, could yield comprehensive perception systems that are universally applicable.

In conclusion, the paper presents a significant contribution to overcoming one of the critical limitations in current autonomous driving technologies, offering a pathway toward more resilient and general-purpose perception systems. The UniDrive framework serves as a promising step toward realizing universal driving perception with the potential to enhance the efficacy and deployment of autonomous vehicles in diverse environments.