- The paper proposes a novel pre-training paradigm that leverages a large-scale point cloud dataset to enhance autonomous driving perception.
- It employs a class-aware semi-supervised pseudo-labeling strategy with unknown-aware instance learning and a consistency loss for unified representation learning.
- Experimental results demonstrate accuracy gains of 3.41%, 8.45%, and 4.25% on the Waymo, nuScenes, and KITTI benchmarks respectively, underscoring enhanced cross-dataset generalizability.
An Overview of AD-PT: Autonomous Driving Pre-Training with Large-scale Point Cloud Dataset
The paper "AD-PT: Autonomous Driving Pre-Training with Large-scale Point Cloud Dataset" offers a novel approach to enhancing autonomous driving perception capabilities by leveraging a large-scale, semi-supervised pre-training paradigm. The research focuses on improving the generalization of perception models across various autonomous driving (AD) scenarios using a point cloud dataset.
Key Contributions and Methodology
This paper introduces a new pre-training paradigm, termed Autonomous Driving Pre-Training (AD-PT), designed to learn unified representations applicable across multiple autonomous driving tasks and benchmarks. In contrast to traditional self-supervised pre-training methods, which typically pre-train and fine-tune on the same benchmark, AD-PT decouples the pre-training data from the downstream fine-tuning datasets, aiming to maximize data diversity and improve cross-dataset generalizability.
Dataset Preparation
- Large-scale Pre-training Dataset: The research builds its pre-training corpus on the ONCE dataset, refined through a class-aware, semi-supervised pseudo-labeling strategy. This approach combines multiple baseline detectors, such as PV-RCNN++ and CenterPoint, each annotating the semantic classes it handles best, thereby improving pseudo-label precision.
- Diversity Enhancement: The dataset is further augmented using point-to-beam re-sampling and object re-scaling techniques. These operations are designed to introduce diversity at both the scene-level (LiDAR beam variations) and instance-level (object size variations), thus fostering a more robust learning environment.
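The class-aware pseudo-labeling idea can be illustrated with a minimal sketch: detections from the baseline models are kept only if their confidence clears a per-class threshold. The thresholds, field names, and `filter_pseudo_labels` helper below are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of class-aware pseudo-label filtering. The per-class
# thresholds and data layout are illustrative, not taken from the paper.
CLASS_THRESHOLDS = {"Vehicle": 0.8, "Pedestrian": 0.7, "Cyclist": 0.7}

def filter_pseudo_labels(detections):
    """Keep detections whose score clears the threshold for their class.

    `detections` is a list of dicts like
    {"cls": "Vehicle", "score": 0.91, "box": [...]}.
    """
    kept = []
    for det in detections:
        thr = CLASS_THRESHOLDS.get(det["cls"])
        if thr is not None and det["score"] >= thr:
            kept.append(det)
    return kept

dets = [
    {"cls": "Vehicle", "score": 0.91, "box": [0.0, 0.0, 0.0, 4.5, 1.9, 1.6]},
    {"cls": "Vehicle", "score": 0.55, "box": [10.0, 2.0, 0.0, 4.2, 1.8, 1.5]},
    {"cls": "Pedestrian", "score": 0.75, "box": [3.0, 1.0, 0.0, 0.8, 0.8, 1.7]},
]
print(len(filter_pseudo_labels(dets)))  # prints 2: the low-score vehicle is dropped
```

In practice each class's threshold would be tuned so that the retained pseudo-labels are precise enough to serve as supervision for pre-training.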
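Instance-level object re-scaling is straightforward to sketch: points belonging to an object are scaled about the box center, and the box dimensions are scaled by the same factor. The `rescale_object` helper below is a hedged illustration of this idea, not the paper's code.

```python
import numpy as np

def rescale_object(points, box_center, box_dims, scale):
    """Scale an object's points and box dimensions about the box center.

    points:     (N, 3) array of points belonging to the object.
    box_center: (3,) center of the object's 3D box.
    box_dims:   (3,) box size (l, w, h).
    scale:      scalar scaling factor (e.g. drawn from a random range).
    """
    # Translate to the box frame, scale, and translate back.
    new_points = (np.asarray(points) - box_center) * scale + box_center
    new_dims = np.asarray(box_dims) * scale
    return new_points, new_dims

pts = np.array([[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]])
center = np.zeros(3)
new_pts, new_dims = rescale_object(pts, center, [4.0, 2.0, 1.5], 0.5)
# new_pts -> [[0.5, 0, 0], [-0.5, 0, 0]]; new_dims -> [2.0, 1.0, 0.75]
```

Scene-level beam re-sampling works analogously by sub-sampling or interpolating LiDAR beams to mimic sensors with different vertical resolutions.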
Unified Representation Learning
The authors propose a novel unknown-aware instance learning mechanism combined with a consistency loss function to tackle the challenge of varied taxonomies across pre-training and downstream datasets.
- Unknown-aware Instance Learning: This module ensures that potential foreground instances, which may not have been labeled in the pre-training dataset, contribute to the feature learning process.
- Consistency Loss: This loss function encourages consistency in feature representations derived from various augmented views, enhancing the robustness of the learned features.
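One plausible instantiation of such a consistency objective (not the paper's exact loss) is a mean squared error between L2-normalized features extracted from two augmented views of the same scene:

```python
import numpy as np

def consistency_loss(feat_a, feat_b):
    """MSE between L2-normalized per-instance features of two augmented views.

    feat_a, feat_b: (N, D) feature matrices for the same N instances seen
    under two different augmentations. Identical features give zero loss.
    """
    a = feat_a / np.linalg.norm(feat_a, axis=-1, keepdims=True)
    b = feat_b / np.linalg.norm(feat_b, axis=-1, keepdims=True)
    return float(np.mean((a - b) ** 2))

f1 = np.array([[1.0, 2.0], [3.0, 4.0]])
f2 = np.array([[2.0, 1.0], [4.0, 3.0]])
same_view = consistency_loss(f1, f1)   # 0.0: identical views agree perfectly
diff_view = consistency_loss(f1, f2)   # > 0: mismatched views are penalized
```

Minimizing this term pushes the backbone toward features that are stable under the scene- and instance-level augmentations described above.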
Experimental Evaluation
The AD-PT paradigm demonstrates significant improvements on several prominent benchmarks, including Waymo, nuScenes, and KITTI, across different model architectures such as PV-RCNN++, SECOND, and CenterPoint. Notably, the paper reports accuracy gains of 3.41%, 8.45%, and 4.25% on the Waymo, nuScenes, and KITTI datasets, respectively. These results underscore the paradigm’s effectiveness over existing self-supervised pre-training methods and traditional training approaches.
Implications and Future Directions
The introduction of AD-PT could redefine feature extraction methods for autonomous driving systems, providing a scalable, data-efficient pre-training approach. By highlighting the importance of dataset diversity and generalizable representation learning, this paper paves the way for developing more adaptable perception systems in AD.
Future research may focus on extending the AD-PT framework to incorporate a broader array of sensor inputs and urban driving environments. Moreover, exploring the integration of AD-PT with transformer-based architectures may yield further enhancements in perception accuracy and efficiency.
In summary, this research presents a well-founded approach to improving the cross-dataset generalizability of autonomous driving perception models through a novel pre-training paradigm, emphasizing both data diversity and refined representation learning strategies.