
ST3D: Self-training for Unsupervised Domain Adaptation on 3D Object Detection (2103.05346v2)

Published 9 Mar 2021 in cs.CV and cs.LG

Abstract: We present a new domain adaptive self-training pipeline, named ST3D, for unsupervised domain adaptation on 3D object detection from point clouds. First, we pre-train the 3D detector on the source domain with our proposed random object scaling strategy for mitigating the negative effects of source domain bias. Then, the detector is iteratively improved on the target domain by alternatively conducting two steps, which are the pseudo label updating with the developed quality-aware triplet memory bank and the model training with curriculum data augmentation. These specific designs for 3D object detection enable the detector to be trained with consistent and high-quality pseudo labels and to avoid overfitting to the large number of easy examples in pseudo labeled data. Our ST3D achieves state-of-the-art performance on all evaluated datasets and even surpasses fully supervised results on KITTI 3D object detection benchmark. Code will be available at https://github.com/CVMI-Lab/ST3D.

Authors (5)
  1. Jihan Yang (19 papers)
  2. Shaoshuai Shi (39 papers)
  3. Zhe Wang (574 papers)
  4. Hongsheng Li (340 papers)
  5. Xiaojuan Qi (133 papers)
Citations (173)

Summary

An Overview of ST3D: Self-Training for Unsupervised Domain Adaptation on 3D Object Detection

This paper introduces ST3D, a novel self-training framework for unsupervised domain adaptation (UDA) in 3D object detection from LiDAR point clouds. The framework is designed to counter the performance degradation caused by domain shift, which occurs when a 3D object detector trained on one domain (source) is applied to another (target) without any labeled data in the target domain. ST3D comprises a sequence of strategies that improve the generalizability of 3D detectors across varying environments and sensor configurations.

Key Technical Innovations

  1. Random Object Scaling (ROS): During pre-training on the source domain, this technique randomly scales objects and their 3D bounding boxes, altering object dimensions. It directly targets object size bias: because source and target datasets often have systematically different object size distributions, a detector pre-trained only on source sizes overfits to them. ROS counteracts this overfitting and yields more robust performance under different domain settings.
  2. Quality-Aware Triplet Memory Bank (QTMB): This component generates consistent, high-quality pseudo labels for the target domain. It employs an IoU-based scoring criterion to assess pseudo-label quality, improving the precision and reliability of localization supervision. Its triplet box partition scheme mitigates ambiguous labeling by separating boxes into positive, ignored, and negative sets based on their IoU quality scores, and the memory bank stabilizes pseudo labels across self-training rounds.
  3. Curriculum Data Augmentation (CDA): To prevent the model from overfitting to the many easy examples in pseudo-labeled data, ST3D progressively ramps up the intensity of data augmentation following curriculum learning principles. Training thus moves from easy examples toward progressively harder, simulated scenarios that better match the target domain.
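The random object scaling idea can be sketched as below. This is a minimal illustration, not the paper's implementation: the scale range is an assumed placeholder, and the function assumes the points have already been transformed into the box's local, axis-aligned coordinate frame centered at the box origin.

```python
import numpy as np

def random_object_scaling(points, box_dims, scale_range=(0.9, 1.1)):
    """Randomly scale one object's points and box dimensions (sketch).

    points:     (N, 3) array in the box's local frame (centered, axis-aligned).
    box_dims:   (dx, dy, dz) box dimensions.
    scale_range: assumed placeholder, not the paper's tuned values.
    """
    s = np.random.uniform(*scale_range)
    half = np.asarray(box_dims) / 2.0
    # Points inside the box move with its surfaces; background stays fixed.
    inside = np.all(np.abs(points) <= half, axis=1)
    scaled = points.copy()
    scaled[inside] *= s
    new_dims = tuple(d * s for d in box_dims)
    return scaled, new_dims
```

Scaling points together with the box keeps the object's surface geometry consistent with the enlarged or shrunken bounding box, so the detector sees plausible objects at varied sizes.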
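The triplet box partition in QTMB can be illustrated with a small helper; the two thresholds below are hypothetical placeholders, not the paper's values.

```python
def triplet_partition(iou_scores, pos_thresh=0.6, neg_thresh=0.25):
    """Partition pseudo-labeled boxes by IoU quality score (sketch).

    Thresholds are assumed placeholders. Returns one label per box:
    'positive', 'negative', or 'ignored'.
    """
    labels = []
    for score in iou_scores:
        if score >= pos_thresh:
            labels.append('positive')   # confident pseudo label: used for training
        elif score <= neg_thresh:
            labels.append('negative')   # confident background
        else:
            labels.append('ignored')    # ambiguous: excluded from the loss
    return labels
```

Excluding the ambiguous middle band, rather than forcing a hard positive/negative decision, is what reduces the noisy supervision that would otherwise accumulate over self-training rounds.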
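The curriculum schedule in CDA can be sketched as a staged, multiplicative ramp-up of augmentation magnitude; the growth ratio and base intensity below are assumed values for illustration only.

```python
def cda_intensity(stage, delta=0.2, base=1.0):
    """Curriculum data augmentation intensity at a given stage (sketch).

    Augmentation magnitude is enlarged multiplicatively each stage,
    so early stages train on mild augmentations and later stages on
    progressively harder ones. delta and base are assumed values.
    """
    return base * (1.0 + delta) ** stage
```

A scheduler like this would multiply the nominal magnitude of each augmentation (e.g. rotation range, scaling range) by `cda_intensity(stage)` as self-training progresses.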

Numerical Results and Performance

ST3D achieves state-of-the-art results across multiple datasets, notably closing the performance gap between source-only models and fully supervised Oracle models. For instance, on the Waymo-to-KITTI adaptation, ST3D surpassed prior methods by over 74% in 3D average precision (AP_3D), without requiring target domain statistical information. Moreover, the framework occasionally even outperformed the fully supervised Oracle, indicating its potential for practical deployment without labeled target data.

Implications and Future Directions

With ST3D, the unsupervised domain adaptation of 3D detectors becomes significantly more feasible, especially in autonomous driving applications where datasets differ vastly in geographical and environmental settings. The approach advances possibilities for deploying reliable 3D detection systems across diverse sensor setups and landscapes without incurring additional data labeling costs.

Looking forward, further research might explore extensions of ST3D to adapt other aspects of 3D detection, such as dealing with dynamic domain variations involving rapid environmental changes. Additionally, integrating ST3D with emerging self-supervised learning techniques could refine and enhance domain adaptation efficiency even further, breaking new ground in AI-driven perception systems.