
SODA10M: A Large-Scale 2D Self/Semi-Supervised Object Detection Dataset for Autonomous Driving

Published 21 Jun 2021 in cs.CV | (2106.11118v3)

Abstract: Aiming at facilitating a real-world, ever-evolving and scalable autonomous driving system, we present a large-scale dataset for standardizing the evaluation of different self-supervised and semi-supervised approaches that learn from raw data; it is the first and largest such dataset to date. Existing autonomous driving systems rely heavily on `perfect' visual perception models (i.e., detection) trained on extensive annotated data to ensure safety. However, it is unrealistic to elaborately label instances of all scenarios and circumstances (e.g., night, extreme weather, cities) when deploying a robust autonomous driving system. Motivated by recent advances in self-supervised and semi-supervised learning, a promising direction is to learn a robust detection model by collaboratively exploiting large-scale unlabeled data and a small amount of labeled data. Existing datasets either provide only a small amount of data or cover limited domains with full annotation, hindering the exploration of large-scale pre-trained models. Here, we release a Large-Scale 2D Self/semi-supervised Object Detection dataset for Autonomous driving, named SODA10M, containing 10 million unlabeled images and 20K images labeled with 6 representative object categories. To improve diversity, the images were collected over 27,833 driving hours under different weather conditions, periods, and location scenes across 32 different cities. We provide extensive experiments and deep analyses of popular self/semi-supervised approaches, along with several interesting findings for the autonomous driving domain. Experiments show that SODA10M can serve as a promising pre-training dataset for different self-supervised learning methods, yielding superior performance when fine-tuned on different downstream tasks (e.g., detection, semantic/instance segmentation) in the autonomous driving domain. More information is available at https://soda-2d.github.io.

Citations (57)

Summary

  • The paper introduces SODA10M, a massive dataset with 10M unlabeled and 20K labeled images that enhances object detection in autonomous driving.
  • It leverages diverse real-world conditions from 32 cities and varying weather to improve model robustness through self and semi-supervised learning.
  • Benchmark results show that pre-training with SODA10M yields significant performance gains over conventional datasets like ImageNet.

Introduction

The paper "SODA10M: A Large-Scale 2D Self/Semi-Supervised Object Detection Dataset for Autonomous Driving" (2106.11118) addresses a critical challenge in autonomous driving systems: robust visual perception in diverse and complex environments. It introduces SODA10M, a large-scale dataset aimed at supporting the development and evaluation of self-supervised and semi-supervised learning approaches for object detection in autonomous driving.

Dataset Composition and Collection

SODA10M comprises 10 million unlabeled images and 20,000 labeled images spanning six object categories. The images cover varying weather conditions, periods, and geographical locations across 32 cities. The dataset is significantly larger than existing datasets such as Waymo and BDD100K, offering the increased diversity and coverage essential for training robust perception models.

Figure 1: Examples of challenging environments in our SODA10M dataset, including 10 million images covering different weather conditions, periods, and locations.

The images were collected via a crowdsourcing approach involving taxi drivers using high-resolution cameras. This method ensured diversity in data collection, encompassing different driving scenarios. The labeled subset provides high-quality 2D bounding boxes, ensuring precision for supervised evaluations.
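As a concrete illustration, the labeled subset's 2D bounding boxes can be consumed much like any COCO-style detection annotation. The JSON layout below is an assumption for illustration only (the actual SODA10M release format, file names, and category IDs may differ):

```python
import json

# Hypothetical COCO-style annotation snippet. The real SODA10M release
# format may differ; this sketch only shows the general shape of
# 2D box annotations grouped under images and categories.
ANNOTATION_JSON = """
{
  "images": [{"id": 1, "file_name": "HY_0001.jpg", "width": 1920, "height": 1080}],
  "categories": [{"id": 1, "name": "Pedestrian"}, {"id": 2, "name": "Car"}],
  "annotations": [
    {"image_id": 1, "category_id": 2, "bbox": [100.0, 200.0, 300.0, 150.0]}
  ]
}
"""

def load_boxes(raw_json):
    """Group 2D boxes ([x, y, w, h]) by image file name."""
    data = json.loads(raw_json)
    names = {img["id"]: img["file_name"] for img in data["images"]}
    cats = {c["id"]: c["name"] for c in data["categories"]}
    boxes = {}
    for ann in data["annotations"]:
        boxes.setdefault(names[ann["image_id"]], []).append(
            (cats[ann["category_id"]], ann["bbox"])
        )
    return boxes

boxes = load_boxes(ANNOTATION_JSON)
print(boxes["HY_0001.jpg"])  # [('Car', [100.0, 200.0, 300.0, 150.0])]
```

A loader of this shape feeds directly into standard detection evaluation pipelines, which is what makes the labeled 20K subset usable for supervised benchmarking.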

Key Features and Benchmarking

Scale and Diversity

As the largest dataset of its kind, SODA10M surpasses other driving datasets in both scale and diversity. It is designed to facilitate research into self- and semi-supervised methods by providing a vast pool of varied driving scenes, which enhances the potential for generalization to real-world applications.

Figure 2: Statistics of the unlabeled set showing geographical distribution, number of images at different locations, weather conditions, and periods.

Evaluation of Learning Approaches

The dataset supports the evaluation of various self- and semi-supervised learning techniques. Initial benchmarks demonstrate that SODA10M provides superior pre-training outcomes, enhancing model performance when fine-tuned for downstream tasks such as detection and segmentation. Notably, contrastive methods such as MoCo and DenseCL leveraged SODA10M for significant gains over traditional pre-training datasets like ImageNet.

Figure 3: Overview of different methods used for building the SODA10M benchmark.
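The contrastive methods benchmarked here (MoCo, DenseCL) are built around the InfoNCE objective: pull an augmented "query" view of an image toward its matching "key" view while pushing it away from keys of other images. The sketch below is a minimal, dependency-free illustration of that loss on toy vectors, not the paper's actual training code:

```python
import math

def infonce_loss(query, pos_key, neg_keys, temperature=0.07):
    """InfoNCE loss as used in MoCo-style contrastive pre-training:
    cross-entropy over cosine similarities, with the positive key
    treated as the correct "class" for the query."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def normalize(v):
        n = math.sqrt(sum(x * x for x in v))
        return [x / n for x in v]

    q = normalize(query)
    # Similarity logits: positive key first, then all negatives.
    logits = [dot(q, normalize(k)) / temperature
              for k in [pos_key] + neg_keys]
    # Numerically stable log-sum-exp for the cross-entropy denominator.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]

# Query aligned with its positive key, orthogonal to the negative:
loss_easy = infonce_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
# Query closer to the negative key: the loss should be much larger.
loss_hard = infonce_loss([0.1, 1.0], [1.0, 0.0], [[0.0, 1.0]])
assert loss_easy < loss_hard
```

In MoCo the negative keys come from a large momentum-updated queue, and in DenseCL the same objective is additionally applied at the dense feature-map level; both reduce to this per-query loss.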

Implications for Autonomous Driving

SODA10M enables the development of more robust autonomous driving systems by fostering research in methodologies that can learn from large-scale unlabeled data. The dataset's diversity ensures that models trained on it can generalize well across different driving conditions, which is crucial for the safety and reliability of autonomous vehicles. Furthermore, the research conducted using SODA10M could lead to more efficient annotation processes by reducing reliance on large amounts of labeled data.

Conclusion

SODA10M represents a significant contribution to autonomous driving research, offering a comprehensive testbed for exploring advanced object detection methodologies. By providing the largest and most diverse dataset available, it opens pathways for developing next-generation perception models, emphasizing the role of self and semi-supervised learning in practical autonomous driving applications. Future work will likely explore new contrastive loss functions and domain adaptation techniques to further improve model performance under varied conditions.
