- The paper introduces a comprehensive dataset with over 71,000 LiDAR and camera frames designed for vehicle-infrastructure cooperative 3D object detection.
- It defines the VIC3D task, enabling joint object localization and identification using asynchronous multi-modal sensor data.
- The proposed TCLF fusion method compensates for temporal discrepancies, yielding a 10-20 point improvement in average precision.
Overview of DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection
The paper presents DAIR-V2X, a large-scale dataset developed to advance Vehicle-Infrastructure Cooperative Autonomous Driving (VICAD). The dataset is multi-modal and multi-view, captured entirely from real-world scenarios, and targets the cooperative-perception challenges that stand between today's systems and Level 5 vehicle autonomy.
Key Contributions
- Dataset Specifications: DAIR-V2X comprises 71,254 LiDAR frames and 71,254 camera frames, all captured in real scenarios and annotated in 3D. The dataset is designed for vehicle-infrastructure cooperative 3D object detection (VIC3D), facilitating the study of sensory integration between vehicles and infrastructure.
- VIC3D Task Definition: The VIC3D task is introduced to address the problem of jointly locating and identifying objects through cooperative sensing. Solving it requires managing temporal asynchrony and data-transmission constraints between distributed sensors (one possible input representation is sketched after this list).
- Time Compensation Late Fusion (TCLF): As a benchmark for VIC3D, the paper proposes TCLF, a late fusion method that handles asynchronous inputs by estimating object motion from successive infrastructure frames and propagating infrastructure detections to the vehicle's timestamp before fusing them with vehicle-side detections (a minimal sketch follows this list).
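To make the cooperative input concrete, here is a minimal sketch of how one VIC3D sample might be represented. The field names and shapes (`veh_points`, `inf_points`, `T_inf_to_veh`, etc.) are illustrative assumptions, not the dataset's actual schema.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class VIC3DSample:
    """One cooperative input pair. Field names and shapes are
    illustrative assumptions, not the official DAIR-V2X schema."""
    veh_points: np.ndarray    # (N, 4) vehicle LiDAR points: x, y, z, intensity
    inf_points: np.ndarray    # (M, 4) infrastructure LiDAR points
    veh_image: np.ndarray     # (H, W, 3) vehicle camera frame
    inf_image: np.ndarray     # (H, W, 3) infrastructure camera frame
    t_veh: float              # vehicle capture timestamp (seconds)
    t_inf: float              # infrastructure capture timestamp; typically t_inf <= t_veh
    T_inf_to_veh: np.ndarray  # (4, 4) rigid transform, infrastructure -> vehicle frame

    @property
    def latency(self) -> float:
        """The temporal asynchrony a fusion method must compensate for."""
        return self.t_veh - self.t_inf
```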
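The sketch below illustrates the time-compensation idea behind TCLF: estimate each object's velocity from two consecutive infrastructure frames, extrapolate its position to the vehicle's timestamp, then late-fuse with vehicle-side detections. The nearest-center association, the 3 m gating threshold, and the center-distance duplicate suppression are simplifying assumptions; the paper's actual implementation works with full 3D boxes and IoU-based matching.

```python
import numpy as np

def time_compensate(prev_boxes, curr_boxes, t_prev, t_curr, t_veh):
    """Propagate infrastructure detections from t_curr to the vehicle time t_veh.

    prev_boxes, curr_boxes: (K, 7) arrays [x, y, z, l, w, h, yaw] in a shared
    world frame. Nearest-center matching is a stand-in for real association.
    """
    compensated = curr_boxes.copy()
    dt = t_curr - t_prev
    if dt <= 0 or len(prev_boxes) == 0:
        return compensated  # no motion estimate available; return boxes as-is
    for i, box in enumerate(curr_boxes):
        # Associate with the closest box center in the previous frame.
        dists = np.linalg.norm(prev_boxes[:, :2] - box[:2], axis=1)
        j = int(np.argmin(dists))
        if dists[j] > 3.0:      # assumed gating threshold (meters)
            continue            # likely a new object; leave it uncompensated
        velocity = (box[:3] - prev_boxes[j, :3]) / dt
        compensated[i, :3] = box[:3] + velocity * (t_veh - t_curr)
    return compensated

def late_fuse(veh_boxes, inf_boxes, dist_threshold=2.0):
    """Merge compensated infrastructure boxes into the vehicle's detections,
    dropping infrastructure boxes that duplicate a vehicle detection.
    Center-distance suppression stands in for IoU-based NMS."""
    keep = [b for b in inf_boxes
            if len(veh_boxes) == 0
            or np.min(np.linalg.norm(veh_boxes[:, :2] - b[:2], axis=1)) > dist_threshold]
    return np.vstack([veh_boxes] + keep) if keep else veh_boxes
```

After compensation, the infrastructure boxes would also need the infrastructure-to-vehicle transform applied before fusion; that step is omitted here for brevity.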
Numerical Results
The paper reports significant gains in detection accuracy from cooperative sensing: fusion models improve average precision (AP) by 10 to 20 points over single-view (vehicle-only) detection baselines. This underscores the efficacy of infrastructure-supported perception in mitigating the occlusion and range limitations of vehicle-mounted sensors.
Implications and Future Directions
The introduction of DAIR-V2X is pivotal for VICAD research, setting a foundation for algorithms that leverage multi-sensory input in real-world driving environments. Because the data is captured in real scenarios rather than simulation, methods developed on it confront the noise, occlusion, and calibration issues that synthetic benchmarks tend to miss.
The TCLF framework highlights a promising direction for managing temporal asynchrony. As future work, researchers may explore the trade-off between fusion accuracy and bandwidth usage, which is critical for real-time deployment. Feature-level fusion in particular could yield further AP improvements at lower transmission cost.
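As a rough back-of-the-envelope illustration of why the fusion level matters for bandwidth, the sketch below compares approximate per-frame payloads for early, feature-level, and late fusion. Every size here is an assumed figure for illustration, not a measurement from the paper.

```python
import math

# All sizes are illustrative assumptions, not values from the paper.
POINTS_PER_FRAME = 100_000          # assumed LiDAR points per sweep
BYTES_PER_POINT = 16                # x, y, z, intensity as float32

FEATURE_MAP_SHAPE = (64, 200, 176)  # assumed C x H x W BEV feature map
BYTES_PER_FEATURE = 4               # float32

DETECTIONS_PER_FRAME = 50           # assumed number of transmitted boxes
BYTES_PER_BOX = 7 * 4 + 4           # 7 box parameters + 1 score, float32

payloads = {
    "early (raw point cloud)": POINTS_PER_FRAME * BYTES_PER_POINT,
    "feature-level (BEV map)": math.prod(FEATURE_MAP_SHAPE) * BYTES_PER_FEATURE,
    "late (detected boxes)": DETECTIONS_PER_FRAME * BYTES_PER_BOX,
}
for name, size in payloads.items():
    print(f"{name:>24}: {size / 1024:>9,.1f} KiB per frame")
```

Note that an uncompressed feature map can exceed the raw point cloud in size, which is why practical feature-level fusion depends on compression or sparsification; the takeaway is only that late fusion transmits orders of magnitude less.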
Conclusion
DAIR-V2X establishes a comprehensive basis for exploring vehicle-infrastructure cooperation in 3D object detection. By providing a robust real-world dataset and formulating the VIC3D task, the authors make a significant contribution to cooperative autonomous driving research, one likely to catalyze further advances in perception built on integrated, distributed sensing.