V2X-Seq: A Large-Scale Sequential Dataset for Vehicle-Infrastructure Cooperative Perception and Forecasting (2305.05938v1)

Published 10 May 2023 in cs.CV and cs.AI

Abstract: Utilizing infrastructure and vehicle-side information to track and forecast the behaviors of surrounding traffic participants can significantly improve decision-making and safety in autonomous driving. However, the lack of real-world sequential datasets limits research in this area. To address this issue, we introduce V2X-Seq, the first large-scale sequential V2X dataset, which includes data frames, trajectories, vector maps, and traffic lights captured from natural scenery. V2X-Seq comprises two parts: the sequential perception dataset, which includes more than 15,000 frames captured from 95 scenarios, and the trajectory forecasting dataset, which contains about 80,000 infrastructure-view scenarios, 80,000 vehicle-view scenarios, and 50,000 cooperative-view scenarios captured from 28 intersections' areas, covering 672 hours of data. Based on V2X-Seq, we introduce three new tasks for vehicle-infrastructure cooperative (VIC) autonomous driving: VIC3D Tracking, Online-VIC Forecasting, and Offline-VIC Forecasting. We also provide benchmarks for the introduced tasks. Find data, code, and more up-to-date information at https://github.com/AIR-THU/DAIR-V2X-Seq.

Summary

The paper introduces a comprehensive dataset that integrates sensor data from urban intersections and vehicles, enhancing cooperative perception.
It details a robust methodology involving multi-sensor deployment, 3D object detection, and trajectory mining across 672 hours of driving data.
The results demonstrate significant potential for advancing trajectory prediction, traffic management, and autonomous vehicle control systems.

Understanding the V2X-Seq Dataset and Benchmarks

The paper documents how the V2X-Seq dataset was built: how the sensors were deployed, how the data was collected, and how trajectories were mined from it. These implementation and configuration details define the dataset’s scope and capabilities.

Sensor Deployment and Infrastructure

A significant contribution of this paper is its description of the sensor deployments across 28 urban traffic intersections in Beijing. Each intersection was equipped with 4 to 6 pairs of 300-beam LiDAR and high-resolution cameras, a configuration intended to cover the whole intersection. The paper also details the sensors on the self-driving vehicles that traverse these intersections: one 40-beam LiDAR and six high-quality cameras per vehicle. Together, the two deployments capture the same intersections from both infrastructure-side and vehicle-side perspectives.

Trajectory Data Collection and Mining

The dataset construction involves an extensive data collection phase, spanning 672 hours of driving through sensor-equipped areas, during which data was recorded every three minutes. This extensive temporal dataset serves as a rich foundation for generating cooperative-view scenarios. The process leverages trained 3D object detection and tracking models to derive trajectory sequences represented as 3D boxes with comprehensive class attributes and trajectory IDs.
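As a rough illustration of what a trajectory sequence of 3D boxes with class attributes and trajectory IDs might look like in code, here is a minimal sketch. The field names, units, and the `group_into_trajectories` helper are assumptions made for illustration; they are not taken from the dataset's actual file format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Box3D:
    """One tracked 3D box at one timestamp (hypothetical field names)."""
    timestamp: float                      # seconds
    track_id: int                         # trajectory ID assigned by the tracker
    category: str                         # class attribute, e.g. "car"
    center: tuple[float, float, float]    # x, y, z in metres (assumed unit)
    size: tuple[float, float, float]      # length, width, height in metres
    yaw: float                            # heading in radians


def group_into_trajectories(boxes):
    """Group per-frame boxes by track ID and sort each group by time."""
    tracks = defaultdict(list)
    for box in boxes:
        tracks[box.track_id].append(box)
    return {tid: sorted(seq, key=lambda b: b.timestamp)
            for tid, seq in tracks.items()}
```

Grouping by `track_id` turns a flat list of per-frame detections into one time-ordered sequence per traffic participant, which is the shape a forecasting model typically consumes.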

The paper outlines a trajectory mining process with five steps: scene fragmentation, selection, fusion, scoring, and filtering. Specifically, infrastructure-side and vehicle-side sequences are fragmented into overlapping 10-second segments, and pairs of segments from corresponding intersections are selected for further processing. Trajectory fusion then merges these viewpoints into cooperative trajectories. After scoring and filtering, the process yields approximately 50,000 high-scoring cooperative-view trajectory sequences.
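The fragmentation and selection steps can be sketched as a sliding window followed by a pairing filter. This is a minimal sketch under stated assumptions: the 5-second stride, the time-overlap criterion, and the function names `fragment`, `overlap_seconds`, and `select_pairs` are all hypothetical, since the summary only states that segments are 10 seconds long, overlap, and are paired by intersection.

```python
def fragment(start, end, window=10.0, stride=5.0):
    """Cut [start, end] into windows of `window` seconds.

    Consecutive windows overlap whenever stride < window.
    The stride value is an assumption, not from the paper.
    """
    segments = []
    t0 = start
    while t0 + window <= end + 1e-9:
        segments.append((t0, t0 + window))
        t0 += stride
    return segments


def overlap_seconds(a, b):
    """Length in seconds of the time overlap between two (t0, t1) segments."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def select_pairs(infra, vehicle, min_overlap=10.0):
    """Pair segments recorded at the same intersection that overlap in time.

    `infra` and `vehicle` are lists of (intersection_id, (t0, t1)).
    Requiring a time overlap is an assumed selection criterion.
    """
    pairs = []
    for i_id, i_seg in infra:
        for v_id, v_seg in vehicle:
            if i_id == v_id and overlap_seconds(i_seg, v_seg) >= min_overlap:
                pairs.append((i_id, i_seg, v_seg))
    return pairs
```

For example, `fragment(0.0, 30.0)` produces five windows starting at 0, 5, 10, 15, and 20 seconds, each sharing half its span with its neighbour.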

Visual Representations and Implications

The paper provides visualizations of the dataset's outputs that illustrate the cooperative-view data. These visualizations offer insight into traffic scenarios and suggest pathways for improving trajectory prediction, traffic management, and autonomous vehicle control systems. By combining multiple perspectives, the cooperative view captures complex traffic dynamics more completely than either viewpoint alone, which improves situational awareness.
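To make the idea of combining two perspectives concrete, the sketch below fuses object centers seen from the infrastructure side and the vehicle side at a single timestamp. It uses greedy nearest-neighbour matching with position averaging, which is a simple stand-in chosen for illustration and not the paper's fusion method. The function name `fuse_frame`, the 2-metre threshold, and the assumption that both views are already in a shared world coordinate frame are all hypothetical.

```python
import math


def fuse_frame(infra_pts, veh_pts, max_dist=2.0):
    """Fuse two lists of (x, y) object centers observed at the same time.

    Each infrastructure-side point is greedily matched to its nearest
    unused vehicle-side point. Matches within `max_dist` metres are
    averaged; unmatched points from either view are kept as they are.
    """
    fused = []
    unused = list(veh_pts)
    for p in infra_pts:
        best = min(unused, key=lambda q: math.dist(p, q), default=None)
        if best is not None and math.dist(p, best) <= max_dist:
            unused.remove(best)
            fused.append(((p[0] + best[0]) / 2, (p[1] + best[1]) / 2))
        else:
            fused.append(p)
    fused.extend(unused)  # objects seen only from the vehicle side
    return fused
```

The fused list contains objects that only one view could see, which is the main benefit of a cooperative view when one sensor is occluded.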

Implications and Future Directions

The creation of the V2X-Seq dataset marks a meaningful advance in autonomous driving research. By providing a nuanced and multi-faceted dataset, this paper addresses several limitations that have long hindered trajectory prediction and traffic management research. The dual-perspective data drawn from infrastructure and vehicle sources affords a more robust framework for future enhancements in multi-agent trajectory prediction models and decision-making algorithms.

The implications extend beyond practical advances in autonomous systems. The dataset also supports research on machine learning methods for multi-view data fusion and cooperative perception. Future work may explore more comprehensive sensor arrays or integrate additional data sources to enrich the dataset’s utility and robustness. Expanding scenario diversity and adding real-time processing and prediction are further promising directions.
