- The paper introduces a comprehensive aerial-ground dataset for collaborative 3D perception, featuring 120K LiDAR frames and 1.6M annotated boxes across 400 scenes.
- It defines dual benchmarks—AGC-V2V for ground-only fusion and AGC-VUC for multi-agent collaboration—demonstrating performance gains with UAV integration.
- The data collection pipeline relies on precise sensor calibration and synchronization, and the benchmarks preserve real-world challenges such as occlusion and communication delays in complex driving conditions.
Overview of AGC-Drive: Enabling Real-World Aerial-Ground Collaboration for 3D Perception in Driving
AGC-Drive introduces a comprehensive real-world dataset for collaborative 3D perception in complex driving scenarios, focusing specifically on the integration of aerial (UAV) and ground vehicle sensing. By explicitly addressing the scarcity of datasets featuring multi-agent, multimodal, and aerial-ground collaboration with high-fidelity annotations, AGC-Drive establishes a new foundation for empirical research on collaborative autonomous perception systems.
Dataset Design and Data Collection
AGC-Drive is organized around a multi-agent sensor platform comprising two vehicles and one UAV, each equipped with high-resolution LiDAR and camera systems. The ground vehicles utilize five multi-focal cameras and 128-beam LiDARs, while the UAV operates a 32-beam LiDAR and a downward-facing camera. The vehicle and UAV platforms are time-synchronized via GPS/IMU integration, with careful spatial alignment achieved through multi-modal calibration and post-hoc point cloud registration, facilitating accurate, frame-aligned global perception.
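As a concrete illustration of frame-level alignment across agents, the minimal sketch below pairs LiDAR sweeps from two agents by nearest GPS timestamp; the tolerance value and the timestamp lists are assumptions for illustration, not AGC-Drive's released synchronization tooling.

```python
from bisect import bisect_left

def match_frames(ego_stamps, other_stamps, tol=0.05):
    """Pair each ego LiDAR timestamp with the nearest timestamp from another
    agent (vehicle or UAV), dropping pairs whose offset exceeds `tol` seconds.
    Timestamps are assumed to be GPS-synchronized floats."""
    other_sorted = sorted(other_stamps)
    pairs = []
    for t in ego_stamps:
        i = bisect_left(other_sorted, t)
        # Candidates: the closest timestamps on either side of t.
        candidates = other_sorted[max(i - 1, 0):i + 1]
        best = min(candidates, key=lambda s: abs(s - t))
        if abs(best - t) <= tol:
            pairs.append((t, best))
    return pairs

# Hypothetical per-agent timestamp lists (seconds).
vehicle_ts = [0.00, 0.10, 0.20, 0.30]
uav_ts = [0.02, 0.13, 0.21, 0.33]
print(match_frames(vehicle_ts, uav_ts))  # [(0.0, 0.02), (0.1, 0.13), ...]
```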
Key dataset properties include:
- Scale and Scope: ~120K LiDAR frames and 440K images across 400 scenes, each with ~100 frames and full 3D box annotations (1.6M boxes total; 13 object classes).
- Scene Diversity: Coverage of 14 scenario types, spanning urban, highway, and rural environments, including challenging cases such as roundabouts, tunnels, ramps, and events like vehicle cut-ins/outs.
- Dynamic Content: Nearly 20% of the data features high-interaction dynamics, directly supporting research on perception under complex traffic maneuvers.
- Occlusion Labelling: Each 3D box is annotated with visibility/occlusion levels, providing granular supervision for occlusion-aware perception modeling (see the sketch after this list).
- Open-Source Tooling: Accompanying toolkits for calibration verification, multi-agent visualization, and collaborative annotation are publicly released.
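Building on the occlusion labels above, here is a minimal sketch of grouping annotated boxes by occlusion level for occlusion-aware training or evaluation; the JSON layout and field names ("boxes", "occlusion") are hypothetical placeholders, not AGC-Drive's actual annotation schema.

```python
import json
from collections import defaultdict

def group_boxes_by_occlusion(label_path):
    """Group a frame's 3D boxes by their occlusion level.
    The file layout and field names here are hypothetical."""
    with open(label_path) as f:
        frame = json.load(f)
    groups = defaultdict(list)
    for box in frame["boxes"]:
        # e.g. 0 = fully visible, 1 = partially occluded, 2 = heavily occluded
        groups[box["occlusion"]].append(box)
    return groups

# Example: evaluate only on boxes that are at least partially visible.
# groups = group_boxes_by_occlusion("scene_0001/frame_0042.json")
# eval_boxes = groups[0] + groups[1]
```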
Benchmark Structure and Tasks
The paper defines two primary benchmarks:
- AGC-V2V: Multi-vehicle cooperative perception without UAV involvement, serving as a real-world baseline for ground-level cooperative detection.
- AGC-VUC: Multi-agent collaborative perception with both vehicles and UAV, enabling evaluation of the UAV’s top-down perspective as a complement to ground sensing.
Both benchmarks adopt the OPV2V schema for data and annotation formatting, and evaluation reports average precision at IoU thresholds of 0.5 and 0.7 (AP@0.5 and AP@0.7), along with a Δ_UAV metric that quantifies the improvement attributable to UAV participation.
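To make the Δ_UAV metric concrete, the sketch below computes it as the per-method difference between AP on AGC-VUC and AP on AGC-V2V; the method names and scores are illustrative placeholders, not the paper's reported numbers.

```python
def delta_uav(ap_v2v: dict[str, float], ap_vuc: dict[str, float]) -> dict[str, float]:
    """Improvement in AP (percentage points) attributable to adding the UAV,
    computed per method as AP on AGC-VUC minus AP on AGC-V2V."""
    return {m: round(ap_vuc[m] - ap_v2v[m], 1) for m in ap_v2v if m in ap_vuc}

# Illustrative placeholder values only.
ap_v2v = {"method_a": 30.0, "method_b": 50.0}
ap_vuc = {"method_a": 32.5, "method_b": 53.0}
print(delta_uav(ap_v2v, ap_vuc))  # {'method_a': 2.5, 'method_b': 3.0}
```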
Experimental Evaluation and Results
Six representative 3D perception frameworks are benchmarked, all using PointPillars as the detection backbone:
- Lower-bound (late fusion, detection sharing)
- Upper-bound (early fusion, raw point cloud sharing)
- V2VNet (intermediate feature fusion)
- CoBEVT (sparse transformer-based BEV feature fusion)
- Where2comm (communication-efficient confidence map sharing)
- V2X-ViT (vision transformer-based BEV feature fusion)
AGC-V2V results show a large gap between early fusion (Upper-bound: 58.2% AP@0.5, 43.1% AP@0.7) and both intermediate and late fusion methods (best intermediate: Where2comm, at 34.8% and 22.5%, respectively), reflecting the real-world impact of imperfect pose estimation and communication delays on collaborative fusion.
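To make the contrast between these fusion strategies concrete, the sketch below shows early fusion (merging registered raw point clouds before a single detection pass, in the spirit of the Upper-bound) versus late fusion (detecting per agent and merging only the resulting boxes, in the spirit of the Lower-bound). The function signatures and the detector/merge callables are placeholders, not the benchmark's actual implementation.

```python
import numpy as np

def to_shared_frame(points, pose):
    """Transform an (N, 3) point cloud into a shared frame with a 4x4 pose."""
    homo = np.c_[points, np.ones(len(points))]   # homogeneous coordinates
    return (homo @ pose.T)[:, :3]

def early_fusion(point_clouds, poses, detector):
    """Upper-bound-style early fusion: merge registered raw point clouds from
    all agents, then run a single detector on the combined cloud."""
    merged = np.vstack([to_shared_frame(p, T) for p, T in zip(point_clouds, poses)])
    return detector(merged)

def late_fusion(point_clouds, poses, detector, merge_boxes):
    """Lower-bound-style late fusion: each agent detects independently (here on
    points already expressed in the shared frame), and only the resulting boxes
    are pooled and merged, e.g. via non-maximum suppression."""
    per_agent = [detector(to_shared_frame(p, T)) for p, T in zip(point_clouds, poses)]
    return merge_boxes([b for boxes in per_agent for b in boxes])
```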
AGC-VUC results show that incorporating the UAV improves performance across most fusion strategies (with Δ_UAV up to +3.3% AP@0.5 for Upper-bound early fusion). Notably, the improvement holds even for communication-efficient and transformer-based methods, while the Lower-bound's regression (−1.0%) highlights late fusion's sensitivity to error propagation. Qualitative analysis confirms that the aerial perspective is especially beneficial for occluded and distant objects, supporting the hypothesis that UAV data meaningfully supplements ground-based perception in complex scenes.
Implementation Considerations
AGC-Drive provides the community with both the raw data and an integrated toolkit:
- Calibration and Synchronization: GPS/IMU-based initial alignment, refined via ICP for multi-agent LiDAR registration, followed by extrinsic camera-LiDAR calibration via PnP (a minimal registration sketch follows this list).
- Privacy Protection: All sensitive information, including GPS traces and human faces, is sanitized or blurred to support open data sharing.
- Computational Requirements: Each baseline model was trained for 40 epochs on 8 Nvidia L40 GPUs, with each run taking about 6 hours, keeping experiments feasible for both academic and industrial settings.
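As a rough sketch of the GPS/IMU-then-ICP registration step mentioned above, the snippet below refines an initial inter-agent transform with Open3D's point-to-point ICP; the file paths, correspondence threshold, and identity fallback are illustrative assumptions rather than the released toolkit.

```python
import numpy as np
import open3d as o3d

def refine_registration(source_path, target_path, init_T=None, max_corr_dist=0.5):
    """Refine a GPS/IMU-derived initial transform between two agents' LiDAR
    sweeps with point-to-point ICP. Paths and threshold are illustrative."""
    source = o3d.io.read_point_cloud(source_path)   # e.g. UAV sweep
    target = o3d.io.read_point_cloud(target_path)   # e.g. ego-vehicle sweep
    if init_T is None:
        init_T = np.eye(4)                          # no prior: start from identity
    result = o3d.pipelines.registration.registration_icp(
        source, target, max_corr_dist, init_T,
        o3d.pipelines.registration.TransformationEstimationPointToPoint())
    return result.transformation                    # refined source-to-target extrinsic

# Hypothetical usage: load a GPS/IMU alignment and refine it with ICP.
# T_init = np.load("calib/uav_to_vehicle_gps_imu.npy")   # 4x4, hypothetical path
# T_refined = refine_registration("uav.pcd", "vehicle.pcd", T_init)
```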
Limitations
A principal limitation is the sparseness of the airborne LiDAR data: while it aids scene-level awareness, it provides limited fine-grained object detail, especially at ground level. The UAV LiDAR's restricted vertical field of view and the blind zones directly beneath the aircraft remain technical bottlenecks, and the authors suggest upgrading the UAV sensor payload in future iterations to address this shortcoming.
Additionally, AGC-Drive deliberately retains realistic timing and registration errors to reflect the operational challenges of real-world collaborative systems. This design choice makes the benchmark results conservative estimates relative to idealized simulation-based datasets, and it highlights where future algorithms must become robust to synchronization and alignment imperfections.
Implications and Future Directions
AGC-Drive constitutes a significant empirical advance towards real-world evaluation of aerial-ground collaborative perception, opening up lines of inquiry in:
- Robustness to asynchronous and misaligned multi-agent data fusion,
- Occlusion handling and long-range detection in complex, high-interaction scenarios,
- Communication-efficient feature and object sharing strategies,
- Multi-modal annotation for increasingly sophisticated perception frameworks beyond object detection (e.g., tracking, prediction, joint intention estimation).
Future development may extend towards higher-density UAV sensing, broader environmental conditions (night, adverse weather), larger fleets (multi-UAV multi-vehicle), and expanded annotation for additional perception and action tasks.
The dataset is likely to become a touchstone for research in collaborative autonomous driving perception, providing a public, high-quality, and realistically constrained testbed for new algorithms that must handle the nuanced requirements of real-world multi-agent perception and planning.