
VRUD: A Drone Dataset for Complex Vehicle-VRU Interactions within Mixed Traffic

Published 1 Apr 2026 in cs.RO, cs.DB, and eess.IV | (2604.01134v1)

Abstract: The Operational Design Domain (ODD) of urban-oriented Level 4 (L4) autonomous driving, especially for autonomous robotaxis, confronts formidable challenges in complex urban mixed traffic environments. These challenges stem mainly from the high density of Vulnerable Road Users (VRUs) and their highly uncertain and unpredictable interaction behaviors. However, existing open-source datasets predominantly focus on structured scenarios such as highways or regulated intersections, leaving a critical gap in data representing chaotic, unstructured urban environments. To address this, this paper proposes an efficient, high-precision method for constructing drone-based datasets and establishes the Vehicle-Vulnerable Road User Interaction Dataset (VRUD), as illustrated in Figure 1. Distinct from prior works, VRUD is collected from typical "Urban Villages" in Shenzhen, characterized by loose traffic supervision and extreme occlusion. The dataset comprises 4 hours of 4K/30Hz recording, containing 11,479 VRU trajectories and 1,939 vehicle trajectories. A key characteristic of VRUD is its composition: VRUs account for about 87% of all traffic participants, significantly exceeding the proportions in existing benchmarks. Furthermore, unlike datasets that only provide raw trajectories, we extracted 4,002 multi-agent interaction scenarios based on a novel Vector Time to Collision (VTTC) threshold, supported by standard OpenDRIVE HD maps. This study provides valuable, rare edge-case resources for enhancing the safety performance of ADS in complex, unstructured urban environments. To facilitate further research, we have made the VRUD dataset open-source at: https://zzi4.github.io/VRUD/.

Summary

  • The paper introduces VRUD, a high-resolution drone dataset that captures complex vehicle-VRU interactions in unstructured, high-density urban settings.
  • It combines video stabilization and multi-video alignment with trajectory extraction based on YOLO11x-OBB and ByteTrack to ensure spatio-temporal precision.
  • Key findings reveal a critical 0.7s VTTC interaction window and an inverse relationship between vehicle speed and VRU density, offering actionable insights for ADS development.

VRUD: A Drone-Based Dataset for Vehicle-VRU Interactions in Mixed Urban Traffic

Introduction

The VRUD dataset is a high-resolution, drone-based collection of vehicle and vulnerable road user (VRU) trajectories from complex, unstructured “urban village” environments in Shenzhen, China. It addresses the scarcity of data for high-density, unpredictable traffic in which VRUs dominate the road user mix and conventional vehicle-centric datasets are of limited use. The corpus comprises 4 hours of 4K/30Hz aerial video, yielding 11,479 VRU and 1,939 vehicle trajectories, with VRUs constituting approximately 87% of all agents. The dataset is tailored to the development and evaluation of robust, interaction-aware autonomous driving systems (ADS) operating beyond structured scenarios.

Methodology and Data Properties

Data Acquisition and Preprocessing

Drone footage was recorded during morning and evening peak hours at sites chosen for their lack of fixed surveillance, heavy occlusion, and heterogeneous traffic. The drone flew at an altitude of 80 meters and recorded at 4K/30fps, preserving enough detail for small dynamic agents (e.g., pedestrians, cyclists, motorcycles). At this altitude, personally identifiable details are not resolvable, so privacy is protected without explicit redaction.

Preprocessing is performed in two primary stages:

  • Single-video stabilization leverages Harris corner detection and dense optical flow to correct within-session image-plane jitter, aligning all frames to a common coordinate basis.
  • Multi-video alignment addresses FOV discrepancies across battery-limited acquisition sessions, utilizing annotated reference points and transformation matrices for global spatial alignment.

    Figure 1: Overlay of acquisition sessions highlights position and altitude discrepancies addressed by multi-video alignment.
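The multi-video alignment step can be illustrated with a minimal sketch. It assumes that each session is mapped onto a reference session by a 2D similarity transform (scale, rotation, translation) fitted to annotated reference points by least squares. This is a simplification chosen for illustration; the summary only mentions "transformation matrices", and the actual pipeline may use a more general model such as a homography. The names `fit_similarity` and `apply_transform` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Matrix = List[List[float]]


def fit_similarity(src: List[Point], dst: List[Point]) -> Matrix:
    """Least-squares 2D similarity transform mapping src points onto dst.

    Points are treated as complex numbers, so the model is q = a * p + b,
    where a encodes scale and rotation and b encodes translation.
    Returns a 3x3 homogeneous matrix (an assumed, illustrative layout).
    """
    if len(src) != len(dst) or len(src) < 2:
        raise ValueError("need at least two matching reference points")
    p = [complex(x, y) for x, y in src]
    q = [complex(x, y) for x, y in dst]
    p_mean = sum(p) / len(p)
    q_mean = sum(q) / len(q)
    denom = sum(abs(pi - p_mean) ** 2 for pi in p)
    if denom == 0:
        raise ValueError("source reference points must not all coincide")
    # Closed-form least-squares solution for the complex gain a.
    a = sum((qi - q_mean) * (pi - p_mean).conjugate()
            for pi, qi in zip(p, q)) / denom
    b = q_mean - a * p_mean
    return [
        [a.real, -a.imag, b.real],
        [a.imag, a.real, b.imag],
        [0.0, 0.0, 1.0],
    ]


def apply_transform(m: Matrix, pt: Point) -> Point:
    """Map one image-plane point through the 3x3 matrix."""
    x, y = pt
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])


if __name__ == "__main__":
    # Synthetic reference points: rotate 90 degrees, scale by 2, shift (10, 5).
    session_pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    reference_pts = [(10.0, 5.0), (10.0, 7.0), (8.0, 5.0)]
    m = fit_similarity(session_pts, reference_pts)
    print(apply_transform(m, (1.0, 1.0)))
```

A similarity model needs only two reference points and cannot represent perspective differences between sessions; if the camera tilt differs between flights, a projective model fitted to four or more points would be the more appropriate choice.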

Trajectory Extraction and Validation

Detection of seven traffic categories (pedestrians, bicycles, motorcycles, tricycles, cars, buses, trucks) is achieved using a YOLO11x-OBB model, trained on 3,000 manually labeled images. Perspective rectification is handled via an L-shape method that accounts for aerial geometric distortion. Tracking is conducted using ByteTrack, with parameters tuned to improve trajectory continuity for small and frequently occluded agents.

Empirical validation employs comparisons to RT ground-truth navigation systems, confirming high-precision spatial-temporal correspondence for critical vehicle dynamics.

Scenario Definition and Interaction Labeling

Each interaction scenario is ego-centric: for every motorized agent, all spatio-temporally proximal VRUs are identified per frame. Interaction intensity is formalized by Vector Time to Collision (VTTC), a surrogate safety measure that extends classic TTC by inferring future conflict at the Closest Point of Approach, using full state vectors (relative position and velocity).

Figure 2: Multi-agent interaction scenario indicating ego-vehicle (ID 3913) and prioritized critical targets for navigation.
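As a rough illustration of the idea behind VTTC, the sketch below computes the time to the Closest Point of Approach (CPA) and the separation at that time from relative position and velocity, assuming both agents keep a constant velocity. The exact VTTC formulation is not given in this summary, so the conflict test against a `conflict_radius_m` parameter, and the function names, are assumptions made for illustration only.

```python
import math
from typing import Tuple

Vec = Tuple[float, float]


def time_to_cpa(p_ego: Vec, v_ego: Vec, p_other: Vec, v_other: Vec) -> Tuple[float, float]:
    """Return (time to CPA in s, distance at CPA in m) under constant velocity.

    If the agents have no relative motion or are already diverging,
    the time is reported as infinity and the current distance is returned.
    """
    dx, dy = p_other[0] - p_ego[0], p_other[1] - p_ego[1]
    dvx, dvy = v_other[0] - v_ego[0], v_other[1] - v_ego[1]
    rel_speed_sq = dvx * dvx + dvy * dvy
    if rel_speed_sq == 0.0:
        return math.inf, math.hypot(dx, dy)
    # Minimise |dp + dv * t|^2 over t.
    t = -(dx * dvx + dy * dvy) / rel_speed_sq
    if t < 0.0:
        return math.inf, math.hypot(dx, dy)
    return t, math.hypot(dx + dvx * t, dy + dvy * t)


def vttc_sketch(p_ego: Vec, v_ego: Vec, p_other: Vec, v_other: Vec,
                conflict_radius_m: float = 1.5) -> float:
    """Time to CPA if the predicted separation falls within the radius, else inf.

    conflict_radius_m is a hypothetical parameter, not a value from the paper.
    """
    t, d = time_to_cpa(p_ego, v_ego, p_other, v_other)
    return t if d <= conflict_radius_m else math.inf


if __name__ == "__main__":
    # Head-on approach: 10 m apart, closing at 10 m/s.
    print(vttc_sketch((0.0, 0.0), (5.0, 0.0), (10.0, 0.0), (-5.0, 0.0)))
```

Unlike classic TTC, which is usually defined along a single lane direction, this vector form still yields a finite value for crossing trajectories, which is the typical geometry of vehicle-VRU encounters.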

Dataset Statistics and Comparative Analysis

User Composition and Motion Patterns

VRUD is distinguished by its extreme VRU prevalence (87%), surpassing all major benchmarks. Pedestrians and motorcycles are especially dominant, and motorcycles exhibit the highest average velocities, which underscores how efficiently they move through these unregulated environments.

Figure 3: Category and velocity statistics. Pedestrians and motorcycles predominate; motorcycles have the highest mean speed.

Comparative analysis with datasets such as INTERACTION, SIND, and inD shows that VRUs make up the large majority of agents in VRUD (about 87%), in stark contrast to vehicle-dominated benchmark sets.

Figure 4: Positive correlation between VRU count, scenario complexity, and mean VTTC, substantiating the impact of VRU density on interaction dynamics.

Behavioral Characterization via VTTC

Of ∼5,000 initial interaction segments, 4,002 scenarios satisfy the interaction-relevance criterion (a VTTC upper-quartile threshold of 1.53s, with interaction intensity converging strongly at 0.7s). The 0.7s value is derived empirically and marks the critical “negotiation” window of real-world vehicle-VRU interaction, rather than pure collision avoidance.

Figure 5: Ego-speed distribution reflects quasi-normal range, highlighting low-speed maneuvering typical in dense VRU urban traffic.

Figure 6: Interaction intensity analysis reveals strong coupling around a 0.7s VTTC threshold, optimizing high-relevance sample selection.
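The threshold-based selection described above can be sketched as follows. The sketch assumes that each candidate segment carries a per-frame VTTC series, that a segment is kept when its minimum VTTC is at or below the 1.53s threshold, and that segments reaching 0.7s or less are flagged as high-intensity. How the paper actually aggregates VTTC over a segment is not stated in this summary, so the per-segment minimum, the `Segment` class, and `select_scenarios` are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    """Hypothetical container for one candidate interaction segment."""
    segment_id: int
    vttc_series: List[float]  # per-frame VTTC values in seconds


def select_scenarios(
    segments: List[Segment],
    keep_threshold_s: float = 1.53,    # threshold quoted in the text
    critical_threshold_s: float = 0.7,  # negotiation window quoted in the text
) -> List[Tuple[int, float, bool]]:
    """Return (segment_id, min VTTC, is_critical) for every retained segment."""
    selected = []
    for seg in segments:
        if not seg.vttc_series:
            continue  # no VTTC samples, nothing to evaluate
        min_vttc = min(seg.vttc_series)
        if min_vttc <= keep_threshold_s:
            selected.append((seg.segment_id, min_vttc,
                             min_vttc <= critical_threshold_s))
    return selected


if __name__ == "__main__":
    # Synthetic values for illustration only, not taken from VRUD.
    candidates = [
        Segment(1, [3.0, 2.1, 1.2]),
        Segment(2, [4.0, 3.5, 2.9]),
        Segment(3, [1.4, 0.6, 0.9]),
    ]
    for row in select_scenarios(candidates):
        print(row)
```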

Significant findings include:

  • Vehicle velocity exhibits an inverse correlation with VRU count—ego speed declines as interaction density rises.
  • Behavioral data supports the emergence of a universal “negotiation corridor,” with vehicles self-regulating within tactical speed regimes.
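One simple way to check an inverse relationship of this kind is a Pearson correlation between per-scenario VRU count and mean ego speed. The sketch below is a plain-Python implementation run on synthetic numbers; the summary does not say which statistic the authors used, and the values shown are made up for illustration rather than taken from VRUD.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Synthetic per-scenario values, for illustration only.
    vru_counts = [1, 2, 4, 6, 9, 12]
    ego_speeds_mps = [7.5, 6.8, 5.9, 4.1, 3.2, 2.0]
    r = pearson_r(vru_counts, ego_speeds_mps)
    print(f"Pearson r = {r:.3f}")  # a negative value indicates an inverse relationship
```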

Implications and Future Directions

VRUD’s architecture and data structure provide immediate integration points for rigorous ADS research, including but not limited to:

  • Trajectory prediction: High-variance, high-density VRU movement data enhances model robustness under “long-tail” edge cases.
  • Behavioral intent recognition: Rich annotations and spatio-temporal coupling support fine-grained intent inference and scenario simulation.
  • End-to-end planning/safety validation: VTTC-labeled scenarios facilitate risk-critical module evaluation, with the 0.7s filter supporting automated hard case mining.

Practically, VRUD bridges the gap left by prior structured datasets, enabling downstream research to address the true operational design domain limitations encountered in global urban deployments. Theoretical advances are also expected in surrogate safety measure calibration and adaptive planning policies for agent negotiation in highly uncertain state spaces.

Future development should include expansion to diverse geographies, integration with sensor fusion data (e.g., ground LiDAR or AV-mounted sensors for cross-modal validation), and the progressive automation of behavioral annotation using advances in relational deep learning.

Conclusion

VRUD is a comprehensive, open-source drone-derived dataset explicitly confronting the limitations of current vehicular trajectory corpora for unstructured urban scenarios. Its scale, resolution, and scenario extraction protocols position it as a pivotal resource for progressing socially compliant, risk-aware ADS research and robust model validation. The high-frequency, multi-agent labeled trajectories and empirically grounded interaction filters lay a robust foundation for the next phase of simulation-driven, safety-critical autonomous vehicle system development.
