DSC3D: 3D Dataset for Autonomous Driving
- DeepScenario Open 3D Dataset (DSC3D) is an occlusion-free, high-fidelity dataset providing 6-DoF trajectories for diverse traffic participants in various environments.
- It utilizes a monocular drone-based capture and advanced 3D reconstruction techniques (SfM and MVS) to generate precise, geo-referenced data and HD maps.
- The dataset supports autonomous driving research by enhancing motion prediction, safety validation, and realistic multi-agent behavior modeling.
The DeepScenario Open 3D Dataset (DSC3D) is an occlusion-free, high-fidelity dataset of 6-DoF bounding box trajectories for diverse traffic participants in urban and non-urban environments. Utilizing a monocular drone-based capture and an advanced end-to-end reconstruction pipeline, DSC3D consists of over 175,000 unique, precisely geo-referenced trajectories across five heterogeneous locations in Europe and the United States. The dataset is designed to advance research in autonomous driving by providing large-scale, detailed 3D motion and interaction data, with comprehensive annotations and a robust online visualization and access interface (Dhaouadi et al., 24 Apr 2025).
1. Monocular Drone-Based Data Collection and 3D Scene Reconstruction
The DSC3D collection pipeline employs commercial DJI quadcopters fitted with downward-tilted (0–30°) stabilized RGB cameras streaming at 25 Hz. The data capture follows a two-pass flight protocol:
(A) Mapping Flight: The drone traverses the scene, capturing geo-tagged still images whose GPS positions are recorded in WGS84 coordinates.
(B) Static Recording: The drone hovers at designated vantage points, recording continuous video frames.
Geo-referenced 3D Scene Reconstruction leverages Structure-from-Motion (SfM) and Multiview Stereo (MVS):
- SIFT features are extracted and matched, and an initial extrinsic estimate is solved for each image.
- Joint bundle adjustment is conducted over intrinsics, extrinsics, and 3D points, incorporating GPS priors via minimization of

$$\min_{K,\,\{R_i,\,t_i\},\,\{X_j\}}\; \sum_{i,j} \left\| \pi\!\left(K\left(R_i X_j + t_i\right)\right) - x_{ij} \right\|^2 \;+\; \lambda \sum_i \left\| C_i - g_i \right\|^2,$$

where $\pi(\cdot)$ is the pinhole projection, $C_i = -R_i^\top t_i$ is the camera center of image $i$, $g_i$ is its GPS position in local UTM coordinates, and $\lambda$ weights GPS alignment versus reprojection error.
- Orthophoto rendering and semantic segmentation isolate the road surface, which is modeled as a NURBS mesh (using FlexRoad).
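FlexRoad's NURBS construction is not reproduced here; as a rough, illustrative stand-in, a smoothed bivariate B-spline surface fit with scipy conveys the same idea of a smooth parametric elevation model over scattered road-surface points (all inputs below are synthetic assumptions):

```python
# Illustrative stand-in for the NURBS road-surface model: fit a smoothed
# bivariate B-spline to scattered (x, y, elevation) road points.
# Synthetic data; the paper's actual surface fit uses FlexRoad.
import numpy as np
from scipy.interpolate import bisplrep, bisplev

rng = np.random.default_rng(0)
x = rng.uniform(0, 50, 500)                                 # longitudinal (m)
y = rng.uniform(0, 8, 500)                                  # lateral (m)
z = 0.02 * x + 0.1 * np.sin(y) + rng.normal(0, 0.01, 500)   # elevation (m)

tck = bisplrep(x, y, z, s=0.5)        # smoothed spline surface fit
elevation = bisplev(25.0, 4.0, tck)   # query elevation at (x=25, y=4)
```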
HD Map Creation exports the road network and elevation in OpenDRIVE format. Frame calibration employs learned matchers (LoFTR, LightGlue) and solves a PnP-style robust optimization

$$\min_{R,\,t}\; \sum_k \rho\!\left( \left\| \pi\!\left(K\left(R X_k + t\right)\right) - x_k \right\|^2 \right),$$

where $X_k$ are matched 3D scene points, $x_k$ their 2D image observations, and $\rho(\cdot)$ a robust loss, with temporal smoothing of the resulting poses via a Kalman filter.
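A minimal sketch of such a robust pose refinement, using scipy's nonlinear least-squares solver with a Huber loss as the robust $\rho$; the solver choice and parameters are assumptions, not the paper's implementation:

```python
# Sketch of PnP-style robust pose refinement: minimize robust reprojection
# error over a 6-DoF pose, given intrinsics K, 3D points X (N, 3), and
# matched 2D observations x (N, 2). Illustrative only.
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def residuals(params, K, X, x):
    """Reprojection residuals for pose params = [rotvec (3), t (3)]."""
    R = Rotation.from_rotvec(params[:3]).as_matrix()
    t = params[3:]
    X_c = (R @ X.T).T + t               # world -> camera frame
    proj = (K @ X_c.T).T
    uv = proj[:, :2] / proj[:, 2:3]     # pinhole projection pi(.)
    return (uv - x).ravel()

def refine_pose(K, X, x, init=np.zeros(6)):
    # loss="huber" plays the role of the robust rho(.) in the objective.
    sol = least_squares(residuals, init, loss="huber", f_scale=2.0,
                        args=(K, X, x))
    return sol.x[:3], sol.x[3:]         # rotation vector, translation
```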
2. Monocular 3D Object Detection, Refinement, and 6-DoF Tracking
Each video frame undergoes a monocular ground-aware 3D detection process (GroundMix):
- Predict a 2D bounding box, class, continuous 3D box dimensions $(l, w, h)$, estimated depth $\hat{d}$, orientation $\hat{\theta}$, and projected ground-center pixel $(\hat{u}, \hat{v})$.
- Back-project the ground-center pixel via the pinhole model: $\mathbf{X}_c = \hat{d}\, K^{-1}\,[\hat{u},\, \hat{v},\, 1]^\top$.
- Ground-aware refinement determines the object position by intersecting the camera ray through $(\hat{u}, \hat{v})$ with the reconstructed ground mesh.
- Orientation is decomposed into a heading angle about the vertical axis and re-aligned with the local ground normal, yielding the full 3D rotation.
- World-frame coordinates are computed via the camera-to-world transform: $\mathbf{X}_w = R_{cw}\,\mathbf{X}_c + \mathbf{t}_{cw}$.
- Objects are linked across frames via a Kalman filter-based tracker (state: position and velocity) and refined using an RTS smoother, providing temporally continuous, uniquely labeled 6-DoF bounding-box trajectories (see the sketch below).
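A minimal sketch of this constant-velocity Kalman filtering plus RTS smoothing step, shown in 1D for clarity; the transition/noise matrices and tuning values are illustrative assumptions, not the paper's:

```python
# Constant-velocity Kalman filter with Rauch-Tung-Striebel (RTS) smoothing
# over 1D position measurements. Noise parameters q, r are placeholders.
import numpy as np

def kalman_rts_smooth(zs, dt=0.04, q=1.0, r=0.25):
    """State x = [position, velocity]; dt = 0.04 s matches 25 Hz capture."""
    F = np.array([[1.0, dt], [0.0, 1.0]])        # constant-velocity transition
    H = np.array([[1.0, 0.0]])                   # position-only measurement
    Q = q * np.array([[dt**3 / 3, dt**2 / 2],    # white-acceleration noise
                      [dt**2 / 2, dt]])
    R = np.array([[r]])                          # measurement noise variance

    n = len(zs)
    xs = np.zeros((n, 2)); Ps = np.zeros((n, 2, 2))
    xp = np.zeros((n, 2)); Pp = np.zeros((n, 2, 2))
    x, P = np.array([zs[0], 0.0]), np.eye(2)

    # Forward pass: Kalman predict/update per frame.
    for k in range(n):
        if k > 0:
            x, P = F @ x, F @ P @ F.T + Q
        xp[k], Pp[k] = x, P                      # store priors for the smoother
        K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
        x = x + K @ (np.array([zs[k]]) - H @ x)
        P = (np.eye(2) - K @ H) @ P
        xs[k], Ps[k] = x, P

    # Backward pass: RTS smoothing, blending in future information.
    for k in range(n - 2, -1, -1):
        G = Ps[k] @ F.T @ np.linalg.inv(Pp[k + 1])
        xs[k] = xs[k] + G @ (xs[k + 1] - xp[k + 1])
        Ps[k] = Ps[k] + G @ (Ps[k + 1] - Pp[k + 1]) @ G.T
    return xs  # smoothed [position, velocity] per frame
```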
6-DoF Bounding-Box Parameterization:
Each object state at time $t$ comprises center position $(x, y, z)$, orientation (Euler angles $(\phi, \theta, \psi)$, stored as a unit quaternion in the annotations), and dimensions $(l, w, h)$. Transformations between world, camera, and image coordinates follow the pinhole model and standard projection/back-projection equations.
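A minimal sketch of this projection/back-projection pair under the pinhole model; the intrinsics K, pose (R, t), and box center below are illustrative assumptions:

```python
# Pinhole projection and its inverse: x ~ K (R X_w + t), and back-projection
# of a pixel plus depth to a world point. Values are synthetic.
import numpy as np

def project(K, R, t, X_w):
    """World point -> pixel via the pinhole model."""
    X_c = R @ X_w + t                 # world -> camera frame
    u, v, w = K @ X_c                 # homogeneous image coordinates
    return np.array([u / w, v / w])

def back_project(K, R, t, uv, depth):
    """Pixel + depth -> world point (inverse of project)."""
    X_c = depth * np.linalg.inv(K) @ np.array([uv[0], uv[1], 1.0])
    return R.T @ (X_c - t)            # camera -> world frame

K = np.array([[1000.0, 0, 960], [0, 1000.0, 540], [0, 0, 1]])  # intrinsics
R, t = np.eye(3), np.zeros(3)                                  # extrinsics
center = np.array([12.0, 3.0, 30.0])                           # box center (m)
uv = project(K, R, t, center)
assert np.allclose(back_project(K, R, t, uv, center[2]), center)
```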
3. Dataset Content, Locations, and Diversity
DSC3D offers 15 hours of video sequences, totaling approximately 175,000 unique trajectories of 14 traffic participant classes, with 5,395 km of total path length.
Participant and Trajectory Counts:
| Class | Trajectories |
|---|---|
| Pedestrian | 140,227 |
| Bicycle | 17,736 |
| Car | 13,241 |
| Scooter | 1,475 |
| Motorcycle | 1,054 |
| Animal | 677 |
| Truck | 475 |
| Bus | 191 |
| Other | 2,075 |
Five additional subtypes of these categories bring the total to 14 classes.
Captured Locations:
- DSC-SIFI: Parking lot
- DSC-MUC: Crowded inner-city, high pedestrian density
- DSC-STR: Unsignalized T-intersection
- DSC-BER: Federal highway (B-roads, 50 km/h)
- DSC-SFO: Steep, unsignalized suburban intersection
Diversity dimensions include two countries (Germany, USA), five scene types, and 14 participant classes. No closed-form diversity index is defined; diversity is characterized along these reported axes.
4. Annotation Schemas and Data Organization
Data are structured for immediate usability in academic workflows, distributed under a standardized directory format:
```
locations/
├─ DSC-MUC/
│  ├─ map/               # OpenDRIVE + mesh
│  ├─ video.mp4
│  └─ trajectories.csv
├─ DSC-SIFI/
└─ ...
```
CSV Annotation Schema:
| Field | Type | Example |
|---|---|---|
| frame_id | int | 512 |
| timestamp | float (s) | 20.48 |
| track_id | int | 17 |
| class_id | int | 2 |
| class_name | string | Pedestrian |
| x, y, z | float (m) | 683100.23, 5292001.45, 3.51 (world UTM) |
| vx, vy, vz | float (m/s) | 0.12, 0.02, 0.00 |
| ax, ay, az | float (m/s²) | 0.01, 0.00, 0.00 |
| qx, qy, qz, qw | float | 0.00, 0.00, 0.00, 1.0 (unit quaternion) |
| l, w, h | float (m) | 0.5, 0.5, 1.75 |
JSON Schema: Each scene provides per-track files with class, bounding box dimensions, and temporal states (frame, time, position, velocity, orientation quaternion).
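A short example of consuming the CSV schema with pandas; the file path and track_id follow the examples above, and the derived quantities are illustrative:

```python
# Read a scene's CSV annotations and reconstruct one trajectory.
import pandas as pd

df = pd.read_csv("locations/DSC-MUC/trajectories.csv")

# All states of a single track, ordered by time.
track = df[df["track_id"] == 17].sort_values("timestamp")
positions = track[["x", "y", "z"]].to_numpy()                 # world UTM, m
speeds = (track[["vx", "vy", "vz"]] ** 2).sum(axis=1) ** 0.5  # m/s
print(track["class_name"].iloc[0], len(track), "states,",
      f"mean speed {speeds.mean():.2f} m/s")
```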
5. Evaluated Applications and Benchmarked Results
Motion Prediction & Planning:
On the DeepUrban benchmark, 20 s scenarios from DSC-MUC, SIFI, STR, and SFO were used. Models trained by augmenting nuScenes with DSC3D improved Average Displacement Error (ADE) and Final Displacement Error (FDE) by 44.1% and 44.3%, respectively.
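For reference, the standard definitions of these two metrics; the array shapes are an assumption for illustration:

```python
# ADE/FDE: mean and final-step Euclidean displacement between predicted and
# ground-truth trajectories, here as (T, 2) arrays in meters.
import numpy as np

def ade_fde(pred, gt):
    disp = np.linalg.norm(pred - gt, axis=1)   # per-timestep error
    return disp.mean(), disp[-1]               # (ADE, FDE)
```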
Human Driving Safety Compliance:
The analysis evaluated gap distance, time-to-collision (TTC), and post-encroachment time (PET); velocity and acceleration signals in DSC3D were found to be more realistic and less noisy than those in the Lyft or Argoverse 2 datasets.
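An illustrative sketch of these surrogate safety metrics as one might compute them from DSC3D tracks; the input conventions are assumptions, not the paper's exact procedure:

```python
# TTC: distance between two agents divided by their closing speed.
# PET: time gap between the first agent leaving a conflict zone and the
# second entering it. Inputs are time-aligned numpy arrays (assumed).
import numpy as np

def ttc(p_a, v_a, p_b, v_b):
    """Per-frame TTC for agents a, b: positions/velocities of shape (N, 2)."""
    gap = p_b - p_a
    dist = np.linalg.norm(gap, axis=1)
    # Closing speed: component of relative velocity along the line of sight.
    closing = -((v_b - v_a) * gap).sum(axis=1) / np.maximum(dist, 1e-9)
    out = np.full(len(dist), np.inf)
    mask = closing > 0                 # TTC defined only while converging
    out[mask] = dist[mask] / closing[mask]
    return out

def pet(t_first_exits_zone, t_second_enters_zone):
    """PET in seconds; small values flag near-collision events."""
    return t_second_enters_zone - t_first_exits_zone
```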
Scenario Mining:
Among the parking maneuvers in DSC-SIFI, 80% complete in under 10 s and 90% require no more than two direction switches. In critical intersection scenarios (DSC-STR/SFO), TTC distributions peak at 2–4 s and PET at 1–2 s, supporting identification of near-collision events.
Generative Reactive Traffic Agents:
State-of-the-art models (BehaviorGPT, Versatile Behavior Diffusion, TrafficBots v1.5) trained on DSC-STR and DSC-SFO learn realistic, interactive multi-agent traffic behavior consistent with the empirical dynamics.
6. Data Access, Visualization, and Integration
Scenes can be browsed, filtered, and downloaded at https://app.deepscenario.com, with further documentation at https://deepscenario.github.io/DSC3D/. Python integration enables direct workflow adoption:
```python
from deepscenario import Dataset

ds = Dataset.load("DSC3D")
scene = ds.get_scene("DSC-MUC")
trajs = scene.load_trajectories()       # returns pandas.DataFrame
scene.visualize_3d(save_as="muc.html")  # launches WebGL viewer
```
The web application includes an interactive visualization platform with 3D mesh, HD map overlays, bounding-box animations, and 2D/3D annotation display. This supports reproducible research and efficient dataset utilization for motion prediction, behavior modeling, and safety validation (Dhaouadi et al., 24 Apr 2025).