
DSC3D: 3D Dataset for Autonomous Driving

Updated 23 December 2025
  • DeepScenario Open 3D Dataset (DSC3D) is an occlusion-free, high-fidelity dataset providing 6-DoF trajectories for diverse traffic participants in various environments.
  • It uses monocular drone-based capture and advanced 3D reconstruction techniques (SfM and MVS) to generate precise, geo-referenced data and HD maps.
  • The dataset supports autonomous driving research by enhancing motion prediction, safety validation, and realistic multi-agent behavior modeling.

The DeepScenario Open 3D Dataset (DSC3D) is an occlusion-free, high-fidelity dataset of 6-DoF bounding box trajectories for diverse traffic participants in urban and non-urban environments. Using monocular drone-based capture and an advanced end-to-end reconstruction pipeline, DSC3D consists of over 175,000 unique, precisely geo-referenced trajectories across five heterogeneous locations in Europe and the United States. The dataset is designed to advance research in autonomous driving by providing large-scale, detailed 3D motion and interaction data, with comprehensive annotations and a robust online visualization and access interface (Dhaouadi et al., 24 Apr 2025).

1. Monocular Drone-Based Data Collection and 3D Scene Reconstruction

The DSC3D collection pipeline employs commercial DJI quadcopters fitted with downward-tilted (approximately 0–30°) stabilized RGB cameras streaming at 25 Hz. Data capture follows a two-pass flight protocol:

(A) Mapping Flight: The drone traverses the scene, capturing $N$ geo-tagged still images $\{(I_i, g^{I_i})\}_{i=1}^N$, where GPS is recorded in WGS84 coordinates.

(B) Static Recording: The drone hovers at designated vantage points, recording continuous video frames $\{(F_t, g^{F_t})\}_{t=1}^T$.

Geo-referenced 3D Scene Reconstruction leverages Structure-from-Motion (SfM) and Multiview Stereo (MVS):

  • SIFT features are extracted, and an initial extrinsic estimate is solved:

$$T^{I_i}_{\text{init}} = [\,R^{I_i}_{\text{init}} \;\; t^{I_i}_{\text{init}}\,]$$

  • Joint bundle adjustment is conducted over intrinsics, extrinsics, and 3D points, incorporating GPS priors via minimization of:

$$\mathcal{L} = \sum_{i,j}\left\|\pi(K, T^{I_i}, X_j) - x_{ij}\right\|^2 + \lambda \sum_i \left\|c^{I_i} - g^{I_i}_{\text{local}}\right\|^2,$$

where $\pi$ is the pinhole projection, $c^{I_i}$ is the camera center, $g^{I_i}_{\text{local}}$ is the GPS position in local UTM coordinates, and $\lambda$ weights GPS alignment against reprojection error.
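
To make the objective concrete, here is a minimal NumPy sketch of this GPS-regularized reprojection loss. It is illustrative only: names such as ba_loss are ours, and the paper's actual solver is a full joint bundle adjustment over intrinsics, extrinsics, and points.

import numpy as np

def project(K, R, t, X):
    # Pinhole projection of world points X (N, 3) under pose [R | t].
    Xc = (R @ X.T + t.reshape(3, 1)).T   # world -> camera frame
    x = (K @ Xc.T).T
    return x[:, :2] / x[:, 2:3]          # perspective divide

def ba_loss(K, poses, points, obs, gps_local, lam):
    # poses: list of (R, t) per image; obs[i]: list of (point_index, pixel);
    # gps_local: (num_images, 3) GPS positions converted to local UTM.
    loss = 0.0
    for i, (R, t) in enumerate(poses):
        idx = np.array([j for j, _ in obs[i]])
        px = np.array([p for _, p in obs[i]])
        loss += np.sum((project(K, R, t, points[idx]) - px) ** 2)
        c = -R.T @ t                     # camera center in world coordinates
        loss += lam * np.sum((c - gps_local[i]) ** 2)
    return loss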

  • Orthophoto rendering and semantic segmentation isolate the road surface, which is modeled as a NURBS mesh (using FlexRoad).

HD Map Creation exports the road network and elevation in OpenDRIVE format. Frame calibration employs learned matchers (LoFTR, LightGlue) and solves a PnP-style robust optimization:

$$(T^{F_t*}, K^*) = \arg\min_{T^{F_t}, K} \sum_i \rho\!\left(\pi(K, T^{F_t}, X_i) - x_i^{F_t}\right),$$

where $\rho$ is a robust loss; the recovered per-frame poses are temporally smoothed with a Kalman filter.
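
As an illustration of this calibration step, the robust PnP solve can be approximated with OpenCV's RANSAC-based solver. This is a sketch under simplifying assumptions, not the paper's exact optimizer: the 2D–3D correspondences from LoFTR/LightGlue are taken as given, and intrinsics are held fixed rather than refined jointly.

import cv2
import numpy as np

def calibrate_frame(points_3d, points_2d, K_init):
    # Robust PnP for one video frame F_t.
    # points_3d: (N, 3) scene points matched by a learned matcher
    # points_2d: (N, 2) corresponding pixels in the frame
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        points_3d.astype(np.float64),
        points_2d.astype(np.float64),
        K_init, None,                  # no lens distortion assumed
        reprojectionError=3.0,         # robust inlier threshold (pixels)
        flags=cv2.SOLVEPNP_ITERATIVE)
    R, _ = cv2.Rodrigues(rvec)         # axis-angle -> rotation matrix
    return ok, R, tvec, inliers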

2. Monocular 3D Object Detection, Refinement, and 6-DoF Tracking

Each video frame undergoes a monocular ground-aware 3D detection process (GroundMix):

  • Predict the 2D bounding box, class, continuous 3D box dimensions $[l, w, h]$, estimated depth $Z_c$, orientation $\mathbf{R}_c$, and projected ground-center pixel $\mathbf{x}_p$.
  • Back-project the ground center using:

$$\mathbf{X}_c = Z_c\,K^{-1}\,\hat{\mathbf{x}}_p, \qquad \hat{\mathbf{x}}_p = (u, v, 1)^\top.$$

  • Ground-aware refinement determines the object position $\mathbf{X}_c^*$ by intersecting the camera ray with the ground mesh (a geometric sketch follows this list).
  • The orientation $\mathbf{R}_c$ is decomposed as $R_Z(\psi)\,R_Y(\theta)\,R_X(\phi)$ and re-aligned with the ground normal, yielding $\mathbf{R}_c^* = R_Z(\psi)\,R_Y(\omega)\,R_X(\phi)$.
  • World-frame coordinates are computed as:

$$\mathbf{X}_w = R^{F_t}\,\mathbf{X}_c^* + t^{F_t}, \qquad \mathbf{R}_w = R^{F_t}\,\mathbf{R}_c^*$$

  • Objects are linked across frames via a Kalman filter-based tracker (state: position + velocity) and refined using an RTS smoother, providing temporally continuous, uniquely labeled 6-DoF bounding-box trajectories.
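
The geometric core of the refinement above can be sketched in a few lines of NumPy. This illustrative version substitutes a flat ground plane for the reconstructed ground mesh, and names such as refine_on_ground are ours.

import numpy as np

def refine_on_ground(K, x_p, n, d, R_Ft, t_Ft):
    # Intersect the camera ray through pixel x_p with the ground plane
    # {X : n.X + d = 0} (given in camera coordinates), then lift the
    # result to world coordinates with the frame pose (R_Ft, t_Ft).
    ray = np.linalg.inv(K) @ np.array([x_p[0], x_p[1], 1.0])  # back-projected ray
    s = -d / (n @ ray)                    # scale so that n.(s*ray) + d = 0
    X_c_star = s * ray                    # refined object center, camera frame
    X_w = R_Ft @ X_c_star + t_Ft          # camera -> world
    return X_w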

6-DoF Bounding-Box Parameterization:

Each object state at time $t$ comprises center position $p = (x, y, z)^\top$, orientation $R \in SO(3)$ (Euler angles $\phi, \theta, \psi$), and dimensions $(l, w, h)$. Transformations between world, camera, and image coordinates follow the pinhole model and its standard projection/back-projection equations.
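
For concreteness, a minimal helper (ours, not part of the dataset tooling) that materializes the eight world-frame corners of such a box from its state:

import numpy as np

def box_corners(p, R, l, w, h):
    # Eight world-frame corners of a 6-DoF box with center p,
    # orientation R in SO(3), and dimensions (l, w, h).
    # Corner offsets in the object frame: x = length, y = width, z = height.
    signs = np.array([[sx, sy, sz] for sx in (-1, 1)
                                   for sy in (-1, 1)
                                   for sz in (-1, 1)])
    offsets = 0.5 * signs * np.array([l, w, h])
    return (R @ offsets.T).T + p          # rotate, then translate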

3. Dataset Content, Locations, and Diversity

DSC3D offers 15 hours of video sequences, totaling approximately 175,000 unique trajectories of 14 traffic participant classes, with 5,395 km of total path length.

Participant and Trajectory Counts:

Class         Trajectories
Pedestrian         140,227
Bicycle             17,736
Car                 13,241
Scooter              1,475
Motorcycle           1,054
Animal                 677
Truck                  475
Bus                    191
Other                2,075

Five further subtypes of these classes bring the total to 14.

Captured Locations:

  • DSC-SIFI: Parking lot
  • DSC-MUC: Crowded inner-city, high pedestrian density
  • DSC-STR: Unsignalized T-intersection
  • DSC-BER: Federal highway (B-roads, 50 km/h)
  • DSC-SFO: Steep, unsignalized suburban intersection

Diversity dimensions include two countries (Germany, USA), five scene types, and 14 participant classes. No closed-form diversity index is defined; diversity is characterized along these reported axes.

4. Annotation Schemas and Data Organization

Data are structured for immediate usability in academic workflows and distributed in a standardized directory layout:

locations/
 ├─ DSC-MUC/
 │    ├─ map/            # OpenDRIVE + mesh
 │    ├─ video.mp4
 │    └─ trajectories.csv
 ├─ DSC-SIFI/
 └─ ...

CSV Annotation Schema:

Field           Type          Example
frame_id        int           512
timestamp       float (s)     20.48
track_id        int           17
class_id        int           2
class_name      string        Pedestrian
x, y, z         float (m)     683100.23, 5292001.45, 3.51 (world UTM)
vx, vy, vz      float (m/s)   0.12, 0.02, 0.00
ax, ay, az      float (m/s²)  0.01, 0.00, 0.00
qx, qy, qz, qw  float         0.00, 0.00, 0.00, 1.0 (unit quaternion)
l, w, h         float (m)     0.5, 0.5, 1.75

JSON Schema: Each scene provides per-track files with class, bounding box dimensions, and temporal states (frame, time, position, velocity, orientation quaternion).
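
A short example of consuming the CSV annotations directly with pandas, assuming the column names above (quaternion-to-Euler conversion via SciPy; an illustrative sketch, not official tooling):

import numpy as np
import pandas as pd
from scipy.spatial.transform import Rotation

df = pd.read_csv("locations/DSC-MUC/trajectories.csv")

# One row per object per frame; group by track_id to recover trajectories.
for track_id, track in df.groupby("track_id"):
    track = track.sort_values("frame_id")
    positions = track[["x", "y", "z"]].to_numpy()                    # world UTM, metres
    speeds = np.linalg.norm(track[["vx", "vy", "vz"]].to_numpy(), axis=1)
    # SciPy expects (x, y, z, w) quaternion order, matching the schema.
    yaw_pitch_roll = Rotation.from_quat(
        track[["qx", "qy", "qz", "qw"]].to_numpy()).as_euler("ZYX")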

5. Evaluated Applications and Benchmarked Results

Motion Prediction & Planning:

On the DeepUrban benchmark, 20 s scenarios from DSC-MUC, DSC-SIFI, DSC-STR, and DSC-SFO were used. Models trained on nuScenes augmented with DSC3D improved Average Displacement Error (ADE) and Final Displacement Error (FDE) by 44.1% and 44.3%, respectively.
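
For reference, ADE and FDE are simple displacement statistics over a prediction horizon (standard definitions, not code from the paper):

import numpy as np

def ade_fde(pred, gt):
    # pred, gt: (T, 2) arrays of x/y positions over the prediction horizon.
    # ADE averages the per-step Euclidean error; FDE takes the final step only.
    errors = np.linalg.norm(pred - gt, axis=1)
    return errors.mean(), errors[-1]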

Human Driving Safety Compliance:

Gap distance, time-to-collision (TTC), and post-encroachment time (PET) were evaluated. Velocity and acceleration signals in DSC3D were found to be more realistic and less noisy than those in the Lyft or Argoverse 2 datasets.
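
A minimal sketch of the two surrogate-safety measures under simple assumptions (constant closing speed for TTC; conflict-area entry/exit times taken as given for PET):

import numpy as np

def time_to_collision(gap, closing_speed):
    # TTC = remaining gap / closing speed; undefined if the gap is not closing.
    return gap / closing_speed if closing_speed > 0 else np.inf

def post_encroachment_time(t_leader_exits, t_follower_enters):
    # PET: time between the first road user leaving a conflict area
    # and the second one entering it.
    return t_follower_enters - t_leader_exits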

Scenario Mining:

Among parking maneuvers at DSC-SIFI, 80% complete in under 10 s and 90% require no more than two direction switches. In critical intersection scenarios (DSC-STR/SFO), TTC distributions peak at 2–4 s and PET distributions at 1–2 s, supporting the identification of near-collision events.

Generative Reactive Traffic Agents:

State-of-the-art models (BehaviorGPT, Versatile Behavior Diffusion, TrafficBots v1.5) trained on DSC-STR and DSC-SFO learn realistic, interactive multi-agent traffic behavior consistent with the empirical dynamics.

6. Data Access, Visualization, and Integration

Scenes can be browsed, filtered, and downloaded at https://app.deepscenario.com, with further documentation at https://deepscenario.github.io/DSC3D/. Python integration enables direct workflow adoption:

from deepscenario import Dataset
ds = Dataset.load("DSC3D")
scene = ds.get_scene("DSC-MUC")
trajs = scene.load_trajectories()  # returns pandas.DataFrame
scene.visualize_3d(save_as="muc.html")  # launches WebGL viewer

The web application includes an interactive visualization platform with 3D mesh, HD map overlays, bounding-box animations, and 2D/3D annotation display. This supports reproducible research and efficient dataset utilization for motion prediction, behavior modeling, and safety validation (Dhaouadi et al., 24 Apr 2025).

Creator, AI Explained on YouTube