
DSC3D: 3D Dataset for Autonomous Driving

Updated 23 December 2025
  • DeepScenario Open 3D Dataset (DSC3D) is an occlusion-free, high-fidelity dataset providing 6-DoF trajectories for diverse traffic participants in various environments.
  • It uses monocular drone-based capture and advanced 3D reconstruction techniques (SfM and MVS) to generate precise, geo-referenced data and HD maps.
  • The dataset supports autonomous driving research by enhancing motion prediction, safety validation, and realistic multi-agent behavior modeling.

The DeepScenario Open 3D Dataset (DSC3D) is an occlusion-free, high-fidelity dataset of 6-DoF bounding box trajectories for diverse traffic participants in urban and non-urban environments. Using monocular drone-based capture and an advanced end-to-end reconstruction pipeline, DSC3D consists of over 175,000 unique, precisely geo-referenced trajectories across five heterogeneous locations in Europe and the United States. The dataset is designed to advance research in autonomous driving by providing large-scale, detailed 3D motion and interaction data, with comprehensive annotations and a robust online visualization and access interface (Dhaouadi et al., 24 Apr 2025).

1. Monocular Drone-Based Data Collection and 3D Scene Reconstruction

The DSC3D collection pipeline employs commercial DJI quadcopters fitted with downward-tilted (approximately 0–30°) stabilized RGB cameras streaming at 25 Hz. Data capture follows a two-pass flight protocol:

(A) Mapping Flight: The drone traverses the scene, capturing $N$ geo-tagged still images $\{(I_i, g^{I_i})\}_{i=1}^N$, where GPS is recorded in WGS84 coordinates.

(B) Static Recording: The drone hovers at designated vantage points, recording continuous video frames $\{(F_t, g^{F_t})\}_{t=1}^T$.

Geo-referenced 3D Scene Reconstruction leverages Structure-from-Motion (SfM) and Multiview Stereo (MVS):

  • SIFT features are extracted, and an initial extrinsic estimate is solved:

$$T^{I_i}_{\text{init}} = [\,R^{I_i}_{\text{init}} \;\; t^{I_i}_{\text{init}}\,]$$

  • Joint bundle adjustment is conducted over intrinsics, extrinsics, and 3D points, incorporating GPS priors via minimization of:

$$\mathcal{L} = \sum_{i,j}\left\|\pi(K, T^{I_i}, X_j) - x_{ij}\right\|^2 + \lambda \sum_i \left\|c^{I_i} - g^{I_i}_{\text{local}}\right\|^2,$$

where $\pi$ is the pinhole projection, $c^{I_i}$ is the camera center, $g^{I_i}_{\text{local}}$ is the GPS position in local UTM coordinates, and $\lambda$ weights GPS alignment against reprojection error.
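
To make the objective concrete, here is a minimal NumPy sketch of this GPS-regularized reprojection loss. It is illustrative only: names such as ba_loss are ours, and the paper's actual solver is a full joint bundle adjustment over intrinsics, extrinsics, and points.

import numpy as np

def project(K, R, t, X):
    # Pinhole projection of world points X (N, 3) under pose [R | t].
    Xc = (R @ X.T + t.reshape(3, 1)).T   # world -> camera frame
    x = (K @ Xc.T).T
    return x[:, :2] / x[:, 2:3]          # perspective divide

def ba_loss(K, poses, points, obs, gps_local, lam):
    # poses: list of (R, t) per image; obs[i]: list of (point_index, pixel);
    # gps_local: (num_images, 3) GPS positions converted to local UTM.
    loss = 0.0
    for i, (R, t) in enumerate(poses):
        idx = np.array([j for j, _ in obs[i]])
        px = np.array([p for _, p in obs[i]])
        loss += np.sum((project(K, R, t, points[idx]) - px) ** 2)
        c = -R.T @ t                     # camera center in world coordinates
        loss += lam * np.sum((c - gps_local[i]) ** 2)
    return loss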

  • Orthophoto rendering and semantic segmentation isolate the road surface, which is modeled as a NURBS mesh (using FlexRoad).

HD Map Creation exports the road network and elevation in OpenDRIVE format. Frame calibration employs learned matchers (LoFTR, LightGlue) and solves a PnP-style robust optimization:

$$(T^{F_t*}, K^*) = \arg\min_{T^{F_t}, K} \sum_i \rho\!\left(\pi(K, T^{F_t}, X_i) - x_i^{F_t}\right),$$

where $\rho$ is a robust loss; the recovered per-frame poses are temporally smoothed with a Kalman filter.
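
As an illustration of this calibration step, the robust PnP solve can be approximated with OpenCV's RANSAC-based solver. This is a sketch under simplifying assumptions, not the paper's exact optimizer: the 2D–3D correspondences from LoFTR/LightGlue are taken as given, and intrinsics are held fixed rather than refined jointly.

import cv2
import numpy as np

def calibrate_frame(points_3d, points_2d, K_init):
    # Robust PnP for one video frame F_t.
    # points_3d: (N, 3) scene points matched by a learned matcher
    # points_2d: (N, 2) corresponding pixels in the frame
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        points_3d.astype(np.float64),
        points_2d.astype(np.float64),
        K_init, None,                  # no lens distortion assumed
        reprojectionError=3.0,         # robust inlier threshold (pixels)
        flags=cv2.SOLVEPNP_ITERATIVE)
    R, _ = cv2.Rodrigues(rvec)         # axis-angle -> rotation matrix
    return ok, R, tvec, inliers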

2. Monocular 3D Object Detection, Refinement, and 6-DoF Tracking

Each video frame undergoes a monocular ground-aware 3D detection process (GroundMix):

  • Predict the 2D bounding box, class, continuous 3D box dimensions $[l, w, h]$, estimated depth $Z_c$, orientation $\mathbf{R}_c$, and projected ground-center pixel $\mathbf{x}_p$.
  • Back-project the ground center using:

$$\mathbf{X}_c = Z_c\,K^{-1}\,\hat{\mathbf{x}}_p, \qquad \hat{\mathbf{x}}_p = (u, v, 1)^\top.$$

  • Ground-aware refinement determines the object position $\mathbf{X}_c^*$ by intersecting the camera ray with the ground mesh (a geometric sketch follows this list).
  • The orientation $\mathbf{R}_c$ is decomposed as $R_Z(\psi)\,R_Y(\theta)\,R_X(\phi)$ and re-aligned with the ground normal, yielding $\mathbf{R}_c^* = R_Z(\psi)\,R_Y(\omega)\,R_X(\phi)$.
  • World-frame coordinates are computed as:

$$\mathbf{X}_w = R^{F_t}\,\mathbf{X}_c^* + t^{F_t}, \qquad \mathbf{R}_w = R^{F_t}\,\mathbf{R}_c^*$$

  • Objects are linked across frames via a Kalman filter-based tracker (state: position + velocity) and refined using an RTS smoother, providing temporally continuous, uniquely labeled 6-DoF bounding-box trajectories.
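
The geometric core of the refinement above can be sketched in a few lines of NumPy. This illustrative version substitutes a flat ground plane for the reconstructed ground mesh, and names such as refine_on_ground are ours.

import numpy as np

def refine_on_ground(K, x_p, n, d, R_Ft, t_Ft):
    # Intersect the camera ray through pixel x_p with the ground plane
    # {X : n.X + d = 0} (given in camera coordinates), then lift the
    # result to world coordinates with the frame pose (R_Ft, t_Ft).
    ray = np.linalg.inv(K) @ np.array([x_p[0], x_p[1], 1.0])  # back-projected ray
    s = -d / (n @ ray)                    # scale so that n.(s*ray) + d = 0
    X_c_star = s * ray                    # refined object center, camera frame
    X_w = R_Ft @ X_c_star + t_Ft          # camera -> world
    return X_w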

6-DoF Bounding-Box Parameterization:

Each object state at time $t$ comprises center position $p = (x, y, z)^\top$, orientation $R \in SO(3)$ (Euler angles $\phi, \theta, \psi$), and dimensions $(l, w, h)$. Transformations between world, camera, and image coordinates follow the pinhole model and its standard projection/back-projection equations.
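
For concreteness, a minimal helper (ours, not part of the dataset tooling) that materializes the eight world-frame corners of such a box from its state:

import numpy as np

def box_corners(p, R, l, w, h):
    # Eight world-frame corners of a 6-DoF box with center p,
    # orientation R in SO(3), and dimensions (l, w, h).
    # Corner offsets in the object frame: x = length, y = width, z = height.
    signs = np.array([[sx, sy, sz] for sx in (-1, 1)
                                   for sy in (-1, 1)
                                   for sz in (-1, 1)])
    offsets = 0.5 * signs * np.array([l, w, h])
    return (R @ offsets.T).T + p          # rotate, then translate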

3. Dataset Content, Locations, and Diversity

DSC3D offers 15 hours of video sequences, totaling approximately 175,000 unique trajectories of 14 traffic participant classes, with 5,395 km of total path length.

Participant and Trajectory Counts:

Class         Trajectories
Pedestrian         140,227
Bicycle             17,736
Car                 13,241
Scooter              1,475
Motorcycle           1,054
Animal                 677
Truck                  475
Bus                    191
Other                2,075

Five further subtypes of these classes bring the total to 14.

Captured Locations:

  • DSC-SIFI: Parking lot
  • DSC-MUC: Crowded inner-city, high pedestrian density
  • DSC-STR: Unsignalized T-intersection
  • DSC-BER: Federal highway (B-roads, 50 km/h)
  • DSC-SFO: Steep, unsignalized suburban intersection

Diversity dimensions include two countries (Germany, USA), five scene types, and 14 participant classes. No closed-form diversity index is defined; diversity is characterized along these reported axes.

4. Annotation Schemas and Data Organization

Data are structured for immediate usability in academic workflows and distributed in a standardized directory layout:

locations/
 ├─ DSC-MUC/
 │    ├─ map/            # OpenDRIVE + mesh
 │    ├─ video.mp4
 │    └─ trajectories.csv
 ├─ DSC-SIFI/
 └─ ...

CSV Annotation Schema:

Field           Type          Example
frame_id        int           512
timestamp       float (s)     20.48
track_id        int           17
class_id        int           2
class_name      string        Pedestrian
x, y, z         float (m)     683100.23, 5292001.45, 3.51 (world UTM)
vx, vy, vz      float (m/s)   0.12, 0.02, 0.00
ax, ay, az      float (m/s²)  0.01, 0.00, 0.00
qx, qy, qz, qw  float         0.00, 0.00, 0.00, 1.0 (unit quaternion)
l, w, h         float (m)     0.5, 0.5, 1.75

JSON Schema: Each scene provides per-track files with class, bounding box dimensions, and temporal states (frame, time, position, velocity, orientation quaternion).
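
A short example of consuming the CSV annotations directly with pandas, assuming the column names above (quaternion-to-Euler conversion via SciPy; an illustrative sketch, not official tooling):

import numpy as np
import pandas as pd
from scipy.spatial.transform import Rotation

df = pd.read_csv("locations/DSC-MUC/trajectories.csv")

# One row per object per frame; group by track_id to recover trajectories.
for track_id, track in df.groupby("track_id"):
    track = track.sort_values("frame_id")
    positions = track[["x", "y", "z"]].to_numpy()                    # world UTM, metres
    speeds = np.linalg.norm(track[["vx", "vy", "vz"]].to_numpy(), axis=1)
    # SciPy expects (x, y, z, w) quaternion order, matching the schema.
    yaw_pitch_roll = Rotation.from_quat(
        track[["qx", "qy", "qz", "qw"]].to_numpy()).as_euler("ZYX")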

5. Evaluated Applications and Benchmarked Results

Motion Prediction & Planning:

On the DeepUrban benchmark, 20 s scenarios from DSC-MUC, DSC-SIFI, DSC-STR, and DSC-SFO were used. Models trained on nuScenes augmented with DSC3D improved Average Displacement Error (ADE) and Final Displacement Error (FDE) by 44.1% and 44.3%, respectively.
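
For reference, ADE and FDE are simple displacement statistics over a prediction horizon (standard definitions, not code from the paper):

import numpy as np

def ade_fde(pred, gt):
    # pred, gt: (T, 2) arrays of x/y positions over the prediction horizon.
    # ADE averages the per-step Euclidean error; FDE takes the final step only.
    errors = np.linalg.norm(pred - gt, axis=1)
    return errors.mean(), errors[-1]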

Human Driving Safety Compliance:

Gap distance, time-to-collision (TTC), and post-encroachment time (PET) were evaluated. Velocity and acceleration signals in DSC3D were found to be more realistic and less noisy than those in the Lyft or Argoverse 2 datasets.
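
A minimal sketch of the two surrogate-safety measures under simple assumptions (constant closing speed for TTC; conflict-area entry/exit times taken as given for PET):

import numpy as np

def time_to_collision(gap, closing_speed):
    # TTC = remaining gap / closing speed; undefined if the gap is not closing.
    return gap / closing_speed if closing_speed > 0 else np.inf

def post_encroachment_time(t_leader_exits, t_follower_enters):
    # PET: time between the first road user leaving a conflict area
    # and the second one entering it.
    return t_follower_enters - t_leader_exits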

Scenario Mining:

Among parking maneuvers at DSC-SIFI, 80% complete in under 10 s and 90% require no more than two direction switches. In critical intersection scenarios (DSC-STR/SFO), TTC distributions peak at 2–4 s and PET distributions at 1–2 s, supporting the identification of near-collision events.

Generative Reactive Traffic Agents:

State-of-the-art models (BehaviorGPT, Versatile Behavior Diffusion, TrafficBots v1.5) trained on DSC-STR and DSC-SFO learn realistic, interactive multi-agent traffic behavior consistent with the empirical dynamics.

6. Data Access, Visualization, and Integration

Scenes can be browsed, filtered, and downloaded at https://app.deepscenario.com, with further documentation at https://deepscenario.github.io/DSC3D/. Python integration enables direct workflow adoption:

from deepscenario import Dataset
ds = Dataset.load("DSC3D")
scene = ds.get_scene("DSC-MUC")
trajs = scene.load_trajectories()  # returns pandas.DataFrame
scene.visualize_3d(save_as="muc.html")  # launches WebGL viewer

The web application includes an interactive visualization platform with 3D mesh, HD map overlays, bounding-box animations, and 2D/3D annotation display. This supports reproducible research and efficient dataset utilization for motion prediction, behavior modeling, and safety validation (Dhaouadi et al., 24 Apr 2025).

Creator, AI Explained on YouTube