ParaDrive: Supervised Lateral Driving Data
- ParaDrive is a large-scale driving video dataset that provides explicit paired supervision for lateral trajectory synthesis using a 3D Gaussian Splatting curation pipeline.
- It comprises roughly 1,600 scenes with over 110,000 paired video samples derived from Waymo and NuScenes, ensuring standardized multi-trajectory training.
- The dataset supports robust evaluation with metrics for imaging quality, camera accuracy, and view consistency, advancing research in novel view synthesis and autonomous driving.
ParaDrive is a large-scale, camera-centric driving video dataset designed to enable fully supervised, camera-controlled novel-trajectory generation from monocular data. Unlike prior approaches reliant on LiDAR or self-supervised viewpoint interpolation, ParaDrive provides explicit paired supervision for generating laterally shifted views, grounded in a 3D Gaussian Splatting (3DGS)–based curation pipeline. The dataset is constructed from approximately 1,600 scenes sampled from the Waymo Open Dataset (WOD) v1.4.3-train and NuScenes v1.0-train, yielding over 110,000 paired trajectory video samples and standardizing multi-trajectory supervision in urban and highway settings. ParaDrive targets research in computer vision, novel view synthesis, camera trajectory control, and generative modeling for autonomous driving environments (Li et al., 3 Dec 2025).
1. Data Construction and Curation Pipeline
The ParaDrive construction process is fundamentally rooted in 3DGS reconstruction and explicit cross-trajectory rendering. For each of the ≈1,600 original driving scenes (≈800 from WOD and ≈800 from NuScenes), monocular video is first reconstructed into a 3DGS model using DriveStudio. Model convergence is operationally defined at 30k optimizer iterations, yielding high-fidelity scene geometry; underfitted intermediate models are saved at 100, 500, and 1,000 iterations to support curation with varying photometric artifacts.
For each converged 3DGS scene, synthetic camera trajectories are rendered along eight lateral offsets δ ∈ {±1 m, ±2 m, ±3 m, ±4 m} in addition to the original path, producing eight novel-trajectory (laterally shifted) video clips per scene. Underfitted 3DGS models are also rendered along the original trajectory, generating “camera-condition” source clips whose degradation level matches that of the offset renders. The clean, recorded monocular video along the original path provides target supervision. This “cross-trajectory” curation strategy ensures consistent, parallel pairs for training and closes the train–test gap in camera transformation patterns (with both training and evaluation restricted to lateral translations, rather than the front/back segmentation common in previous protocols).
Each parallel-trajectory pair consists of (a) the laterally shifted, 3DGS-rendered input and underfitted camera-condition render, (b) the clean monocular sequence as ground truth, and (c) exact extrinsics/intrinsics for both source and target trajectories (Li et al., 3 Dec 2025).
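The lateral offsets are applied in the camera frame rather than in world coordinates. The snippet below is a minimal sketch of how shifted render poses could be produced, assuming camera-to-world 4×4 extrinsics and an OpenCV-style camera frame (x-axis pointing right); the function name and axis convention are illustrative assumptions, not taken from the ParaDrive tooling.

```python
import numpy as np

def lateral_offset_pose(T_c2w: np.ndarray, delta_m: float) -> np.ndarray:
    """Shift a camera-to-world pose sideways by delta_m meters.

    Assumes an OpenCV-style camera frame (x right, y down, z forward),
    so a lateral shift is a translation along the camera's x-axis,
    expressed in world coordinates via the rotation block of T_c2w.
    """
    T_new = T_c2w.copy()
    right_axis_world = T_c2w[:3, 0]             # camera x-axis in world coordinates
    T_new[:3, 3] += delta_m * right_axis_world  # translate the camera center sideways
    return T_new

# Render poses for the eight ParaDrive offsets in addition to the original path.
offsets_m = [-4.0, -3.0, -2.0, -1.0, 1.0, 2.0, 3.0, 4.0]
T_src = np.eye(4)  # placeholder source pose; real poses come from the recorded trajectory
shifted_poses = {d: lateral_offset_pose(T_src, d) for d in offsets_m}
```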
2. Dataset Statistics, Content, and Structure
ParaDrive includes approximately 1,600 scenes, with the following combinatorial breakdown:
- Offsets: 8 lateral displacements per scene (δ ∈ {±1, ±2, ±3, ±4} m)
- Underfitted Iteration Conditions: 3 (100, 500, 1,000 iterations)
- Clips per Trajectory: 3 per scene (“front”, “middle”, “rear”)
- Frames per Clip: 121
This yields an aggregate of ≈115,200 paired parallel-trajectory video samples and, across all splits, a total frame count on the order of 27.9 million.
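These totals follow directly from the combinatorics above; a quick arithmetic check (a sketch, using the ≈1,600-scene total rather than the exact per-split scene counts in the table below):

```python
scenes, offsets, underfit_iters, clips = 1_600, 8, 3, 3
pairs = scenes * offsets * underfit_iters * clips   # 115,200 paired samples
frames = pairs * 2 * 121                            # source + target clip, 121 frames each
print(pairs, frames)                                # 115200 27878400 (~27.9 million)
```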
Data Partitioning
| Split | Scenes | Offsets | Iterations | Clips/Scene | Pairs | Frames/Pair | Total Frames |
|---|---|---|---|---|---|---|---|
| Train | ~1,560 | 8 | 3 | 3 | 110,000 | 2×121 | ~26.6 million |
| Val (WOD) | 20 | 8 | 3 | 3 | 1,440 | 2×121 | ~349,000 |
| Val (NuS) | 20 | 8 | 3 | 3 | 1,440 | 2×121 | ~349,000 |
Each dataset instance contains RGB frames (PNG or JPG) for source and target, corresponding 4×4 camera extrinsics (T_s(t), T_t(t)), camera intrinsics K (3×3 calibration), 3DGS-rendered offset input videos, and joint metadata (offset δ, clip assignment, 3DGS iteration/stage, and timestamps) (Li et al., 3 Dec 2025).
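The fields above can be pictured as a single per-sample record. The dataclass below is an illustrative sketch of such a record; the field names and array shapes are assumptions for exposition, not the released schema.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class ParaDriveSample:
    """Illustrative container for one paired parallel-trajectory sample."""
    source_frames: np.ndarray   # (121, H, W, 3) 3DGS-rendered offset / camera-condition input
    target_frames: np.ndarray   # (121, H, W, 3) clean recorded monocular ground truth
    T_source: np.ndarray        # (121, 4, 4) source-trajectory extrinsics T_s(t)
    T_target: np.ndarray        # (121, 4, 4) target-trajectory extrinsics T_t(t)
    K: np.ndarray               # (3, 3) camera intrinsics
    offset_m: float             # lateral offset delta in meters, e.g. -4.0 ... +4.0
    gs_iteration: int           # 3DGS curation stage: 100, 500, 1000, or 30000
    clip_segment: str           # "front", "middle", or "rear"
```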
3. Annotation Schema and Metadata
Every ParaDrive paired sample is annotated with:
- Explicit video source indices and δ offset label
- Per-frame timestamps (Unix time or drive-time)
- Camera intrinsics K (focal length, principal point, etc.)
- Source and target camera extrinsics as homogeneous 4×4 matrices (T_s(t), T_t(t)) per frame
- 3DGS model iteration ID (100, 500, 1,000, or 30k) for tracking rendering quality
- Clip meta-label (“front”, “middle”, “rear” segment of the full trajectory)
Relative pose for each frame is formalized as ΔT(t) = T_t(t) T_s(t)^{-1} ∈ SE(3). During model training, ΔT is provided to the generative model as the camera condition (encoding a pure lateral translation); at inference, the same class of lateral-translation ΔT is used, ensuring statistical alignment between training and deployment (Li et al., 3 Dec 2025).
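For reference, the per-frame relative pose can be computed directly from the stored extrinsics. The snippet below is a minimal sketch assuming camera-to-world 4×4 matrices stacked per frame; function names are illustrative.

```python
import numpy as np

def relative_pose(T_t: np.ndarray, T_s: np.ndarray) -> np.ndarray:
    """Return ΔT(t) = T_t(t) · T_s(t)^{-1} in SE(3) for one frame."""
    return T_t @ np.linalg.inv(T_s)

def relative_pose_sequence(T_target: np.ndarray, T_source: np.ndarray) -> np.ndarray:
    """Batched version for stacked extrinsics of shape (num_frames, 4, 4)."""
    return T_target @ np.linalg.inv(T_source)  # np.linalg.inv and @ broadcast over the stack
```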
4. Evaluation Benchmarks and Metrics
Standard evaluation metrics supported in ParaDrive are:
- Imaging Quality (IQ↑): VBench fidelity score for appearance realism
- CLIP-F (↑): CLIP feature similarity between adjacent frames
- Camera Accuracy: Rotation Error (RErr, degrees) and Translation Error (TErr, meters), computed by MegaSaM
- View Consistency: Fréchet Inception Distance (FID↓), Fréchet Video Distance (FVD↓), and video-level CLIP similarity (CLIP-V↑)
These benchmarks, provided with evaluation scripts, allow reproducible comparisons across models and facilitate ablation of camera trajectory or rendering fidelity effects.
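As an illustration of the adjacent-frame consistency metric, the snippet below sketches a CLIP-F-style score as the mean cosine similarity of CLIP image features between consecutive frames, using the Hugging Face transformers CLIP model. The checkpoint choice and averaging details of the official evaluation scripts may differ; this is a sketch, not the released implementation.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

def clip_f(frame_paths, model_name="openai/clip-vit-base-patch32"):
    """Mean CLIP cosine similarity between consecutive frames (CLIP-F-style sketch)."""
    model = CLIPModel.from_pretrained(model_name).eval()
    processor = CLIPProcessor.from_pretrained(model_name)
    images = [Image.open(p).convert("RGB") for p in frame_paths]
    inputs = processor(images=images, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)          # (T, D) image embeddings
    feats = torch.nn.functional.normalize(feats, dim=-1)
    sims = (feats[:-1] * feats[1:]).sum(dim=-1)             # cosine similarity of adjacent frames
    return sims.mean().item()
```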
5. Access and Tooling
ParaDrive is released under an academic research license, available at https://recamdriving.github.io. Access to the full data requires registration and license acceptance. Downloads are delivered as per-scene archives, each comprising:
- Raw RGB frame folders (“/frames/source/”, “/frames/target/”)
- Camera calibration and trajectory matrices (“cameras.npz”)
- 3DGS renderings (offset condition and camera-condition variants)
- Metadata files (“meta.json”) with δ, clip boundaries, and 3DGS iteration identifiers
Support tooling includes a PyTorch Dataset and DataLoader implementation, preprocessing utilities (e.g., conversion from WOD tfrecords to PNG+JSON), and a comprehensive metrics evaluation pipeline for IQ, CLIP-F, RErr, TErr, FID, FVD, and CLIP-V. Example Jupyter notebooks are provided for end-to-end experimentation (Li et al., 3 Dec 2025).
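A minimal sketch of such a Dataset is given below. It assumes the per-scene archive layout listed above (frames/source, frames/target, cameras.npz, meta.json); the array keys and metadata field names are illustrative guesses, and the released loader's interface may differ.

```python
import json
from pathlib import Path

import numpy as np
import torch
from torch.utils.data import Dataset
from torchvision.io import read_image

class ParaDrivePairDataset(Dataset):
    """Loads one paired parallel-trajectory sample per scene archive (sketch)."""

    def __init__(self, root: str):
        self.scene_dirs = sorted(Path(root).iterdir())

    def __len__(self) -> int:
        return len(self.scene_dirs)

    def __getitem__(self, idx: int) -> dict:
        scene = self.scene_dirs[idx]
        cams = np.load(scene / "cameras.npz")
        meta = json.loads((scene / "meta.json").read_text())

        def load_frames(sub: str) -> torch.Tensor:
            paths = sorted((scene / "frames" / sub).glob("*"))
            return torch.stack([read_image(str(p)) for p in paths])  # (T, 3, H, W)

        return {
            "source": load_frames("source"),                  # 3DGS-rendered condition frames
            "target": load_frames("target"),                  # clean ground-truth frames
            "T_source": torch.from_numpy(cams["T_source"]),   # (T, 4, 4), assumed key name
            "T_target": torch.from_numpy(cams["T_target"]),   # (T, 4, 4), assumed key name
            "K": torch.from_numpy(cams["K"]),                 # (3, 3), assumed key name
            "offset_m": float(meta["offset"]),                # assumed metadata field
        }
```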
6. Research Applications and Relevance
ParaDrive enables investigation of camera-controlled novel-trajectory generation with explicit lateral motion supervision—unlike prior datasets limited to front/back splits or requiring LiDAR supervision. Concrete applications include:
- Training diffusion or generative models for camera trajectory synthesis with closed train/test camera-motion alignment
- Evaluating methods for explicit, geometry-conditioned scene rendering in urban and highway domains
- Studying the impact of cross-trajectory curation and 3DGS rendering quality on model controllability and realism
The cross-trajectory strategy, detailed annotation, and standard evaluation suite position ParaDrive as a reference dataset for structured monocular video generation and view synthesis tasks in autonomous driving contexts. Its exclusive use of monocular video and full lateral-shift supervision distinguish it from datasets with only longitudinal splits or implicit viewpoint conditioning (Li et al., 3 Dec 2025).