nuScenes: Autonomous Driving Benchmark

Updated 5 October 2025
  • nuScenes is a large-scale, multimodal benchmark dataset for autonomous driving, featuring 360° sensor coverage and detailed 3D annotations in urban environments.
  • It integrates synchronized data from cameras, LiDAR, radar, and GPS/IMU, enabling the use of novel evaluation metrics for 3D detection and tracking.
  • The dataset supports advanced research in sensor fusion, class imbalance modeling, and robust prediction, setting new standards in autonomous vehicle perception.

nuScenes Dataset

nuScenes is a large-scale, multimodal benchmark for autonomous driving perception and prediction. It provides a comprehensive sensor suite, extensive annotations, and novel performance metrics, and serves as a standard for research in 3D detection, tracking, and sensor fusion. The dataset was collected in dense urban environments in Boston and Singapore and is distinguished by its complete 360° sensor coverage, dense 3D annotations of objects and their states, and the public release of a development kit, evaluation code, and standardized protocols (Caesar et al., 2019).

1. Sensor Configuration and Synchronization

nuScenes is the first autonomous driving dataset to capture the full sensor suite typically mounted on a research vehicle, enabling holistic perception capability:

| Modality | Sensor Details |
| --- | --- |
| Camera | 6 RGB cameras (5 × 70° FOV front/sides, 1 × 110° FOV rear), synchronized 360° coverage |
| LiDAR | 1 × 32-beam spinning Velodyne at 20 Hz, 360° horizontal, –30° to +10° vertical FOV, ~70 m range |
| Radar | 5 × 77 GHz Continental ARS408 FMCW at 13 Hz, up to 250 m range, ±0.1 km/h velocity accuracy |
| Localization | GPS/IMU at 1000 Hz with RTK corrections, 20 mm positional accuracy |

Precise temporal synchronization is a defining feature: each camera's exposure is triggered as the roof-mounted LiDAR sweeps across the center of that camera's field of view. This results in tightly aligned multimodal sensor data, addressing the calibration and registration problems encountered in prior datasets.
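
As a concrete illustration of how this synchronized, multimodal structure is exposed to users, the sketch below loads one annotated keyframe with the open-source nuscenes-devkit and lists the per-sensor records that share it. The data root path is a placeholder, and this is a minimal sketch of the devkit API rather than a complete loading pipeline.

```python
# Minimal sketch: reading one synchronized keyframe with the nuscenes-devkit.
# Assumes `pip install nuscenes-devkit` and the v1.0-mini split extracted to
# the placeholder path below.
from nuscenes.nuscenes import NuScenes

nusc = NuScenes(version='v1.0-mini', dataroot='/data/sets/nuscenes', verbose=True)

sample = nusc.sample[0]                          # one annotated keyframe (2 Hz)
for channel, sd_token in sample['data'].items():
    sd = nusc.get('sample_data', sd_token)       # per-sensor record for this keyframe
    print(channel, sd['filename'], sd['timestamp'])
```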

2. Dataset Structure, Size, and Annotation Schema

nuScenes comprises 1,000 driving scenes, each 20 seconds long, for a total of 5.5 hours of data sampled across Boston and Singapore urban areas. The annotated core ("keyframes") is sampled at 2 Hz, and the dataset overall contains:

  • ~1.4 million images (~100× KITTI)
  • ~400,000 LiDAR point clouds, ~1.3 million radar sweeps
  • Full 360° annotation coverage—not limited to the forward view as in KITTI

Each keyframe is richly annotated with:

  • 3D bounding boxes for 23 semantic classes (including both prevalent and rare types, such as construction vehicles, trailers, traffic cones)
  • 8 object attributes (e.g., pedestrian pose, vehicle state)
  • Full geometric box description: center $(x, y, z)$, width, length, height, and yaw angle

This scale yields ≈7× more annotations than KITTI, and the dataset exhibits strong class imbalance (roughly 1:10,000 between the rarest and most common classes), motivating research on long-tail distribution modeling.
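
As a concrete picture of the annotation schema, the sketch below builds an illustrative box record and converts its yaw into a rotation matrix. The field names echo the nuScenes convention (translation, size, attributes), but the record and values are hypothetical, and the actual devkit stores orientation as a quaternion rather than a raw yaw.

```python
import numpy as np

# Illustrative box record, loosely modeled on the nuScenes annotation schema
# (center translation in meters, size as width/length/height, heading as yaw).
# Field values here are invented for demonstration.
annotation = {
    "category_name": "vehicle.construction",
    "translation": [409.5, 1176.2, 1.1],   # (x, y, z) box center
    "size": [2.9, 6.4, 3.2],               # width, length, height
    "yaw": 0.47,                           # heading angle about the z-axis (radians)
    "attributes": ["vehicle.parked"],
}

def yaw_to_rotation(yaw: float) -> np.ndarray:
    """3x3 rotation matrix about the z-axis for the given yaw angle."""
    c, s = np.cos(yaw), np.sin(yaw)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

print(yaw_to_rotation(annotation["yaw"]) @ np.array([1.0, 0.0, 0.0]))  # heading direction
```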

3. Benchmarking: Novel Detection, Tracking Metrics, and Baselines

nuScenes introduced evaluation metrics tailored to the challenges of 3D detection and tracking in safety-critical, sensor-fusion scenarios:

Detection

  • Center-distance-based average precision (AP): instead of intersection-over-union (IoU), matches between predicted and ground-truth boxes are determined by their 2D center distance on the ground plane, with the same thresholds $D = \{0.5, 1, 2, 4\}$ m applied to every class in $C$.
  • $\text{mAP} = \frac{1}{|C|\,|D|} \sum_{c \in C}\sum_{d \in D} \text{AP}_{c,d}$
  • nuScenes Detection Score (NDS): captures overall quality by balancing mAP against five true-positive (TP) error metrics (both are combined in the sketch after this list):
    • Average Translation Error (ATE, meters)
    • Average Scale Error (ASE, $1 - \text{IoU}$)
    • Average Orientation Error (AOE, radians)
    • Average Velocity Error (AVE, m/s)
    • Average Attribute Error (AAE, $1 - \text{accuracy}$)
    • $\text{NDS} = \frac{1}{10}\left[ 5 \cdot \text{mAP} + \sum_{\text{TP}} \bigl(1 - \min(1, \text{TP})\bigr) \right]$, where the summation runs over the five TP error metrics above
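
To show how mAP and the TP errors combine into NDS, the sketch below evaluates the formula on hypothetical per-class APs and error values. It is not the official evaluation code (which ships with the nuScenes devkit), and every number is invented.

```python
import numpy as np

def nds(mean_ap: float, tp_errors: dict) -> float:
    """NDS = 1/10 * [5*mAP + sum over TP metrics of (1 - min(1, error))]."""
    bounded = [1.0 - min(1.0, err) for err in tp_errors.values()]
    return (5.0 * mean_ap + sum(bounded)) / 10.0

# Per-class, per-threshold APs averaged into mAP (center-distance thresholds in meters).
ap = {  # {class: {threshold: AP}} -- dummy numbers
    "car":        {0.5: 0.70, 1.0: 0.80, 2.0: 0.85, 4.0: 0.88},
    "pedestrian": {0.5: 0.55, 1.0: 0.65, 2.0: 0.72, 4.0: 0.78},
}
mean_ap = np.mean([v for per_class in ap.values() for v in per_class.values()])

# Hypothetical TP error values (ATE in m, AOE in rad, AVE in m/s, ASE/AAE unitless).
tp_errors = {"ATE": 0.32, "ASE": 0.25, "AOE": 0.45, "AVE": 0.30, "AAE": 0.12}
print(f"mAP = {mean_ap:.3f}, NDS = {nds(mean_ap, tp_errors):.3f}")
```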

Tracking

  • sAMOTA (scaled average MOT accuracy) and sMOTA$_r$ (confidence-calibrated MOTA evaluated at recall $r$)
  • Track Initialization Duration (TID): time before the tracker first begins tracking a given target
  • Longest Gap Duration (LGD): longest interval of tracking failure per target (both illustrated in the sketch below)
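
As a rough illustration of the TID and LGD definitions, the sketch below derives both from a per-keyframe boolean record of whether a single target was tracked. The 0.5 s frame period matches the 2 Hz keyframe rate, but the sequence and helper functions are invented, and this is an interpretation of the definitions above rather than the official evaluation code.

```python
# Illustrative computation of TID and LGD for one ground-truth target,
# given a per-keyframe boolean saying whether the tracker matched it.
FRAME_PERIOD_S = 0.5  # 2 Hz keyframes
tracked = [False, False, True, True, False, False, False, True, True, True]

def tid_seconds(tracked, dt):
    """Time until the target is tracked for the first time."""
    for i, hit in enumerate(tracked):
        if hit:
            return i * dt
    return len(tracked) * dt  # never tracked

def lgd_seconds(tracked, dt):
    """Longest run of consecutive tracking failures."""
    longest = current = 0
    for hit in tracked:
        current = 0 if hit else current + 1
        longest = max(longest, current)
    return longest * dt

print("TID =", tid_seconds(tracked, FRAME_PERIOD_S), "s")   # 1.0 s
print("LGD =", lgd_seconds(tracked, FRAME_PERIOD_S), "s")   # 1.5 s
```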

Baselines

  • LiDAR: PointPillars with temporal accumulation (10 sweeps), velocity regression
  • Image: Orthographic Feature Transform (OFT, with SSD head), MonoDIS
  • Tracking: Adapted AB3DMOT for both LiDAR and image-based detection

4. Dataset Analysis, Spatial Coverage, and Class Imbalance

Spatial and statistical analyses reveal:

  • Intersections are overrepresented (reflecting real-world conflict points)
  • On average, each keyframe contains ≈7 pedestrians and ≈20 vehicles, but strong long-tail distributions persist across rare classes
  • Distributional histograms for box sizes, spatial positions, and yaw demonstrate pronounced diversity in object geometry and scenario complexity
  • Experiments confirm that a center-distance match threshold of 2 m yields more informative cross-modality (LiDAR vs. image) rankings than IoU, addressing the well-known small-object penalization of IoU for near-miss errors
  • Accumulating additional LiDAR sweeps yields measurable improvements in both detection AP and velocity estimation, especially for dynamically moving objects (see the accumulation sketch below)
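
To make the last point concrete, the sketch below shows one simple way sweep accumulation can work: points from a past sweep are transformed into the current frame using ego poses and tagged with their time lag, the extra channel that PointPillars-style detectors typically use to regress velocity. The poses, points, and helper names are all hypothetical; a real pipeline would read calibrated ego poses and sensor extrinsics from the dataset.

```python
import numpy as np

# Minimal sketch of multi-sweep LiDAR accumulation: points from a past sweep are
# mapped into the current sensor frame via 4x4 poses, and every point gets a
# relative-timestamp channel. Poses and points below are invented toy values.

def transform(points_xyz: np.ndarray, T: np.ndarray) -> np.ndarray:
    """Apply a 4x4 homogeneous transform to an (N, 3) array of points."""
    homo = np.hstack([points_xyz, np.ones((points_xyz.shape[0], 1))])
    return (homo @ T.T)[:, :3]

def accumulate(points_now, points_past, pose_now, pose_past, dt):
    """Map past points into the current frame and tag each point with its time lag."""
    past_to_now = np.linalg.inv(pose_now) @ pose_past
    past_in_now = transform(points_past, past_to_now)
    lag_now = np.zeros((points_now.shape[0], 1))
    lag_past = np.full((points_past.shape[0], 1), dt)
    return np.vstack([np.hstack([points_now, lag_now]),
                      np.hstack([past_in_now, lag_past])])

# Toy example: two sweeps 50 ms apart (20 Hz LiDAR), ego moved 1 m forward in x.
pose_now = np.eye(4)
pose_now[0, 3] = 1.0          # sensor-to-world transform at the current sweep
pose_past = np.eye(4)         # sensor-to-world transform at the previous sweep
pts_now = np.array([[10.0, 2.0, 0.5]])
pts_past = np.array([[9.0, 2.0, 0.5]])
print(accumulate(pts_now, pts_past, pose_now, pose_past, dt=0.05))
```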

5. Comparison with Legacy Datasets

nuScenes represents an order-of-magnitude advance over previous datasets:

| Dataset | # Images | # 3D Boxes | Sensor Coverage | Radar | Environment Diversity |
| --- | --- | --- | --- | --- | --- |
| KITTI | ~15k | ~200k | Front only | No | Limited, daylight |
| nuScenes | ~1.4M | ~1.4M | 360°, full suite | Yes | Urban Boston and Singapore; day/night, rain |
  • nuScenes is the first to offer radar, 360° multi-modality, and urban diversity under varied environmental conditions.

6. Research Applications and Implications

nuScenes enables:

  • Advanced benchmarks for 3D object detection, tracking, trajectory prediction, and sensor fusion—within a realistic, fused-sensor urban context
  • Design and evaluation of algorithms under severe class imbalance, supporting progress in rare-event detection and long-tail modeling
  • Study of sensor synchronization and calibration, critical for robust perception in real-world deployments
  • Exploration of robustness to geographic and environmental domain shifts, with standardized evaluation practices

Semantic maps and human-authored scene descriptions are included, supporting research in semantic localization, behavior modeling, and prior-based scene understanding. Open-sourced tooling and code facilitate reproduction and comparability.

7. Impact and Future Directions

nuScenes, through its multimodal design, exhaustive annotations, tailored metrics, and open protocols, has established itself as a foundational benchmark for perception in autonomous urban driving. Its structure directly addresses limitations seen in earlier efforts—such as insufficient sensor diversity, annotation sparsity, and narrow operational domains—and sets high standards for future dataset development in the field. The dataset has catalyzed research in class-imbalanced detection, robust tracking, and multi-sensor fusion, and continues to serve as the de facto evaluation bedrock for large-scale autonomous vehicle research.

References

Caesar, H., Bankiti, V., Lang, A. H., Vora, S., Liong, V. E., Xu, Q., Krishnan, A., Pan, Y., Baldan, G., & Beijbom, O. (2019). nuScenes: A multimodal dataset for autonomous driving. arXiv:1903.11027.