nuScenes Detection Score (NDS)
- nuScenes Detection Score (NDS) is a unified metric that aggregates mAP and various error measures to assess 3D object detection and state estimation in autonomous driving.
- It integrates localization, scale, orientation, velocity, and attribute errors into a weighted score that highlights the impact of temporal modeling and precise state estimation.
- The design of NDS drives research by emphasizing improvements in velocity and orientation accuracy, thus guiding advancements in perception systems for urban driving scenarios.
The nuScenes Detection Score (NDS) is a scalar performance metric specifically designed to evaluate 3D object detection and state estimation on the nuScenes dataset, a widely adopted multimodal benchmark for autonomous driving. NDS addresses the unique requirements of 3D detection in urban driving scenes by integrating conventional detection performance with geometric, kinematic, and semantic property estimation, enabling unified assessment across detection, localization, orientation, velocity, and attribute accuracy (Caesar et al., 2019, Ji et al., 17 Apr 2025).
1. Formal Definition and Mathematical Structure
Let $\mathbb{C}$ denote the set of object classes evaluated (10 in nuScenes), and $\mathbb{D} = \{0.5, 1, 2, 4\}$ (in meters) the set of center-distance thresholds for geometric matching in the horizontal plane. The NDS is defined as a weighted sum of mean Average Precision (mAP) and five normalized mean error metrics, as follows:

$$\mathrm{NDS} = \frac{1}{10}\left[\,5\,\mathrm{mAP} + \sum_{\mathrm{mTP}\,\in\,\mathbb{TP}} \bigl(1 - \min(1,\,\mathrm{mTP})\bigr)\right], \qquad \mathbb{TP} = \{\mathrm{mATE}, \mathrm{mASE}, \mathrm{mAOE}, \mathrm{mAVE}, \mathrm{mAAE}\}.$$
Here,
- $\mathrm{mAP}$: mean Average Precision over all classes and distance thresholds.
- $\mathrm{mATE}$: mean Average Translation Error.
- $\mathrm{mASE}$: mean Average Scale Error.
- $\mathrm{mAOE}$: mean Average Orientation Error.
- $\mathrm{mAVE}$: mean Average Velocity Error.
- $\mathrm{mAAE}$: mean Average Attribute Error.
Each error metric is clamped via $\min(1, \mathrm{mTP})$, which ensures that each term $1 - \min(1, \mathrm{mTP})$ lies in $[0, 1]$ and that its contribution to the score is limited to its respective sub-interval, promoting stability and interpretability (Caesar et al., 2019, Ji et al., 17 Apr 2025).
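As a concrete sketch, the aggregation can be written in a few lines of Python; the function name `nds` and its argument layout are illustrative only and not part of the official nuscenes-devkit API:

```python
# Minimal sketch of the NDS aggregation (names are illustrative).

def nds(m_ap: float, m_ate: float, m_ase: float, m_aoe: float,
        m_ave: float, m_aae: float) -> float:
    """Aggregate mAP and the five mean TP errors into the NDS scalar."""
    tp_errors = [m_ate, m_ase, m_aoe, m_ave, m_aae]
    # Each error is clamped to [0, 1] so that a very poor sub-metric
    # cannot drive the score negative; lower errors score higher.
    tp_scores = [1.0 - min(1.0, err) for err in tp_errors]
    # mAP carries weight 5, each TP score weight 1, normalized by 10.
    return (5.0 * m_ap + sum(tp_scores)) / 10.0

# Example with plausible values for a strong detector:
print(round(nds(0.45, 0.30, 0.25, 0.40, 0.35, 0.15), 4))  # 0.58
```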
2. Metric Components and Calculation
Mean Average Precision (mAP)
- Detections are matched to ground truth based on distances between projected centers in the ground plane; 3D IoU scores are not used for primary matching.
- For each class $c \in \mathbb{C}$ and distance threshold $d \in \mathbb{D}$, an AP value $\mathrm{AP}_{c,d}$ is computed as the normalized area under the precision-recall curve, discarding operating points where precision or recall falls below 10% to suppress low-confidence noise.
- The final mAP is averaged over all classes and thresholds:

$$\mathrm{mAP} = \frac{1}{|\mathbb{C}|\,|\mathbb{D}|} \sum_{c \in \mathbb{C}} \sum_{d \in \mathbb{D}} \mathrm{AP}_{c,d}.$$
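The pruning step can be sketched as follows; this is a simplified rendition patterned on the devkit's AP computation, with `precision_at_recall` assumed to hold precision sampled at 101 uniformly spaced recall points in $[0, 1]$:

```python
import numpy as np

def calc_ap_sketch(precision_at_recall: np.ndarray,
                   min_recall: float = 0.1,
                   min_precision: float = 0.1) -> float:
    """AP restricted to recall > 10% and precision > 10%, normalized so
    that a perfect precision-recall curve scores 1.0."""
    # Drop the low-recall operating points entirely.
    prec = precision_at_recall[int(round(100 * min_recall)) + 1:].copy()
    # Subtract the precision floor and clip, so sub-10% precision
    # contributes nothing to the area.
    prec = np.clip(prec - min_precision, 0.0, None)
    # Renormalize by the attainable range (1 - min_precision).
    return float(prec.mean()) / (1.0 - min_precision)
```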
True-Positive (TP) Error Metrics
For each matched detection (true positive at the 2 m center-distance threshold), individual error types are evaluated:
- Translation Error (ATE): Euclidean center distance in the 2D ground plane (meters).
- Scale Error (ASE): $1 - \mathrm{IoU}$ after aligning translation and orientation.
- Orientation Error (AOE): smallest yaw angle difference between prediction and ground truth (radians).
- Velocity Error (AVE): L2 norm of the 2D velocity difference (meters/second).
- Attribute Error (AAE): $1 - \mathrm{acc}$, the proportion of misclassified attributes.
Each metric is computed for every class and then averaged across classes (Caesar et al., 2019, Ji et al., 17 Apr 2025).
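A per-match sketch of these five errors is given below; the box dictionary layout (`center`, `size`, `yaw`, `velocity`, `attribute`) is an assumption for illustration, not the devkit's data structures:

```python
import numpy as np

def tp_errors(pred: dict, gt: dict) -> dict:
    """Compute the five true-positive errors for one matched detection.

    Each box dict is assumed to hold: 'center' (x, y), 'size' (w, l, h),
    'yaw' (radians), 'velocity' (vx, vy), 'attribute' (str)."""
    # ATE: Euclidean center distance in the ground plane (m).
    ate = float(np.linalg.norm(np.subtract(pred['center'], gt['center'])))
    # ASE: 1 - 3D IoU after aligning translation and orientation, which
    # reduces to a volume ratio over the axis-aligned box extents.
    inter = float(np.prod(np.minimum(pred['size'], gt['size'])))
    union = float(np.prod(pred['size'])) + float(np.prod(gt['size'])) - inter
    ase = 1.0 - inter / union
    # AOE: smallest yaw difference, wrapped into [0, pi] (rad).
    diff = (pred['yaw'] - gt['yaw']) % (2 * np.pi)
    aoe = float(min(diff, 2 * np.pi - diff))
    # AVE: L2 velocity difference in the ground plane (m/s).
    ave = float(np.linalg.norm(np.subtract(pred['velocity'], gt['velocity'])))
    # AAE: 0 if the attribute is correct, 1 if not, for a single match.
    aae = 0.0 if pred['attribute'] == gt['attribute'] else 1.0
    return {'ATE': ate, 'ASE': ase, 'AOE': aoe, 'AVE': ave, 'AAE': aae}
```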
3. Relative Contributions and Interpretation
Metric weighting in the NDS formula provides a balanced scalar summary:
- $\mathrm{mAP}$ is scaled by a factor of 5 (50% of the total score).
- Each of the five error terms contributes up to 10%.
- This structure rewards correct detections (mAP) and low aggregated errors; error values approaching zero maximize the corresponding positive contribution $1 - \min(1, \mathrm{mTP})$.
This formulation allows NDS to distinguish models that have similar mAP but diverge in state estimation quality, such as velocity or orientation prediction fidelity (Caesar et al., 2019, Ji et al., 17 Apr 2025).
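A brief worked example (with illustrative numbers) makes this concrete: two hypothetical models with identical mAP, differing only in mAVE, separate by 0.05 NDS.

```python
# Two hypothetical models; all numbers chosen for illustration only.

def nds(m_ap, errors):
    return (5 * m_ap + sum(1 - min(1.0, e) for e in errors)) / 10

# Errors ordered as (mATE, mASE, mAOE, mAVE, mAAE).
model_a = nds(0.40, (0.35, 0.27, 0.45, 0.40, 0.18))
model_b = nds(0.40, (0.35, 0.27, 0.45, 0.90, 0.18))  # worse velocity
print(round(model_a, 3), round(model_b, 3))  # 0.535 0.485
```

Here a 0.5 m/s gap in mAVE alone accounts for the 0.05 NDS difference, even though detection quality (mAP) is unchanged.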
4. Design Rationale and Use Cases
NDS explicitly decouples and quantifies multiple error modalities:
- Localization: ATE captures 2D position accuracy.
- Shape: ASE reflects volumetric alignment.
- Heading: AOE measures orientation error.
- Dynamics: AVE quantifies mean velocity error.
- Semantic State: AAE penalizes incorrect attribute (such as parked vs. moving) assignments.
This aggregation provides a unified metric suitable for comparing detection systems beyond traditional IoU-based mAP, prioritizing not only detection but also high-precision state estimation. For example, lidar-based methods can substantially outperform monocular systems on ATE and AVE, yielding a higher NDS even when mAP values are similar (Caesar et al., 2019).
5. Impact on Method Development and Benchmarking
NDS’s multi-faceted structure shapes both research targets and evaluation standards. For autonomous driving, accurate velocity (AVE) and attribute prediction (AAE) are critical for downstream planning and policy. Methods such as RoPETR have specifically targeted velocity estimation to improve the NDS, recognizing that velocity error accounts for a non-trivial fraction of the total score reduction. Experimental results demonstrate that modifications aimed at reducing AVE (e.g., spatiotemporal rotary embeddings in RoPETR) yield measurable gains in NDS, with an mAVE reduction of over 0.06 m/s translating into several NDS points in camera-only regimes (Ji et al., 17 Apr 2025).
Impact Table
| Modification | Typical NDS Gain (%) | AVE Reduction (m/s) |
|---|---|---|
| Backbone/Input | ~2 | Minor |
| 3D Point-aware PE | ~1.4 | Small |
| Rotary Embedding (RoPETR) | ~1.4 (test), ~3 (val) | >0.06 |
As observed in (Ji et al., 17 Apr 2025), the majority of recent NDS gains, especially for camera-only approaches, are driven by enhanced temporal modeling and velocity estimation.
6. Practical and Comparative Significance
NDS enables fine-grained, discriminative benchmarks for autonomous driving perception stacks. It reduces noise through operating point pruning and avoids the proliferation of arbitrary hyper-parameters by folding comprehensive state estimation into a single scalar. By explicitly rewarding velocity and attribute accuracy, NDS steers community efforts toward physically realistic object understanding. In competitive contexts, improvements in any submetric (e.g., mAVE via temporal modeling) can pivot relative rankings, prompting advances such as temporal-aware positional encoding in transformer-based detectors (Ji et al., 17 Apr 2025, Caesar et al., 2019).
7. Limitations and Evolving Directions
While NDS delivers a robust, aggregated assessment, it remains contingent on thresholds (e.g., 2 m for TP matching) and normalization conventions. The explicit decoupling of error modalities mitigates some pitfalls of pure mAP/IoU metrics, but application-specific requirements (e.g., safety-critical planning) may still require disaggregated error analyses. A plausible implication is that future benchmarks may refine NDS or introduce new terms as automotive perception tasks evolve.
References
- "nuScenes: A multimodal dataset for autonomous driving" (Caesar et al., 2019)
- "RoPETR: Improving Temporal Camera-Only 3D Detection by Integrating Enhanced Rotary Position Embedding" (Ji et al., 17 Apr 2025)