Ray-level Panoptic Quality (RayPQ)
- Ray-level Panoptic Quality (RayPQ) is a segmentation metric that evaluates semantic and instance predictions along sensor rays in 3D occupancy scenarios.
- It adapts classical panoptic quality methods by using sensor-ray data, enabling precise evaluation for autonomous perception benchmarks like Occ3D-nuScenes.
- RayPQ matches predicted and ground-truth instances by ray-level IoU and penalizes false positives and false negatives, so the score reflects what is visible from the sensor rather than occluded geometry.
Ray-level Panoptic Quality (RayPQ) is a segmentation metric designed for evaluating the joint quality of semantic and instance-level predictions in 3D occupancy settings, where the atomic elements are sensor rays rather than image pixels or 3D voxels. RayPQ retains the structure and spirit of the classical Panoptic Quality (PQ) metric but adapts both the matching and counting logic to operate along sensor-observed rays, which is central to panoptic-occupancy benchmarks in autonomous perception, notably Occ3D-nuScenes. RayPQ is now established as a primary metric for panoptic occupancy, directly aligning with the outputs of multi-view sensor-based methods and addressing the evaluation requirements of modern 3D scene understanding pipelines (Yu et al., 2024, Caunes et al., 6 Mar 2026).
1. Formal Definition and Mathematical Characterization
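Let $\mathcal{R}$ be the set of all sensor (camera or lidar) rays, with one ray per pixel per camera or per voxel column in a regular 3D grid. For each ray $r \in \mathcal{R}$, both the prediction and the ground truth specify the first 'visible' occupied voxel encountered, along with its semantic label $\hat{c}_r$ and instance ID $\hat{i}_r$. For ground truth, denote these $c_r^{*}$ and $i_r^{*}$, respectively. For every predicted instance $p$ and ground-truth instance $g$, define:
- Predicted ray segment: $\hat{S}_p = \{\, r \in \mathcal{R} : \hat{i}_r = p \,\}$
- Ground-truth ray segment: $S_g^{*} = \{\, r \in \mathcal{R} : i_r^{*} = g \,\}$
The RayIoU between predicted instance $p$ and ground truth $g$ is:
$$\mathrm{RayIoU}(p, g) = \frac{|\hat{S}_p \cap S_g^{*}|}{|\hat{S}_p \cup S_g^{*}|}$$
A predicted instance $p$ and ground-truth $g$ are matched as a true positive (TP) if $\mathrm{RayIoU}(p, g) > \tau$, usually with $\tau = 0.5$. Sets of true positives (TP), false positives (FP, predicted but not matched), and false negatives (FN, ground truth not matched) are:
- TP: all matched pairs $(p, g)$
- FP: all $p$ not matched to any $g$
- FN: all $g$ not matched to any $p$
RayPQ is then
$$\mathrm{RayPQ} = \frac{\sum_{(p, g) \in \mathrm{TP}} \mathrm{RayIoU}(p, g)}{|\mathrm{TP}| + \tfrac{1}{2}|\mathrm{FP}| + \tfrac{1}{2}|\mathrm{FN}|}$$
The output is a scalar in $[0, 1]$, typically reported as a percentage (Yu et al., 2024).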
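As a minimal sketch of this definition, the function below computes a PQ-style score from two arrays of per-ray instance IDs. The function name `ray_pq`, the `ignore_id` convention for rays without an instance, and the toy arrays are assumptions made for illustration; the sketch also omits the semantic-class check and any per-class averaging that a benchmark implementation may apply.

```python
import numpy as np

def ray_pq(pred_ids, gt_ids, tau=0.5, ignore_id=-1):
    """PQ-style score over rays, given one instance ID per ray.

    pred_ids, gt_ids: 1-D integer arrays of equal length (one entry per ray).
    ignore_id marks rays with no instance (hypothetical convention).
    """
    pred_ids = np.asarray(pred_ids)
    gt_ids = np.asarray(gt_ids)
    preds = [p for p in np.unique(pred_ids) if p != ignore_id]
    gts = [g for g in np.unique(gt_ids) if g != ignore_id]

    iou_sum = 0.0
    matched_pred, matched_gt = set(), set()
    for p in preds:
        p_mask = pred_ids == p
        for g in gts:
            g_mask = gt_ids == g
            inter = np.logical_and(p_mask, g_mask).sum()
            union = np.logical_or(p_mask, g_mask).sum()
            iou = inter / union
            # With tau >= 0.5 and one ID per ray, a segment can match at most once.
            if iou > tau:
                iou_sum += iou
                matched_pred.add(p)
                matched_gt.add(g)

    tp = len(matched_pred)
    fp = len(preds) - tp          # predicted instances left unmatched
    fn = len(gts) - len(matched_gt)  # ground-truth instances left unmatched
    denom = tp + 0.5 * fp + 0.5 * fn
    return iou_sum / denom if denom > 0 else 0.0

# Toy example with eight rays.
pred = [1, 1, 1, 2, 2, -1, 3, 3]
gt = [1, 1, 1, 1, 2, 2, -1, -1]
score = ray_pq(pred, gt)
print(round(score, 3))  # 0.3
```

In the toy example only one pair clears the threshold (IoU 0.75), while two predicted instances and one ground-truth instance stay unmatched, so the score is 0.75 / (1 + 1 + 0.5) = 0.3.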
2. Computational Workflow in Panoptic Occupancy
RayPQ computation proceeds as follows:
- Rays are cast from each image pixel (camera-based) or voxel column (lidar-based), according to the sensor model and poses.
- For each ray, the first encountered occupied voxel (the "visible surface" or "first-hit") is determined in both prediction and ground truth.
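- Record the predicted semantic label and instance ID $(\hat{c}_r, \hat{i}_r)$ and the ground-truth pair $(c_r^{*}, i_r^{*})$ for each ray $r$.
- Group rays by predicted and ground-truth instances to define ray segments.
- For all pairs $(p, g)$ of predicted and ground-truth instances, compute $\mathrm{RayIoU}(p, g)$.
- Match pairs with $\mathrm{RayIoU}(p, g) > \tau$ (typically $\tau = 0.5$) as TPs; unmatched predictions $p$ are FP, unmatched ground-truth instances $g$ are FN.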
- Aggregate results via the RayPQ formula above (Yu et al., 2024, Caunes et al., 6 Mar 2026).
This produces a global RayPQ score, as well as per-range RayPQ variants, depending on the spatial partitioning of rays.
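The first-hit step of this workflow can be sketched as fixed-step ray marching through a boolean occupancy grid. The function name `first_hit`, the unit-voxel grid convention, and the step size are assumptions for illustration only; a real evaluation would use the benchmark's voxel size, grid origin, and sensor poses, and fixed-step sampling can skip thin structures that an exact voxel traversal would catch.

```python
import numpy as np

def first_hit(occ, origin, direction, step=0.5, max_dist=100.0):
    """Return the index (i, j, k) of the first occupied voxel along a ray, or None.

    occ: 3-D boolean array; voxel (i, j, k) covers [i, i+1) x [j, j+1) x [k, k+1)
    in grid units (an assumption of this sketch).
    """
    origin = np.asarray(origin, dtype=float)
    direction = np.asarray(direction, dtype=float)
    direction = direction / np.linalg.norm(direction)
    for t in np.arange(0.0, max_dist, step):
        idx = np.floor(origin + t * direction).astype(int)
        if np.any(idx < 0) or np.any(idx >= occ.shape):
            continue  # sample lies outside the grid; keep marching
        if occ[tuple(idx)]:
            return tuple(int(v) for v in idx)
    return None

# Two occupied voxels on the same ray: only the nearer one is visible.
occ = np.zeros((10, 10, 4), dtype=bool)
occ[6, 5, 1] = True
occ[8, 5, 1] = True
hit = first_hit(occ, origin=(0.5, 5.5, 1.5), direction=(1.0, 0.0, 0.0))
miss = first_hit(occ, origin=(0.5, 0.5, 1.5), direction=(0.0, 1.0, 0.0))
print(hit, miss)  # (6, 5, 1) None
```

The semantic label and instance ID recorded for the ray are then read from the prediction or ground-truth grids at the returned index.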
3. Relationship to Related Metrics: PQ and RayIoU
Classical Panoptic Quality (PQ) measures, as established by Kirillov et al. (2019), evaluate panoptic segmentation by matching connected components (segments) across pixels in 2D or voxels in 3D, using IoU as the region overlap criterion. RayIoU, as introduced in Occ3D, is the semantic analog defined over sensor rays, reporting per-class occupancy accuracy via intersection/union counts for sets of rays assigned to each semantic class (Yu et al., 2024, Caunes et al., 6 Mar 2026).
Key distinctions:
- PQ (pixel/voxel-level): Atomic elements are pixels or voxels; segments are connected components; matches are made via pixel-/voxel-IoU.
- RayIoU (ray-level, semantic): Ignores instance IDs; compares semantic occupancy labels per ray.
- RayPQ (ray-level, panoptic): Atomic elements are rays; segments are ray-segments defined by instance ID and semantic class along the ray's first hit; matches combine instance grouping and per-ray assignment.
Advantages of RayPQ over RayIoU and PQ:
- Simultaneously evaluates segmentation accuracy and instance grouping.
- Penalizes missing (FN) or spurious (FP) instances via the denominator.
- Aligned with sensor-ray visibility, handling occlusion natively and avoiding the need to densify to 3D masks (Yu et al., 2024).
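The difference between a purely semantic ray metric and a panoptic one shows up when a prediction gets every class right but merges two objects. The toy arrays below are hypothetical, and the semantic score here is a plain per-class ray IoU used as a stand-in for the semantic side of the comparison, not the benchmark's exact RayIoU protocol.

```python
import numpy as np

# Eight rays hitting two adjacent cars (ground-truth instances 1 and 2).
gt_inst = np.array([1, 1, 1, 1, 2, 2, 2, 2])
# Hypothetical prediction: correct class on every ray, but both cars
# are merged into a single instance with ID 7.
pred_inst = np.array([7, 7, 7, 7, 7, 7, 7, 7])
gt_sem = np.zeros(8, dtype=int)    # class 0 = "car" on every ray
pred_sem = np.zeros(8, dtype=int)

def iou(a, b):
    """IoU of two boolean ray masks."""
    return np.logical_and(a, b).sum() / np.logical_or(a, b).sum()

# Semantic view: instance IDs play no role.
semantic_iou = iou(pred_sem == 0, gt_sem == 0)

# Panoptic view: match instances whose ray IoU exceeds 0.5.
pairs = [(p, g, iou(pred_inst == p, gt_inst == g))
         for p in np.unique(pred_inst) for g in np.unique(gt_inst)]
tp = [(p, g, v) for p, g, v in pairs if v > 0.5]
fp = len(np.unique(pred_inst)) - len({p for p, _, _ in tp})
fn = len(np.unique(gt_inst)) - len({g for _, g, _ in tp})
denom = len(tp) + 0.5 * fp + 0.5 * fn
ray_pq = sum(v for _, _, v in tp) / denom if denom else 0.0

print(semantic_iou, ray_pq)  # 1.0 0.0
```

The merged instance overlaps each car with IoU exactly 0.5, which does not exceed the threshold, so it counts as one false positive against two false negatives and the panoptic score drops to zero while the semantic score stays perfect.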
4. Practical Implementation and Interpretative Properties
RayPQ aligns evaluation with the data modality actually sensed at test time, i.e., only visible surfaces as observed via sensor projections. Each ray is evaluated equally, preventing volume bias due to surface thickness or occluded geometry. The focus on first-hit assignment avoids the ambiguity of dense 3D annotation and is computationally scalable, facilitating differentiation in 'soft' evaluations and direct aggregation in post-processing (Yu et al., 2024, Caunes et al., 6 Mar 2026).
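The claim about volume bias can be illustrated with two hypothetical reconstructions of the same wall that differ only in thickness. In this sketch rays are assumed to run along axis 0 of the grid, one ray per column; the helper name `first_hit_depth` is illustrative.

```python
import numpy as np

def first_hit_depth(occ):
    """Depth index of the first occupied voxel along axis 0 per column, -1 if none."""
    hit = occ.argmax(axis=0)  # index of the first True along the ray axis
    return np.where(occ.any(axis=0), hit, -1)

# Same wall surface at depth 5, seen by a 4 x 4 bundle of rays.
thin = np.zeros((10, 4, 4), dtype=bool)
thin[5] = True        # one voxel thick
thick = np.zeros((10, 4, 4), dtype=bool)
thick[5:9] = True     # four voxels thick

# Voxel-level IoU is dominated by the occluded volume behind the surface.
voxel_iou = (thin & thick).sum() / (thin | thick).sum()

# Ray-level first hits are identical, so a ray-based metric sees no difference.
same_first_hit = np.array_equal(first_hit_depth(thin), first_hit_depth(thick))

print(voxel_iou, same_first_hit)  # 0.25 True
```

Both grids place the visible surface at the same depth, yet their voxel IoU is only 0.25 because the thicker wall adds voxels that no ray can observe.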
Practical evaluation highlights:
- Reported RayPQ values on Occ3D-nuScenes are: Panoptic-FlashOcc, 16.0 (fully supervised); FreeOcc, 3.1 (training-free), with gains up to 3.9 under weak supervision (Caunes et al., 6 Mar 2026).
- Instance identification significantly impacts RayPQ (>1 point improvement in ablations), while semantic, geometric, and occupancy refinements also contribute incrementally.
- The metric is sensitive to accurate camera extrinsics: incorrect poses cause substantial RayPQ collapse (2.5 → 1.1 in ablations).
- Use of non-causal/future frames can raise RayPQ by up to 40% in relative terms (2.5 → 3.5) (Caunes et al., 6 Mar 2026).
Ablation results demonstrate the cumulative effect of system components, with the largest RayPQ jumps achieved through explicit instance-awareness.
5. Comparative Evaluation and Suitability for Panoptic Occupancy
RayPQ is the chosen main metric for panoptic occupancy tasks on Occ3D-nuScenes because it combines panoptic instance grouping with sensor-visibility constraints. PQ applied directly to voxels is not naturally consistent with what is visible from the sensor viewpoints, and RayIoU is purely semantic; RayPQ evaluates instance-aware predictions on the rays the sensors actually observe (Yu et al., 2024, Caunes et al., 6 Mar 2026).
In experimental reporting:
- Panoptic-FlashOcc achieves 16.0 RayPQ at 30.2 FPS, surpassing prior methods on both accuracy and efficiency (Yu et al., 2024).
- FreeOcc sets 3.1 RayPQ as the baseline for training-free panoptic occupancy, with substantial performance improvements at longer ranges (RayPQ_4m = 5.8) confirming the metric’s geometric fidelity.
RayPQ correlates with perceptual sharpness and functional task performance, such as driver-centric safety (e.g., obstacle avoidance), establishing its practical relevance for autonomous systems.
6. Summary Table: Key Metrics and Their Evaluated Properties
| Metric | Instance Awareness | Atomic Element | Reporting Base | Occlusion Handling | Penalizes FP/FN |
|---|---|---|---|---|---|
| PQ | Yes | Pixels/Voxels | 2D/3D grid | No | Yes |
| RayIoU | No | Rays | Sensor rays | Yes | No (semantic) |
| RayPQ | Yes | Rays | Sensor rays | Yes | Yes |
RayPQ generalizes PQ to the 3D occupancy setting, using rays instead of grid elements, and is strictly more informative than RayIoU for panoptic tasks.
7. References
- Panoptic-FlashOcc: Yu, Z. et al., "Panoptic-FlashOcc: An Efficient Baseline to Marry Semantic Occupancy with Panoptic via Instance Center," Occ3D-nuScenes Benchmark (Yu et al., 2024).
- FreeOcc: "FreeOcc: Training-free Panoptic Occupancy Prediction via Foundation Models" (Caunes et al., 6 Mar 2026).
- Kirillov, A. et al., "Panoptic segmentation," CVPR 2019 (classical PQ metric).
- Occ3D Benchmark: Tian, X. et al., "Occ3D: A Large-Scale 3D Occupancy Prediction Benchmark for Autonomous Driving," arXiv 2023 (RayIoU context).