Localisation Recall Precision (LRP) Error
- Localisation Recall Precision (LRP) Error is a comprehensive metric that integrates localisation, recall, and precision to assess detector performance.
- It explicitly decomposes errors into localisation, false positive, and false negative components, offering actionable insights for detector tuning.
- Extensions such as oLRP and aLRP loss enable optimal threshold selection and unified loss design, enhancing visual detection training.
Localisation Recall Precision (LRP) Error quantifies the performance of visual detectors by combining localisation accuracy, recall, and precision into a unified, interpretable metric. LRP was proposed to directly address key deficiencies of Average Precision (AP) in object detection evaluation, such as its insensitivity to bounding box localisation errors and its limited interpretability for identifying sources of error. The metric provides a scalar value in [0, 1], with lower values indicating superior detector performance, and explicitly decomposes the contributions of localisation (bounding box tightness), false positives (precision), and false negatives (recall) (Oksuz et al., 2018). Subsequent generalizations extended LRP evaluation to a broad range of visual detection tasks, including instance segmentation, keypoint detection, and panoptic segmentation (Oksuz et al., 2020).
1. Formal Definition and Components
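Let $X$ denote the set of ground-truth boxes for one class, and $Y$ the set of predicted boxes with confidence scores $s_j$. For a chosen score threshold $s$ and Intersection-over-Union (IoU) threshold $\tau$, predictions with $s_j \ge s$ form $Y_s$. Greedy one-to-one matching is performed between $X$ and $Y_s$ in decreasing score order, accepting only pairs with IoU $\ge \tau$. The following notations are used:
- $N_{TP}$: Number of true positive matches
- $N_{FP}$: Number of false positives, $N_{FP} = |Y_s| - N_{TP}$
- $N_{FN}$: Number of false negatives, $N_{FN} = |X| - N_{TP}$
The compact formulation of LRP error is:
$$\mathrm{LRP}(X, Y_s) = \frac{1}{N_{TP} + N_{FP} + N_{FN}} \left( \sum_{i=1}^{N_{TP}} \frac{1 - \mathrm{IoU}(x_i, y_{x_i})}{1 - \tau} + N_{FP} + N_{FN} \right)$$
where $y_{x_i}$ is the detection matched to ground truth $x_i$.
LRP can be decomposed into three components:
- Localisation error:
$$\mathrm{LRP}_{\mathrm{IoU}} = \frac{1}{N_{TP}} \sum_{i=1}^{N_{TP}} \left( 1 - \mathrm{IoU}(x_i, y_{x_i}) \right)$$
Normalized such that $1 - \mathrm{LRP}_{\mathrm{IoU}}$ is the mean IoU of matched detections.
- False positive error:
$$\mathrm{LRP}_{FP} = 1 - \mathrm{Precision} = \frac{N_{FP}}{|Y_s|}$$
- False negative error:
$$\mathrm{LRP}_{FN} = 1 - \mathrm{Recall} = \frac{N_{FN}}{|X|}$$
The weighted sum yields the overall metric:
$$\mathrm{LRP}(X, Y_s) = \frac{1}{N_{TP} + N_{FP} + N_{FN}} \left( \frac{N_{TP}}{1 - \tau}\, \mathrm{LRP}_{\mathrm{IoU}} + |Y_s|\, \mathrm{LRP}_{FP} + |X|\, \mathrm{LRP}_{FN} \right)$$
(Oksuz et al., 2018, Oksuz et al., 2020).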
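The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `lrp_error`, its argument names, and the returned dictionary keys are assumptions made for this example. It takes the IoUs of the already matched pairs plus the false positive and false negative counts, and returns the total error together with the three components.

```python
from typing import Dict, Sequence


def lrp_error(matched_ious: Sequence[float], n_fp: int, n_fn: int,
              tau: float = 0.5) -> Dict[str, float]:
    """Hypothetical helper: LRP and its components from matched IoUs and counts."""
    n_tp = len(matched_ious)
    denom = n_tp + n_fp + n_fn
    if denom == 0:
        # No ground truths and no detections: the error is undefined.
        raise ValueError("LRP is undefined without ground truths or detections")

    # Sum of (1 - IoU) over the true positive matches.
    loc_sum = sum(1.0 - iou for iou in matched_ious)

    # Compact formulation: normalised localisation term plus FP and FN counts.
    total = (loc_sum / (1.0 - tau) + n_fp + n_fn) / denom

    nan = float("nan")  # a component is left undefined when its denominator is zero
    return {
        "lrp": total,
        "lrp_iou": loc_sum / n_tp if n_tp else nan,
        "lrp_fp": n_fp / (n_tp + n_fp) if (n_tp + n_fp) else nan,
        "lrp_fn": n_fn / (n_tp + n_fn) if (n_tp + n_fn) else nan,
    }
```

For two matches with IoUs 0.9 and 0.7, one false positive, and one false negative at an IoU threshold of 0.5, the localisation sum is 0.4, so the total is (0.4 / 0.5 + 1 + 1) / 4 = 0.7. Recombining the components with their weights gives the same value.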
2. Computation and Evaluation Procedure
The computation of LRP involves:
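- Thresholding: Select an IoU threshold $\tau$ and a confidence threshold $s$; assemble $Y_s$, the set of predictions scoring at least $s$.
- Matching: Greedy one-to-one matching between the ground truths $X$ and the retained predictions $Y_s$ under the IoU and score criteria.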
- Count TP/FP/FN: True positives correspond to matches, false positives are unmatched predictions, and false negatives are unmatched ground truths.
- Component calculation: Compute each of the three error terms as specified.
- Aggregate: Combine all error components into the final LRP score.
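LRP is undefined only if both $|X|$ and $|Y_s|$ are zero, that is, when there are neither ground truths nor retained detections and the normalising count of true positives, false positives, and false negatives vanishes.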
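The thresholding, matching, and counting steps can be sketched as follows. This is one possible reading of the procedure under stated assumptions: boxes are axis-aligned `(x1, y1, x2, y2)` tuples, each detection is matched to the unmatched ground truth with the highest IoU, and a pair is accepted when that IoU is at least the IoU threshold. The names `iou` and `greedy_match` are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed axis-aligned


def iou(a: Box, b: Box) -> float:
    """Intersection-over-Union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(gts: List[Box], dets: List[Tuple[Box, float]], s: float,
                 tau: float = 0.5) -> Tuple[List[float], int, int]:
    """Return (IoUs of true positives, number of FPs, number of FNs)."""
    # Thresholding: keep detections scoring at least s, highest score first.
    kept = sorted((d for d in dets if d[1] >= s), key=lambda d: d[1], reverse=True)
    unmatched = set(range(len(gts)))
    matched_ious: List[float] = []
    for box, _score in kept:
        # Find the best still-unmatched ground truth for this detection.
        best_j, best_iou = None, 0.0
        for j in unmatched:
            v = iou(gts[j], box)
            if v > best_iou:
                best_j, best_iou = j, v
        if best_j is not None and best_iou >= tau:
            unmatched.remove(best_j)        # one-to-one: a ground truth is used once
            matched_ious.append(best_iou)
    n_tp = len(matched_ious)
    return matched_ious, len(kept) - n_tp, len(gts) - n_tp
```

The returned IoUs and counts are exactly the quantities needed for the component calculation and aggregation steps.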
3. Optimal LRP (oLRP) and Mean oLRP (moLRP)
The LRP error depends on the chosen confidence threshold $s$. To identify a detector's best possible performance, the Optimal LRP (oLRP) is defined as the minimum LRP achievable over all possible score thresholds:
$$\mathrm{oLRP} = \min_{s} \mathrm{LRP}(X, Y_s)$$
The corresponding optimal threshold $s^*$ is the operating point minimizing LRP. For multi-class detectors, the mean oLRP (moLRP) across the set of classes $C$ is:
$$\mathrm{moLRP} = \frac{1}{|C|} \sum_{c \in C} \mathrm{oLRP}_c$$
This enables evaluation and comparison of detectors at their optimal operational points (Oksuz et al., 2018).
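A threshold sweep for oLRP can be sketched as below. The sketch assumes matching has already been run once over all detections in decreasing score order, so each detection carries either the IoU of its match or `None`; because greedy matching only depends on higher-scored detections, those labels stay valid for every threshold. The function name `optimal_lrp` and the input layout are assumptions for illustration.

```python
from typing import List, Optional, Tuple


def optimal_lrp(dets: List[Tuple[float, Optional[float]]], n_gt: int,
                tau: float = 0.5) -> Tuple[float, float]:
    """dets holds (score, IoU of the match or None). Returns (oLRP, optimal threshold)."""
    if n_gt <= 0:
        raise ValueError("this sketch assumes at least one ground truth")

    # Keeping no detection at all: every ground truth is a false negative, LRP = 1.
    best_lrp, best_s = 1.0, float("inf")

    ordered = sorted(dets, key=lambda d: d[0], reverse=True)
    loc_sum, n_tp, n_fp = 0.0, 0, 0
    for i, (score, m_iou) in enumerate(ordered):
        if m_iou is None:
            n_fp += 1
        else:
            n_tp += 1
            loc_sum += 1.0 - m_iou
        # Evaluate only once per distinct score (after the last tied detection).
        if i + 1 < len(ordered) and ordered[i + 1][0] == score:
            continue
        n_fn = n_gt - n_tp
        lrp = (loc_sum / (1.0 - tau) + n_fp + n_fn) / (n_tp + n_fp + n_fn)
        if lrp < best_lrp:
            best_lrp, best_s = lrp, score
    return best_lrp, best_s
```

Averaging the first returned value over classes gives moLRP; the second returned value is the per-class operating threshold.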
4. Comparison to Average Precision (AP) and Related Metrics
AP is the area under the precision-recall curve and disregards the actual tightness of bounding boxes once the IoU threshold is met. AP cannot distinguish between detections with high and low localisation accuracy above $\tau$, and it provides neither an operating threshold nor a decomposition of error sources.
LRP and oLRP directly address these issues:
- LRP decomposes total error into localisation, false positives, and false negatives, providing interpretable diagnostics.
- oLRP identifies per-class optimal thresholds $s^*$, enabling effective threshold selection without manual tuning.
- LRP is a proper metric, satisfying the triangle inequality, and is sensitive to bounding box tightness.
- When AP and moLRP disagree, the latter reveals differences in box tightness and precision-recall curve properties not captured by AP.
LRP also outperforms AP in stability under data scarcity, since LRP does not rely on curve interpolation (Oksuz et al., 2018, Oksuz et al., 2020, Oksuz et al., 2020).
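A small worked example illustrates the sensitivity to box tightness. The numbers are hypothetical and chosen only for illustration: two detectors A and B each produce 8 true positives, 2 false positives, and 2 false negatives at an IoU threshold of 0.5, so their precision and recall at this operating point are identical (both 0.8). Detector A's matches have a mean IoU of 0.9, detector B's a mean IoU of 0.6.

$$
\begin{aligned}
\mathrm{LRP}_A &= \frac{1}{8 + 2 + 2}\left(\frac{8\,(1 - 0.9)}{1 - 0.5} + 2 + 2\right) = \frac{1.6 + 4}{12} \approx 0.467 \\
\mathrm{LRP}_B &= \frac{1}{8 + 2 + 2}\left(\frac{8\,(1 - 0.6)}{1 - 0.5} + 2 + 2\right) = \frac{6.4 + 4}{12} \approx 0.867
\end{aligned}
$$

The false positive and false negative terms are equal for both detectors; the entire gap comes from the localisation term, which a precision-recall pair alone does not reflect.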
5. Empirical Findings and Practical Implications
Multiple state-of-the-art detectors (e.g., SSD, Faster R-CNN, RetinaNet) evaluated on MS COCO ranked consistently by mAP and moLRP; however, moLRP uniquely quantifies box tightness and the sharpness of the precision-recall profile. Experimental evidence demonstrates:
- Class-specific optimal thresholds $s^*$ vary widely, often deviating from common thresholds like $0.5$, confirming the inadequacy of one-size-fits-all score thresholds.
- Per-class thresholding using $s^*$ obtained from oLRP both reduces LRP error and increases mAP, most significantly for classes with non-standard optimal thresholds.
- Ablation over the IoU threshold $\tau$ provides insight: a higher $\tau$ tightens localisation requirements and shifts the trade-off between error components, summarizing performance under different strictness constraints (Oksuz et al., 2018).
6. Generalizations and Extensions: aLRP Loss
The LRP metric forms the basis for the average Localisation-Recall-Precision (aLRP) loss, a unified, bounded, balanced, ranking-based loss for both classification and localisation in object detection. aLRP extends LRP analogously to how AP Loss extends precision to a ranking-based framework, inheriting the decomposability and task-unification properties.
aLRP uses a single hyperparameter (the smoothing window $\delta$ for approximating the Heaviside step function) and unifies matching, ranking, and localisation error for efficient and balanced gradient updates during training (Oksuz et al., 2020).
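To make the role of the smoothing window concrete, the sketch below shows a piecewise-linear approximation of the Heaviside step of the kind used in AP Loss-style ranking losses. Both the exact functional form and the default value of `delta` are assumptions for illustration and are not taken from this text; the function name `smoothed_step` is hypothetical.

```python
def smoothed_step(x: float, delta: float = 1.0) -> float:
    """Piecewise-linear stand-in for the Heaviside step (assumed form).

    Returns 0 below -delta, 1 above +delta, and ramps linearly in between,
    so score differences inside the window contribute fractionally to a rank.
    """
    if x < -delta:
        return 0.0
    if x > delta:
        return 1.0
    return x / (2.0 * delta) + 0.5
```

A smaller window makes the approximation closer to a hard step, while a larger one spreads the transition over a wider range of score differences.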
In summary, LRP and its variants constitute a rigorous framework for evaluating and diagnosing visual detectors, with superior interpretability and operational utility compared to AP and IoU-only metrics. The explicit decomposition and optimizability of LRP support diagnostic insight, principled threshold selection, and balanced metric-based training loss design.