LocScore: Localization-Aware Retrieval Metric
- LocScore is a localization-aware metric that evaluates retrieval performance by combining ranked retrieval and spatial overlap (IoU) for patch-wise image queries.
- It addresses limitations of traditional metrics by jointly considering accuracy in retrieval order and spatial alignment, providing nuanced insights into system performance.
- Empirical studies show LocScore effectively differentiates between global and patch-wise methods, emphasizing precise object localization in image retrieval tasks.
LocScore is a localization-aware metric introduced for spatially grounded instance-level image retrieval systems, specifically designed for patch-wise retrieval scenarios where it is essential to assess both retrieval accuracy and the spatial correctness of the matched region. LocScore jointly considers the ranked retrieval position of a true positive and the spatial overlap between the predicted patch and the ground-truth bounding box, thereby providing a diagnostic tool for evaluating the ability of retrieval systems not merely to locate positive images but to accurately localize the target object within them (Choi et al., 14 Dec 2025).
1. Motivation and Conceptual Foundation
LocScore was developed to address the limitations of traditional retrieval metrics such as average precision (AP) and intersection-over-union (IoU) when applied independently. In patch-wise retrieval, AP quantifies the rank at which ground-truth positives appear but does not measure spatial alignment, while IoU captures the overlap between the predicted and ground-truth regions but is agnostic to retrieval ordering. LocScore integrates these two aspects, weighting each true-positive retrieval by both its precision (proportion of positives retrieved up to that rank) and its IoU with the ground-truth box. High LocScore values indicate both early retrieval and accurate spatial alignment, while degradations in either ranking or localization are penalized.
2. Mathematical Definition
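LocScore is defined as follows. For a query $q$:
- $N_q$: Number of ground-truth positives for query $q$.
- $B_{q,i}$: Ground-truth bounding box for the $i$-th positive.
- $\hat{B}_{q,i}$: Predicted bounding box for the highest-scoring patch of the $i$-th positive.
- $r_{q,i}$: Rank of the $i$-th positive image in the retrieved list.
- $\mathrm{TP}_q(k)$: Number of ground-truth positives among the top $k$ retrieved slots.
The spatial overlap is quantified as:
$$\mathrm{IoU}_{q,i} = \frac{|\hat{B}_{q,i} \cap B_{q,i}|}{|\hat{B}_{q,i} \cup B_{q,i}|}$$
Per-query continuous LocScore:
$$\mathrm{LocScore}_q = \frac{1}{N_q} \sum_{i=1}^{N_q} \frac{\mathrm{TP}_q(r_{q,i})}{r_{q,i}} \cdot \mathrm{IoU}_{q,i}$$
Dataset-wide LocScore over the query set $Q$:
$$\mathrm{LocScore} = \frac{1}{|Q|} \sum_{q \in Q} \mathrm{LocScore}_q$$
Thresholded LocScore at IoU threshold $\tau$:
$$\mathrm{LocScore}_q@\tau = \frac{1}{N_q} \sum_{i=1}^{N_q} \frac{\mathrm{TP}_q(r_{q,i})}{r_{q,i}} \cdot \mathbb{1}\left[\mathrm{IoU}_{q,i} \geq \tau\right]$$
Mean-thresholded LocScore (mLocScore) over a set of thresholds $\tau \in \mathcal{T}$:
$$\mathrm{mLocScore} = \frac{1}{|\mathcal{T}|} \sum_{\tau \in \mathcal{T}} \mathrm{LocScore}@\tau$$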
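The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' reference code: the function names, the `(x1, y1, x2, y2)` corner format for boxes, and the use of `>=` for the threshold test are all assumptions made for the example.

```python
from typing import Optional, Sequence, Tuple

# Assumed box format (not specified in the source): (x1, y1, x2, y2) corners.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def loc_score_query(
    ranks: Sequence[int],
    ious: Sequence[float],
    num_positives: int,
    tau: Optional[float] = None,
) -> float:
    """Per-query LocScore.

    ranks: 1-based ranks of the retrieved positives (distinct values).
    ious:  IoU of each positive's top patch, aligned with `ranks`.
    tau:   if given, the IoU factor becomes the indicator IoU >= tau.
    """
    if num_positives == 0:
        return 0.0
    order = sorted(range(len(ranks)), key=lambda i: ranks[i])
    total = 0.0
    # After sorting by rank, the j-th positive has exactly j positives
    # at or above its rank, so its precision is j / rank.
    for tp, i in enumerate(order, start=1):
        precision = tp / ranks[i]
        weight = ious[i] if tau is None else float(ious[i] >= tau)
        total += precision * weight
    return total / num_positives


def loc_score_dataset(per_query_scores: Sequence[float]) -> float:
    """Dataset-wide LocScore: plain mean of the per-query scores."""
    return sum(per_query_scores) / len(per_query_scores)
```

Passing `tau=None` gives the continuous score, while any numeric `tau` gives the thresholded variant with the same precision weighting.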
3. Computation Workflow
For each query:
- Retrieve a ranked list of candidate images, each associated with a single top-scoring patch and bounding box.
- Identify all ground-truth positive images.
- For each positive image:
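  - Record its retrieval rank $r_{q,i}$.
  - Compute the spatial overlap $\mathrm{IoU}_{q,i}$.
  - Compute the precision term $\mathrm{TP}_q(r_{q,i}) / r_{q,i}$, where $\mathrm{TP}_q(r_{q,i})$ counts the number of positives retrieved up to rank $r_{q,i}$.
- Aggregate using the formula for $\mathrm{LocScore}_q$; average across all queries for the dataset-wide $\mathrm{LocScore}$.
- For thresholded variants, replace the IoU factor by an indicator that the IoU reaches the threshold $\tau$.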
Boxes are represented by their coordinates, so each IoU is computed in constant time. Only the single highest-scoring patch per image is required. A vectorized implementation can accelerate computation by sorting ranks and applying cumulative sums.
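The cumulative-sum formulation of this workflow can be sketched as follows. The example uses only the Python standard library (`itertools.accumulate`) in place of an array library, and the function name and input layout (one flag and one IoU per ranked slot) are assumptions made for illustration.

```python
from itertools import accumulate
from typing import Sequence


def loc_score_from_ranked_list(
    is_positive: Sequence[bool],
    ious: Sequence[float],
    num_positives: int,
) -> float:
    """LocScore for one query from an already ranked result list.

    is_positive[k] says whether the image at rank k + 1 is a ground-truth
    positive; ious[k] is the IoU of its top-scoring patch (ignored for
    non-positives).
    """
    if num_positives == 0:
        return 0.0
    # Running count of positives: tp_at[k] = positives in the top k + 1 slots.
    tp_at = list(accumulate(int(p) for p in is_positive))
    total = 0.0
    for k, (pos, overlap) in enumerate(zip(is_positive, ious)):
        if pos:
            total += tp_at[k] / (k + 1) * overlap
    return total / num_positives
```

Setting every IoU to 1 in this sketch reduces the result to the query's average precision, which is a convenient sanity check for an implementation.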
4. Properties, Interpretative Range, and Metric Behavior
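- LocScore ranges in $[0, 1]$, because both the precision term and the IoU lie in $[0, 1]$.
- The average of the precision terms alone over the positives of query $q$ corresponds exactly to $\mathrm{AP}_q$; thus $\mathrm{LocScore}_q \leq \mathrm{AP}_q$, since $\mathrm{IoU} \leq 1$ always.
- Patches with partial localization (i.e. $0 < \mathrm{IoU} < 1$) contribute proportionally less.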
- Sensitivity to misalignment is immediate: small spatial shifts that lower IoU reduce LocScore, even at optimal retrieval rank.
- For the thresholded variant $\mathrm{LocScore}@\tau$, the score is non-increasing in $\tau$ by construction.
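The two bounds above follow directly from the definition. As a short derivation, write $P_i$ for the precision term of the $i$-th positive and $N_q$ for the number of positives of query $q$ (shorthand introduced here for brevity). Since $0 \leq \mathrm{IoU}_i \leq 1$:

$$\mathrm{LocScore}_q = \frac{1}{N_q} \sum_{i=1}^{N_q} P_i \cdot \mathrm{IoU}_i \;\leq\; \frac{1}{N_q} \sum_{i=1}^{N_q} P_i = \mathrm{AP}_q$$

For two thresholds $\tau_1 \leq \tau_2$, every positive that passes $\tau_2$ also passes $\tau_1$, so $\mathbb{1}[\mathrm{IoU}_i \geq \tau_2] \leq \mathbb{1}[\mathrm{IoU}_i \geq \tau_1]$ for each $i$, and therefore:

$$\mathrm{LocScore}_q@\tau_2 \;\leq\; \mathrm{LocScore}_q@\tau_1 \;\leq\; \mathrm{AP}_q$$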
5. Empirical Performance and Comparative Assessment
LocScore enables finer discrimination between retrieval systems. Empirical results on the DINOv2 backbone indicate that global methods can reach a reasonable mAP while their LocScore remains comparatively low, whereas the patch-wise local DINOv2 variant achieves a higher LocScore. This highlights that global feature matching often correlates with background co-occurrence rather than actual object alignment. Case studies with perfect AP but differing LocScores further emphasize that AP alone cannot capture spatial correctness. Thresholded LocScore analyses show that sliding-window methods yield superior localization accuracy under stricter IoU thresholds compared to coarse grid patch selection (Choi et al., 14 Dec 2025).
6. Efficient Implementation Techniques
- Compute IoU only for retrieved ground-truth positive images.
- Pre-sort retrieved lists, mask positives, and apply cumulative summation to obtain the running count of positives at each rank.
- For mLocScore across multiple thresholds, compute relevant binary masks in one batch and produce the aggregate.
- For large-scale evaluation, only compute one patch per retrieved image, not all potential alignments.
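Evaluating several thresholds in a single pass over the positives could look like the sketch below. The function names and the threshold values in the usage note are hypothetical, and the `>=` comparison is an assumed convention for the indicator.

```python
from typing import Dict, Sequence


def thresholded_scores(
    precisions: Sequence[float],
    ious: Sequence[float],
    num_positives: int,
    thresholds: Sequence[float],
) -> Dict[float, float]:
    """Thresholded LocScore for every tau, computed in one pass.

    precisions[i] and ious[i] belong to the i-th retrieved positive.
    """
    totals = {tau: 0.0 for tau in thresholds}
    for p, overlap in zip(precisions, ious):
        for tau in thresholds:
            if overlap >= tau:  # binary mask for this threshold
                totals[tau] += p
    return {tau: s / num_positives for tau, s in totals.items()}


def mean_loc_score(
    precisions: Sequence[float],
    ious: Sequence[float],
    num_positives: int,
    thresholds: Sequence[float],
) -> float:
    """mLocScore: average of the thresholded scores over all thresholds."""
    scores = thresholded_scores(precisions, ious, num_positives, thresholds)
    return sum(scores.values()) / len(scores)
```

For example, `mean_loc_score([1.0, 2 / 3], [0.8, 0.4], 2, (0.3, 0.5, 0.9))` reuses the same precision terms for all three thresholds, so the IoU values only need to be computed once per positive.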
7. Worked Example and Practical Interpretation
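Consider a single query with $N_q = 2$ positives:
- Rank 1: Image A (ground-truth), with overlap $\mathrm{IoU}_A$
- Rank 2: Image X (false positive)
- Rank 3: Image B (ground-truth), with overlap $\mathrm{IoU}_B$
For Image A: $r = 1$, precision $= 1/1 = 1$, contribution $1 \cdot \mathrm{IoU}_A$. For Image B: $r = 3$, precision $= 2/3$, contribution $\tfrac{2}{3} \cdot \mathrm{IoU}_B$.
Continuous LocScore:
$$\mathrm{LocScore}_q = \frac{1}{2}\left(\mathrm{IoU}_A + \frac{2}{3}\,\mathrm{IoU}_B\right)$$
Thresholded LocScore at a threshold $\tau$ with $\mathrm{IoU}_B < \tau \leq \mathrm{IoU}_A$: only Image A passes, so
$$\mathrm{LocScore}_q@\tau = \frac{1}{2}\left(1 \cdot 1 + \frac{2}{3} \cdot 0\right) = 0.5$$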
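The arithmetic of this example can be checked with a few lines of Python. The IoU values (0.8 for Image A, 0.4 for Image B) and the threshold of 0.5 used below are hypothetical, chosen only so that Image A passes the threshold and Image B does not; they are not taken from the source.

```python
# Hypothetical values for illustration only (not from the source).
ranks = {"A": 1, "B": 3}      # ranks of the two ground-truth positives
ious = {"A": 0.8, "B": 0.4}   # assumed IoU of each positive's top patch
tau = 0.5                     # assumed IoU threshold
num_positives = 2


def precision_at(rank: int) -> float:
    """Positives retrieved at or above `rank`, divided by `rank`."""
    return sum(1 for r in ranks.values() if r <= rank) / rank


continuous = sum(precision_at(ranks[k]) * ious[k] for k in ranks) / num_positives
thresholded = sum(
    precision_at(ranks[k]) * (ious[k] >= tau) for k in ranks
) / num_positives

print(round(continuous, 4), round(thresholded, 4))  # prints: 0.5333 0.5
```

With these assumed values the thresholded score is 0.5 regardless of the exact IoUs, as long as only Image A clears the threshold.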
This procedure operationalizes LocScore as both a benchmark and a diagnostic, making it widely applicable for spatially sensitive retrieval system development.
LocScore is a rigorous and empirically validated approach for quantifying both retrieval rank and localization fidelity within patch-wise retrieval frameworks, and offers interpretable, spatially structured insight beyond what is possible using standard precision metrics alone (Choi et al., 14 Dec 2025).