
LocScore: Localization-Aware Retrieval Metric

Updated 16 December 2025
  • LocScore is a localization-aware metric that evaluates retrieval performance by combining ranked retrieval and spatial overlap (IoU) for patch-wise image queries.
  • It addresses limitations of traditional metrics by jointly considering accuracy in retrieval order and spatial alignment, providing nuanced insights into system performance.
  • Empirical studies show LocScore effectively differentiates between global and patch-wise methods, emphasizing precise object localization in image retrieval tasks.

LocScore is a localization-aware metric introduced for spatially grounded instance-level image retrieval systems, specifically designed for patch-wise retrieval scenarios where it is essential to assess both retrieval accuracy and the spatial correctness of the matched region. LocScore jointly considers the ranked retrieval position of a true positive and the spatial overlap between the predicted patch and the ground-truth bounding box, thereby providing a diagnostic tool for evaluating the ability of retrieval systems not merely to locate positive images but to accurately localize the target object within them (Choi et al., 14 Dec 2025).

1. Motivation and Conceptual Foundation

LocScore was developed to address the limitations of traditional retrieval metrics such as average precision (AP) and intersection-over-union (IoU) when applied independently. In patch-wise retrieval, AP quantifies the rank at which ground-truth positives appear but does not measure spatial alignment, while IoU captures the overlap between the predicted and ground-truth regions but is agnostic to retrieval ordering. LocScore integrates these two aspects, weighting each true-positive retrieval by both its precision (proportion of positives retrieved up to that rank) and its IoU with the ground-truth box. High LocScore values indicate both early retrieval and accurate spatial alignment, while degradations in either ranking or localization are penalized.

2. Mathematical Definition

LocScore is defined as follows. For query $n$:

  • $I_n$: Number of ground-truth positives for query $n$.
  • $B^{n,i}_{\mathrm{gt}}\in\mathbb{R}^4$: Ground-truth bounding box for the $i$-th positive.
  • $B^{n,i}_{\mathrm{pred}}\in\mathbb{R}^4$: Predicted bounding box for the highest-scoring patch of the $i$-th positive.
  • $r^{n,i}\in\mathbb{N}$: Rank of the retrieved positive image.
  • $h^{n,i}$: Number of ground-truth positives among the top $r^{n,i}$ retrieved slots.

The spatial overlap is quantified as:

$$\mathrm{IoU}(B^{n,i}_{\mathrm{gt}}, B^{n,i}_{\mathrm{pred}}) = \frac{\mathrm{area}(B^{n,i}_{\mathrm{gt}} \cap B^{n,i}_{\mathrm{pred}})}{\mathrm{area}(B^{n,i}_{\mathrm{gt}} \cup B^{n,i}_{\mathrm{pred}})}$$
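The IoU definition can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes axis-aligned boxes given as `(x_min, y_min, x_max, y_max)` tuples, and the names `Box`, `box_area`, and `iou` are hypothetical helpers introduced here.

```python
from typing import Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]


def box_area(b: Box) -> float:
    """Area of an axis-aligned box; inverted or empty boxes count as zero."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def iou(gt: Box, pred: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    # The intersection is itself a box (possibly empty).
    inter_box = (
        max(gt[0], pred[0]),
        max(gt[1], pred[1]),
        min(gt[2], pred[2]),
        min(gt[3], pred[3]),
    )
    inter = box_area(inter_box)
    union = box_area(gt) + box_area(pred) - inter
    # Assumption: two zero-area boxes are treated as IoU 0.
    return inter / union if union > 0 else 0.0
```

Returning 0 for a degenerate union is a convention chosen for this sketch; the source does not specify how empty boxes are handled.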

Per-query continuous LocScore:

$$\mathrm{LocScore}^{(n)} = \frac{1}{I_n} \sum_{i=1}^{I_n} \left( \frac{h^{n,i}}{r^{n,i}} \times \mathrm{IoU}(B^{n,i}_{\mathrm{gt}}, B^{n,i}_{\mathrm{pred}}) \right)$$

Dataset-wide LocScore:

$$\mathrm{LocScore} = \frac{1}{N} \sum_{n=1}^{N} \mathrm{LocScore}^{(n)}$$
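The per-query and dataset-wide formulas translate fairly directly into code. The sketch below assumes each positive is already summarized by its 1-based rank and its IoU; the function names `locscore_query` and `locscore_dataset` are illustrative, and returning 0 for an empty input is a convention chosen here rather than something the source specifies.

```python
from typing import List, Sequence, Tuple


def locscore_query(ranks: Sequence[int], ious: Sequence[float]) -> float:
    """Continuous LocScore for one query.

    ranks[i] is the 1-based retrieval rank of the i-th positive and
    ious[i] is the IoU of its predicted box with the ground truth.
    """
    if not ranks:
        return 0.0  # assumption: a query without positives scores 0
    # Visit positives in rank order so that h is the running hit count.
    order = sorted(range(len(ranks)), key=lambda i: ranks[i])
    total = 0.0
    for h, i in enumerate(order, start=1):
        total += (h / ranks[i]) * ious[i]
    return total / len(ranks)


def locscore_dataset(
    queries: List[Tuple[Sequence[int], Sequence[float]]]
) -> float:
    """Mean of per-query LocScore over (ranks, ious) pairs."""
    if not queries:
        return 0.0  # assumption: empty dataset scores 0
    return sum(locscore_query(r, u) for r, u in queries) / len(queries)
```

For instance, `locscore_dataset([([1], [1.0]), ([2], [0.5])])` averages a perfect query with one whose only positive sits at rank 2 with half overlap.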

Thresholded LocScore at IoU $\geq\delta$:

$$\mathrm{LocScore}^{(n)}(\delta) = \frac{1}{I_n} \sum_{i=1}^{I_n} \frac{h^{n,i}}{r^{n,i}} \times \mathbb{I}\left[ \mathrm{IoU}(B^{n,i}_{\mathrm{gt}}, B^{n,i}_{\mathrm{pred}}) \geq \delta \right]$$

Mean-thresholded LocScore ($\mathrm{mLocScore}$) over thresholds $\mathcal{T}=\{0.3,0.4,0.5\}$:

$$\mathrm{mLocScore} = \frac{1}{|\mathcal{T}|} \sum_{\delta\in\mathcal{T}} \mathrm{LocScore}(\delta)$$

Here $\mathrm{LocScore}(\delta)$ denotes the average of $\mathrm{LocScore}^{(n)}(\delta)$ over all $N$ queries.
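The thresholded variant and its mean over $\mathcal{T}$ can be sketched as follows. The input layout (a list of `(ranks, ious)` pairs, one per query) and the names `locscore_query_at` and `mlocscore` are assumptions made for this illustration; the default thresholds are the ones stated above.

```python
from typing import List, Sequence, Tuple


def locscore_query_at(
    ranks: Sequence[int], ious: Sequence[float], delta: float
) -> float:
    """Thresholded LocScore for one query: IoU is replaced by [IoU >= delta]."""
    if not ranks:
        return 0.0  # assumption: a query without positives scores 0
    order = sorted(range(len(ranks)), key=lambda i: ranks[i])
    total = sum(
        h / ranks[i]
        for h, i in enumerate(order, start=1)
        if ious[i] >= delta
    )
    return total / len(ranks)


def mlocscore(
    queries: List[Tuple[Sequence[int], Sequence[float]]],
    thresholds: Sequence[float] = (0.3, 0.4, 0.5),
) -> float:
    """Average the dataset-level thresholded LocScore over the thresholds."""
    if not queries or not thresholds:
        return 0.0  # assumption: empty input scores 0

    def dataset_at(delta: float) -> float:
        return sum(
            locscore_query_at(r, u, delta) for r, u in queries
        ) / len(queries)

    return sum(dataset_at(d) for d in thresholds) / len(thresholds)
```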

3. Computation Workflow

For each query:

  1. Retrieve a ranked list of candidate images, each associated with a single top-scoring patch and bounding box.
  2. Identify all ground-truth positive images.
  3. For each positive image:
    • Record its retrieval rank $r^{n,i}$.
    • Compute spatial overlap $\mathrm{IoU}(B^{n,i}_{\mathrm{gt}}, B^{n,i}_{\mathrm{pred}})$.
    • Compute the precision term $h^{n,i} / r^{n,i}$, where $h^{n,i}$ counts the number of positives retrieved up to rank $r^{n,i}$.
  4. Aggregate using the formula for $\mathrm{LocScore}^{(n)}$; average across all queries for $\mathrm{LocScore}$.
  5. For thresholded variants, replace the IoU factor by an indicator of IoU exceeding $\delta$.

Boxes are represented by $(x_{\min}, y_{\min}, x_{\max}, y_{\max})$, and IoU is computed in $O(1)$. Only the single highest-scoring patch per image is required. A vectorized implementation can accelerate the $h^{n,i}$ computation by sorting ranks and applying cumulative sums.
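The workflow above can be sketched end to end for a single query. The data layout is an assumption for illustration: `ranked_ids` is the retrieved list (best first), `gt_boxes` maps each positive image id to its ground-truth box, and `pred_boxes` maps each retrieved image id to the box of its top-scoring patch. The hit counts $h^{n,i}$ come from a cumulative sum over a positive mask, here via `itertools.accumulate`. Positives that never appear in the ranked list contribute zero in this sketch.

```python
from itertools import accumulate
from typing import Dict, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def _iou(a: Box, b: Box) -> float:
    """IoU of two axis-aligned boxes (0 if the union is empty)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def locscore_from_ranking(
    ranked_ids: Sequence[str],
    gt_boxes: Dict[str, Box],
    pred_boxes: Dict[str, Box],
) -> float:
    """Per-query continuous LocScore computed from a ranked list."""
    if not gt_boxes:
        return 0.0  # assumption: a query without positives scores 0
    # Step 2: mask of ground-truth positives along the ranked list.
    is_pos = [1 if img in gt_boxes else 0 for img in ranked_ids]
    # hits[k] = number of positives among the top k+1 retrieved slots.
    hits = list(accumulate(is_pos))
    total = 0.0
    for k, img in enumerate(ranked_ids):
        if is_pos[k]:
            precision = hits[k] / (k + 1)  # h / r with 1-based rank
            total += precision * _iou(gt_boxes[img], pred_boxes[img])
    # Step 4: normalize by the number of ground-truth positives.
    return total / len(gt_boxes)
```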

4. Properties, Interpretative Range, and Metric Behavior

  • LocScore lies in $[0,1]$ because $h^{n,i}/r^{n,i}\leq 1$ and $\mathrm{IoU}\leq 1$.
  • The normalized sum $\frac{1}{I_n}\sum_i h^{n,i}/r^{n,i}$ is exactly AP for query $n$; thus $\mathrm{LocScore}^{(n)}\leq\mathrm{AP}$ for that query, since $\mathrm{IoU}\leq 1$ always.
  • Patches with partial localization (e.g. $\mathrm{IoU}=0.5$) contribute proportionally less.
  • Sensitivity to misalignment is immediate: small spatial shifts that lower IoU reduce LocScore, even at optimal retrieval rank.
  • For the thresholded variant, $\mathrm{LocScore}(\delta)$ is non-increasing in $\delta$ by construction.
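These properties can be checked numerically on a small made-up case. The `(rank, IoU)` pairs below are invented purely for illustration, and `scores` is a hypothetical helper that returns AP alongside either the continuous or the thresholded LocScore for one query.

```python
from typing import List, Optional, Tuple


def scores(
    pairs: List[Tuple[int, float]], delta: Optional[float] = None
) -> Tuple[float, float]:
    """Return (AP, LocScore) for one query from (rank, iou) pairs.

    With delta set, the second value is the thresholded LocScore(delta).
    """
    ap = loc = 0.0
    for h, (rank, iou) in enumerate(sorted(pairs), start=1):
        precision = h / rank
        ap += precision
        weight = iou if delta is None else float(iou >= delta)
        loc += precision * weight
    n = len(pairs)
    return ap / n, loc / n


# Illustrative values only: three positives at ranks 1, 4 and 7.
pairs = [(1, 0.9), (4, 0.45), (7, 0.2)]
ap, loc = scores(pairs)
thresholded = [scores(pairs, d)[1] for d in (0.3, 0.4, 0.5)]

# LocScore is bounded by AP, and LocScore(delta) never rises with delta.
assert 0.0 <= loc <= ap <= 1.0
assert thresholded == sorted(thresholded, reverse=True)
```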

5. Empirical Performance and Comparative Assessment

LocScore enables finer discrimination between retrieval systems. Empirical results on the DINOv2 backbone demonstrate that global methods reach mAP $\approx 57.7\%$ but LocScore $\approx 15.1\%$, whereas patch-wise local DINOv2 achieves LocScore $\approx 22.2\%$. This highlights that global feature matching often correlates with background co-occurrence rather than actual object alignment. Case studies with perfect AP $=1.0$ but differing LocScores further emphasize that AP alone cannot capture spatial correctness. Thresholded LocScore analyses show that sliding-window methods yield superior localization accuracy under stricter IoU compared to coarse grid patch selection (Choi et al., 14 Dec 2025).

6. Efficient Implementation Techniques

  • Compute IoU only for retrieved ground-truth positive images.
  • Pre-sort retrieved lists, mask positives, and apply cumulative summation for $h^{n,i}$ calculation.
  • For mLocScore across multiple thresholds, compute relevant binary masks in one batch and produce the aggregate.
  • For large-scale evaluation, only compute one patch per retrieved image, not all potential alignments.
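One way to realize the multi-threshold point is to compute the precision terms once per query and reuse them for every $\delta$. The sketch below is an assumption-laden illustration in plain Python (the name `locscore_at_thresholds` is hypothetical); an array library could replace the inner loop with a single batched comparison.

```python
from typing import Dict, Sequence


def locscore_at_thresholds(
    ranks: Sequence[int],
    ious: Sequence[float],
    thresholds: Sequence[float] = (0.3, 0.4, 0.5),
) -> Dict[float, float]:
    """Thresholded per-query LocScore for several deltas in one pass."""
    n = len(ranks)
    if n == 0:
        return {d: 0.0 for d in thresholds}  # assumption: empty scores 0
    # Precision terms h/r are independent of delta, so compute them once.
    order = sorted(range(n), key=lambda i: ranks[i])
    weighted = [(h / ranks[i], ious[i]) for h, i in enumerate(order, start=1)]
    # Each threshold only changes which precision terms are kept.
    return {
        d: sum(p for p, u in weighted if u >= d) / n
        for d in thresholds
    }
```

Averaging the returned values gives the per-query contribution to mLocScore.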

7. Worked Example and Practical Interpretation

Consider a single query with $I_n=2$ positives:

  • Rank 1: Image A (ground-truth), $\mathrm{IoU}(A) = 1.0$
  • Rank 2: Image X (false positive)
  • Rank 3: Image B (ground-truth), $\mathrm{IoU}(B) = 0.5$

For Image A: $r^{n,1}=1$, $h^{n,1}=1$, contribution $=1.0$. For Image B: $r^{n,2}=3$, $h^{n,2}=2$, contribution $=(2/3)\times 0.5\approx 0.333$.

Continuous LocScore:

$$\mathrm{LocScore}^{(n)} = \tfrac{1}{2}\left(1.0 + \tfrac{1}{3}\right) \approx 0.6667$$

Thresholded LocScore at $\delta=0.6$: only Image A passes, so

$$\mathrm{LocScore}^{(n)}(0.6) = \tfrac{1}{2}(1 + 0) = 0.5$$
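The arithmetic of this example can be reproduced with a short script. The tuple layout `(image, is_positive, iou)` and the helper `example_score` are illustrative choices for this check only.

```python
from typing import List, Optional, Tuple

# Ranked list from the worked example: (image, is_positive, iou).
ranked: List[Tuple[str, bool, Optional[float]]] = [
    ("A", True, 1.0),
    ("X", False, None),  # false positive, no IoU needed
    ("B", True, 0.5),
]


def example_score(
    ranked: List[Tuple[str, bool, Optional[float]]],
    num_positives: int,
    delta: Optional[float] = None,
) -> float:
    """Continuous LocScore, or thresholded LocScore when delta is given."""
    hits = 0
    total = 0.0
    for rank, (_, is_pos, iou) in enumerate(ranked, start=1):
        if not is_pos:
            continue
        hits += 1  # h: positives seen up to and including this rank
        weight = iou if delta is None else float(iou >= delta)
        total += hits / rank * weight
    return total / num_positives


continuous = example_score(ranked, num_positives=2)
thresholded = example_score(ranked, num_positives=2, delta=0.6)
print(round(continuous, 4))  # 0.6667
print(thresholded)           # 0.5
```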

The example shows how LocScore serves as both a benchmark and a diagnostic: Image B is penalized once for the false positive ranked ahead of it and again for its partial overlap, so ranking and localization errors are visible in a single number.


LocScore is a rigorously defined and empirically validated metric that quantifies both retrieval rank and localization fidelity within patch-wise retrieval frameworks, offering interpretable, spatially grounded insight beyond what standard precision metrics provide alone (Choi et al., 14 Dec 2025).
