
MAP@100: Ranking Quality Metric

Updated 9 March 2026
  • MAP@100 is a metric that quantifies ranking quality by calculating average precision over only the top 100 results, integrating both relevance and rank order.
  • It is applied to query evaluation and large-scale image retrieval, and it can be optimized in deep learning pipelines through differentiable surrogates such as smooth histogram binning.
  • Analytic baselines and statistical significance tests using MAP@100 provide a rigorous framework for benchmarking and validating improvements in ranking algorithms.

Mean Average Precision at 100 (MAP@100) is a widely used metric in information retrieval and recommender systems, designed to evaluate the quality of ranking algorithms when only the top 100 results are of interest. MAP@100 integrates both the relevance and ranking positions of retrieved items, and its application spans query-based evaluation, large-scale image retrieval, and statistical significance testing of ranking improvements (Manzhos et al., 4 Nov 2025, Revaud et al., 2019).

1. Formal Definitions and Notational Framework

Let $N$ denote the total number of candidate items and $R$ the number of relevant items ($R \leq N$). For a retrieved list truncated at position $k = 100$:

  • $\mathrm{rel}(i) \in \{0,1\}$ indicates the relevance of the item at position $i$.
  • $\mathrm{P}@i = \frac{1}{i} \sum_{j=1}^{i} \mathrm{rel}(j)$ is the precision at position $i$.
  • $M = \min(R, 100)$ is the normalization denominator.
  • The $k$th harmonic number is $H_k = \sum_{i=1}^{k} 1/i$, and its second-order counterpart is $H_k^{(2)} = \sum_{i=1}^{k} 1/i^2$.

For a single ranking, the Average Precision at 100 is

$$AP@100 = \frac{1}{M} \sum_{i=1}^{100} \mathrm{P}@i \cdot \mathrm{rel}(i).$$

For a set $U$ of $Q$ queries or users, each with its own $AP@100_u$,

$$MAP@100 = \frac{1}{Q} \sum_{u=1}^{Q} AP@100_u.$$

This definition ensures that MAP@100 is sensitive to both the number and distribution of relevant items within the top 100 ranks (Manzhos et al., 4 Nov 2025, Revaud et al., 2019).
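The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `ap_at_k` and `map_at_k` are hypothetical, and returning 0 for a query with no relevant items is an assumed convention that the definitions above do not specify.

```python
from typing import Sequence


def ap_at_k(rels: Sequence[int], num_relevant: int, k: int = 100) -> float:
    """AP@k for one ranked list of binary relevance labels (best rank first).

    rels[i] is rel(i+1); num_relevant is R, the total number of relevant items.
    """
    m = min(num_relevant, k)  # normalization denominator M = min(R, k)
    if m == 0:
        return 0.0  # assumed convention for queries without relevant items
    hits = 0
    total = 0.0
    for i, rel in enumerate(rels[:k], start=1):
        if rel:
            hits += 1
            total += hits / i  # P@i, counted only where rel(i) = 1
    return total / m


def map_at_k(rankings: Sequence[Sequence[int]],
             num_relevant: Sequence[int], k: int = 100) -> float:
    """Mean of AP@k over all queries (or users)."""
    aps = [ap_at_k(r, n, k) for r, n in zip(rankings, num_relevant)]
    return sum(aps) / len(aps)
```

For example, a list with relevant items at ranks 1 and 3 and $R = 2$ accumulates $\mathrm{P}@1 = 1$ and $\mathrm{P}@3 = 2/3$, so its AP is $(1 + 2/3)/2 = 5/6$.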

2. Random Baseline: Expectation and Variance under Uniform Rankings

MAP@100 values become interpretable against analytic baselines under the random-ranking model, in which the $R$ relevant items are placed uniformly at random among the $N$ candidates (sampling without replacement).

The expectation of $AP@100$ is

$$E[AP@100] = \frac{R}{N\,M} \left( \frac{R-1}{N-1} \cdot 100 + \frac{N-R}{N-1} H_{100} \right),$$

where $M = \min(R, 100)$ (Manzhos et al., 4 Nov 2025). This gives the chance-level AP@100 for a single query, and hence the chance-level MAP@100 when all queries share the same $N$ and $R$.
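As a sanity check on the closed form, the sketch below evaluates it for a general cutoff $k$ and compares it with an exact average over every possible placement of the relevant items on a tiny instance. The helper names are hypothetical, and generalizing the cutoff from 100 to $k$ (with $k \leq N$ and $N \geq 2$) is an assumption made so that brute-force enumeration stays feasible.

```python
from itertools import combinations


def harmonic(k: int) -> float:
    """k-th harmonic number H_k."""
    return sum(1.0 / i for i in range(1, k + 1))


def expected_ap_at_k(n: int, r: int, k: int = 100) -> float:
    """Closed-form chance-level AP@k; assumes 1 <= r <= n, n >= 2, k <= n."""
    m = min(r, k)
    return r / (n * m) * ((r - 1) / (n - 1) * k + (n - r) / (n - 1) * harmonic(k))


def brute_force_expected_ap(n: int, r: int, k: int) -> float:
    """Exact mean of AP@k over all placements of r relevant items (small n only)."""
    m = min(r, k)
    total = 0.0
    count = 0
    for positions in combinations(range(1, n + 1), r):  # ranks, ascending
        hits = 0
        ap = 0.0
        for i in positions:
            if i > k:
                break
            hits += 1
            ap += hits / i
        total += ap / m
        count += 1
    return total / count


print(expected_ap_at_k(6, 2, 3), brute_force_expected_ap(6, 2, 3))
```

The two printed values agree up to floating-point rounding, which is what the formula predicts when every placement is equally likely.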

The variance has the form

$$\mathrm{Var}(AP@100) = \frac{1}{M^2} \frac{R}{N} \Big[ 100\big(C + 2(E - F) + 99\,G\big) + H_{100}\big(B - 2(E - 100\,F)\big) + H^{(2)}_{100}(A - D) + H_{100}^2\, D \Big],$$

with explicit expressions for $A, B, C, D, E, F, G$ as functions of $N$ and $R$. For $Q$ independent queries,

$$E[MAP@100] = \frac{1}{Q} \sum_{u=1}^{Q} E[AP@100_u], \qquad \mathrm{Var}(MAP@100) = \frac{1}{Q^2} \sum_{u=1}^{Q} \mathrm{Var}(AP@100_u),$$

and for homogeneous queries $(N_u = N,\ R_u = R)$,

$$E[MAP@100] = E[AP@100], \qquad \mathrm{Var}(MAP@100) = \frac{1}{Q} \mathrm{Var}(AP@100).$$

These baselines are fundamental for statistical testing and for contextualizing observed system performance (Manzhos et al., 4 Nov 2025).
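Because the coefficients of the variance formula are not reproduced here, one way to obtain rough null-model moments is simulation. The sketch below is a Monte Carlo estimate, not the analytic result of the cited work; the function names, the trial count, and the fixed seed are all illustrative assumptions.

```python
import random
from statistics import mean, variance


def simulate_random_ap(n: int, r: int, k: int = 100,
                       trials: int = 20000, seed: int = 0):
    """Estimate mean and variance of AP@k when r relevant items are
    placed uniformly at random among n ranks (without replacement)."""
    rng = random.Random(seed)
    m = min(r, k)
    samples = []
    for _ in range(trials):
        positions = rng.sample(range(1, n + 1), r)  # ranks of relevant items
        hits = 0
        ap = 0.0
        for i in sorted(p for p in positions if p <= k):
            hits += 1
            ap += hits / i
        samples.append(ap / m)
    return mean(samples), variance(samples)


def map_null_moments(ap_mean: float, ap_var: float, q: int):
    """Mean and variance of MAP@k over q independent homogeneous queries."""
    return ap_mean, ap_var / q
```

With homogeneous queries the per-query estimates transfer directly: the mean is unchanged and the variance shrinks by a factor of $Q$, as in the formulas above.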

3. MAP@100 for Batched Image Retrieval and Listwise Optimization

In deep image retrieval systems, MAP@100 is computed as follows. Let $d_i \in \mathbb{R}^C$ be normalized descriptors and $S_i^q = d_q^\top d_i$ the cosine similarity. The database $\{I_i\}_{i=1}^{N}$ is searched for items relevant to query $q$. The definitions are:

  • $Y_i^q \in \{0,1\}$: relevance label for query $q$ and database item $i$.
  • $N^q = \sum_{i=1}^{N} Y_i^q$: total number of relevant images.

The truncated average precision at $K = 100$ is

$$AP@100(S^q, Y^q) = \sum_{k=1}^{100} P_k(S^q, Y^q) \cdot \Delta r_k(S^q, Y^q),$$

with $P_k$ and $\Delta r_k$ the precision and the recall increment at position $k$. Over $Q$ queries,

$$MAP@100 = \frac{1}{Q} \sum_{q=1}^{Q} AP@100(S^q, Y^q).$$

This formulation aligns with practical retrieval and learning pipelines (Revaud et al., 2019).
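A score-based version of this computation might look as follows. It is a sketch under stated assumptions: the recall increment is taken to be $1/N^q$ at each relevant position (so the normalization differs from $\min(R, 100)$ in Section 1 whenever $N^q > 100$), descriptors are assumed to be unit-normalized so that a dot product equals cosine similarity, and the function names are hypothetical.

```python
from typing import Sequence


def cosine_scores(query: Sequence[float],
                  database: Sequence[Sequence[float]]) -> list:
    """Dot products d_q^T d_i; equal to cosine similarity for unit vectors."""
    return [sum(q * d for q, d in zip(query, item)) for item in database]


def ap_at_k_from_scores(scores: Sequence[float],
                        labels: Sequence[int], k: int = 100) -> float:
    """Sum of P_k * delta r_k over the top-k positions of the score ranking."""
    n_rel = sum(labels)  # N^q
    if n_rel == 0:
        return 0.0  # assumed convention
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    ap = 0.0
    for rank, i in enumerate(order[:k], start=1):
        if labels[i]:
            hits += 1
            ap += (hits / rank) * (1.0 / n_rel)  # precision times recall increment
    return ap
```

The explicit `sorted` call is the step that makes this form unsuitable for gradient-based training, since the ranking changes discontinuously with the scores.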

4. Differentiable Surrogates: Smooth Histogram-Binning for AP@100

Classic AP@100 calculation is non-differentiable due to explicit sorting and indicator functions. A smooth surrogate is constructed by soft-binning the score axis $[-1, 1]$ into $M$ bins of width $\Delta = 2/(M-1)$, each centered at $b_m = 1 - (m-1)\Delta$. Here $M$ denotes the number of bins, not the normalization denominator $\min(R, 100)$ of Section 1. The kernel

$$\delta(x, m) = \max\big(1 - |x - b_m| / \Delta,\ 0\big)$$

provides a soft assignment, and soft histograms for all and relevant items are accumulated:

  • $C_{q,m}^{all} = \sum_{i=1}^{N} \delta(S_i^q, m)$,
  • $C_{q,m}^{rel} = \sum_{i=1}^{N} \delta(S_i^q, m)\, Y_i^q$.

Approximated precision and recall per bin are

$$\hat P_{q,m} = \frac{\sum_{m'=1}^{m} C_{q,m'}^{rel}}{\sum_{m'=1}^{m} C_{q,m'}^{all}}, \qquad \Delta \hat r_{q,m} = \frac{C_{q,m}^{rel}}{N^q},$$

yielding quantized AP,

$$AP_Q(S^q, Y^q) = \sum_{m=1}^{M} \hat P_{q,m} \cdot \Delta \hat r_{q,m}.$$

This AP surrogate is differentiable w.r.t. network parameters, enabling direct end-to-end optimization (Revaud et al., 2019).
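The soft-binning construction can be sketched in plain Python to make the arithmetic concrete. This version only evaluates the surrogate; in an automatic-differentiation framework the same sums, products, and `max` operations would carry gradients, which is not shown here. The function name and the default of 20 bins are assumptions for illustration.

```python
from typing import Sequence


def quantized_ap(scores: Sequence[float], labels: Sequence[int],
                 num_bins: int = 20) -> float:
    """Histogram-binning approximation of AP for scores in [-1, 1]."""
    n_rel = sum(labels)  # N^q
    if n_rel == 0:
        return 0.0  # assumed convention
    delta = 2.0 / (num_bins - 1)  # bin width
    cum_all = 0.0
    cum_rel = 0.0
    ap = 0.0
    for m in range(1, num_bins + 1):
        center = 1.0 - (m - 1) * delta  # b_m, from highest score to lowest
        # Triangular kernel: soft membership of each score in bin m.
        weights = [max(1.0 - abs(s - center) / delta, 0.0) for s in scores]
        c_all = sum(weights)
        c_rel = sum(w * y for w, y in zip(weights, labels))
        cum_all += c_all
        cum_rel += c_rel
        if cum_all > 0.0:
            precision = cum_rel / cum_all      # \hat P_{q,m}
            recall_step = c_rel / n_rel        # \Delta \hat r_{q,m}
            ap += precision * recall_step
    return ap
```

When every score sits exactly on a distinct bin center, each item falls entirely into one bin and the surrogate coincides with the sorted AP; scores between centers are shared linearly between the two neighboring bins.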

5. Computation, Training, and Memory Considerations

Sorting-based $AP@100$ requires $O(N \log N)$ operations per query. The histogram-binning approximation bypasses explicit sorting at a computational cost of $O(NM)$ per query, where $M$ is the number of bins. For batched training, $N = B$ (the batch size), yielding $O(B^2 M)$ operations per batch. Memory usage is dominated by the $B \times B$ similarity matrix and the $B \cdot C$ descriptor values. For example, with $B = 4096$ and descriptor dimension $C = 2048$, the footprint is $O(B^2 + BC)$, about $1.7 \times 10^7$ similarity entries plus $8.4 \times 10^6$ descriptor values, which is well within typical GPU memory budgets (Revaud et al., 2019).
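The example figures can be turned into rough byte counts. The sketch below assumes 4-byte (float32) values and, for the soft-assignment tensor, 20 bins materialized all at once; both are assumptions for illustration, as the text above gives only asymptotic sizes.

```python
def footprint_mib(batch_size: int, dim: int, num_bins: int,
                  bytes_per_value: int = 4) -> dict:
    """Rough sizes in MiB of the main tensors for one training batch."""
    mib = 2 ** 20
    return {
        # B x B matrix of pairwise similarities.
        "similarity_matrix": batch_size * batch_size * bytes_per_value / mib,
        # B descriptors of dimension C.
        "descriptors": batch_size * dim * bytes_per_value / mib,
        # Kernel weights of every pair for every bin, if stored simultaneously.
        "soft_assignments": batch_size * batch_size * num_bins * bytes_per_value / mib,
    }


sizes = footprint_mib(batch_size=4096, dim=2048, num_bins=20)
print(sizes)
```

Under these assumptions the similarity matrix takes 64 MiB and the descriptors 32 MiB, while the per-bin soft assignments would take 1280 MiB if held in memory together, which is why the $O(B^2 M)$ term matters more than the $O(B^2 + BC)$ term in practice.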

Training with this surrogate involves:

  • Forward-passing all $B$ images to obtain descriptors.
  • Computing the similarity matrix, AP surrogates, and loss gradients with respect to the descriptors ($O(B^2 M)$ memory and compute).
  • Backpropagating gradients by recomputing each image’s forward pass individually, eliminating the need to store all activations. This staged procedure makes efficient use of GPU memory and provides 2–3× speed-ups over alternative approaches such as hard-negative mining (Revaud et al., 2019).

6. Statistical Significance and Null Model Interpretations

MAP@100 values are conventionally interpreted relative to random-ranking baselines. Compute the mean $\mu_0 = E[MAP@100]$ and standard deviation $\sigma_0 = \sqrt{\mathrm{Var}(MAP@100)}$ under the random model. Given an observed $MAP@100_\mathrm{obs}$, the standardized $z$-score is:

$$z = \frac{MAP@100_\mathrm{obs} - \mu_0}{\sigma_0}.$$

Under the null (random) hypothesis, $z$ is approximately standard normal when the average runs over many independent queries. A value $z > 1.96$ indicates statistical significance at $p < 0.05$ in a two-sided test (the one-sided $p$-value is then below $0.025$). This framework enables researchers to rigorously assess whether observed ranking gains exceed those explainable by chance, with analytic baselines for the mean and the fluctuation scale (Manzhos et al., 4 Nov 2025).
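The test can be sketched with the standard library alone. The function names are hypothetical, and the normal approximation is taken as given; the null mean and standard deviation are supplied by the caller, for instance from the closed-form baselines.

```python
import math


def z_score(map_obs: float, mu0: float, sigma0: float) -> float:
    """Standardized distance of the observed MAP from the null mean."""
    return (map_obs - mu0) / sigma0


def one_sided_p(z: float) -> float:
    """Upper-tail probability P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def two_sided_p(z: float) -> float:
    """Probability P(|Z| >= |z|) for a standard normal Z."""
    return math.erfc(abs(z) / math.sqrt(2.0))
```

A one-sided test is the natural choice when the only question is whether a system beats chance, whereas the two-sided form matches the conventional 1.96 threshold quoted above.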

7. Practical Relevance and Context Among Metrics

MAP@100 is preferred in scenarios where only the top-ranked results are critical, such as web search, recommender system outputs, and image retrieval tasks. Compared to untruncated MAP, MAP@100 more closely models user-facing scenarios where lower-ranked results are rarely examined. The differentiable surrogates developed for deep learning pipelines facilitate direct optimization of retrieval objectives, outperforming proxy loss functions or heuristic approaches (Revaud et al., 2019). The closed-form random baselines further enhance MAP@100’s interpretability and robustness for benchmarking systems (Manzhos et al., 4 Nov 2025).

