Mean Average Precision (MAP@100)

Updated 17 February 2026
  • MAP@100 is a metric that evaluates ranked retrieval quality by averaging precision scores for the top 100 results, considering both relevance and item position.
  • It involves ranking items by similarity, computing precision at each rank, and normalizing by the smaller of the number of relevant items or 100.
  • MAP@100 is used as both an evaluation benchmark and a learning objective in deep metric and hashing systems, improving performance in large-scale applications.

Mean Average Precision at 100 (MAP@100) is a statistical evaluation metric widely adopted in information retrieval, recommender systems, image retrieval, and descriptor learning to assess the quality of ranked retrieval for a set of queries when only the top 100 results are of primary interest. By accounting for both the relevance of items and their positions within the truncated ranking, MAP@100 provides a nuanced measure of retrieval effectiveness, especially for applications—such as visual search or local descriptor matching—where operational cutoffs are imposed. MAP@100 is ubiquitous as both a benchmarking tool and a learning objective in large-scale retrieval, hashing, and end-to-end deep metric learning frameworks (Ding et al., 2018, Manzhos et al., 4 Nov 2025, He et al., 2018, Revaud et al., 2019).

1. Formal Definition and Mathematical Formulation

For a query $q$, a retrieval system produces an ordered list of items, among which $N_q$ are defined as relevant (ground truth). Let $K$ denote the cutoff rank: in MAP@100, $K = 100$. Let $\operatorname{rel}_q(k)$ be an indicator function ($1$ if the $k$-th ranked item is relevant, $0$ otherwise), and let $P_q(k) = \frac{1}{k} \sum_{i=1}^{k} \operatorname{rel}_q(i)$ denote the precision at cutoff $k$. The Average Precision at $K$ for a query $q$ is

$$AP(q, K) = \frac{1}{\min(N_q, K)} \sum_{k=1}^{K} P_q(k) \cdot \operatorname{rel}_q(k)$$

The Mean Average Precision at $K$ (MAP@K), and specifically MAP@100, is then given by averaging $AP(q, 100)$ over a set of queries $Q$:

$$MAP@100 = \frac{1}{|Q|} \sum_{q \in Q} AP(q, 100)$$

The normalization by the smaller of the number of relevant items and the cutoff yields a value in the interval $[0, 1]$ for every query (Ding et al., 2018, Manzhos et al., 4 Nov 2025, He et al., 2018, Revaud et al., 2019).

2. Calculation Procedure and Stepwise Example

Evaluating MAP@100 involves, for each query:

  1. Ranking all database items by similarity to the query and selecting the top 100.
  2. For each position $k \in [1, 100]$:
    • Compute the precision $P_q(k)$.
    • If the item at rank $k$ is relevant ($\operatorname{rel}_q(k) = 1$), accumulate $P_q(k)$.
  3. Normalize the accumulated sum by $\min(N_q, 100)$.
  4. Average across all queries.
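The procedure above can be sketched in a few lines of Python; `similarities` and `relevance` are placeholder names for a score matrix and a 0/1 ground-truth matrix of the same shape.

```python
import numpy as np

def average_precision_at_k(ranked_relevance, n_relevant, k=100):
    """AP@K for one query. ranked_relevance: 0/1 relevance of the ranked
    list; n_relevant: total relevant items for the query."""
    rels = np.asarray(ranked_relevance[:k], dtype=float)
    hits = np.cumsum(rels)                      # relevant items seen up to each rank
    ranks = np.arange(1, len(rels) + 1)
    precisions = hits / ranks                   # P_q(k) at every rank
    return float((precisions * rels).sum() / min(n_relevant, k))

def map_at_k(similarities, relevance, k=100):
    """MAP@K over queries. similarities, relevance: (n_queries, n_items)."""
    aps = []
    for sims, rels in zip(similarities, relevance):
        order = np.argsort(-sims)               # rank items by descending similarity
        aps.append(average_precision_at_k(rels[order], int(rels.sum()), k))
    return float(np.mean(aps))
```

This is a minimal reference sketch; production evaluators avoid the full sort over massive databases (see the practical considerations below).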

For instance, if a query $q$ has 3 relevant items in the database, and the top 5 retrieved items have relevance pattern [1, 0, 1, 0, 1]:

| $k$ | $\operatorname{rel}_q(k)$ | $P_q(k)$ | Contribution |
|-----|---------------------------|------------|--------------|
| 1   | 1                         | 1/1 = 1.0  | 1.0          |
| 2   | 0                         | 1/2 = 0.5  | 0            |
| 3   | 1                         | 2/3 ≈ 0.667 | 0.667       |
| 4   | 0                         | 2/4 = 0.5  | 0            |
| 5   | 1                         | 3/5 = 0.6  | 0.6          |
Sum: $1.0 + 0.667 + 0.6 = 2.267$, yielding $AP(q, 5) = 2.267 / 3 \approx 0.756$. Averaging such APs across queries yields MAP@K (Ding et al., 2018).
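The worked example can be reproduced directly from the definition:

```python
relevance = [1, 0, 1, 0, 1]   # top-5 relevance pattern from the example
n_relevant = 3                # relevant items in the database for this query

hits, ap_sum = 0, 0.0
for k, rel in enumerate(relevance, start=1):
    hits += rel
    if rel:                    # accumulate P_q(k) only at relevant positions
        ap_sum += hits / k
ap = ap_sum / min(n_relevant, len(relevance))
print(round(ap, 3))            # 0.756
```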

3. Statistical Properties and Random Ranking Baselines

Under random ranking, the expected value and variance of $AP@100$ serve as essential performance baselines. Two principal evaluation models are established:

  • Offline (without replacement): given a corpus of size $N$ with $m$ relevant items, the expectation is

    $$\mathbb{E}_{WOR}[AP@100] = \frac{m}{NM}\left( \frac{m-1}{N-1} \cdot 100 + \frac{N-m}{N-1} H_{100} \right)$$

    where $M = \min(m, 100)$ and $H_{100} = \sum_{i=1}^{100} \frac{1}{i} \approx 5.1874$.

  • Online (with replacement / Bernoulli model): for relevance probability $p$,

    $$\mathbb{E}_{WR}[AP@100] = p \left( p + (1-p) \frac{H_{100}}{100} \right)$$

Variances are also derived and enable construction of confidence intervals on observed MAP@100 for statistical significance evaluation.

For example: in a 1000-item corpus with $m = 50$ relevant items, the offline model gives $\mathbb{E}_{WOR}[AP@100] \approx 0.00984$; in the online model with $p = 0.05$, $\mathbb{E}_{WR}[AP@100] \approx 0.00497$ (Manzhos et al., 4 Nov 2025).
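Both baseline expectations follow mechanically from the formulas above; plugging in the example's numbers:

```python
N, m = 1000, 50                 # corpus size, relevant items (offline model)
p = 0.05                        # relevance probability (online/Bernoulli model)
K = 100
H_K = sum(1.0 / i for i in range(1, K + 1))   # harmonic number H_100 ≈ 5.1874

M = min(m, K)
e_wor = m / (N * M) * ((m - 1) / (N - 1) * K + (N - m) / (N - 1) * H_K)
e_wr = p * (p + (1 - p) * H_K / K)

print(round(e_wor, 5))          # 0.00984
print(round(e_wr, 5))           # 0.00496, matching the source's ≈ 0.00497
```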

4. Role as Learning Objective and Differentiable Approximations

MAP@100 serves not only as an evaluation metric but also as a direct optimization objective in neural descriptor learning, metric learning, and hashing. The key challenge is the non-differentiability of AP due to ranking operations. Recent approaches implement differentiable approximations to AP@K using histogram binning and smoothing kernels.

Letting $S^q$ denote the similarities between query $q$ and the database items and $Y^q$ the corresponding relevance vector, AP@K can be approximated via "soft" bin counts over $M$ bins (triangular kernels), yielding a differentiable $\widehat{AP}$:

$$\widehat{AP}(S^q, Y^q) = \sum_{m=1}^{M} \frac{H^+(m)}{H(m)} \cdot \frac{h^+(m)}{N^q}$$

where $h^+(m)$ is the soft count of relevant items falling in bin $m$, $H(m)$ and $H^+(m)$ are cumulative soft counts of all and of relevant items over bins ordered from highest to lowest similarity, and $N^q$ is the number of relevant items for the query.

Gradients with respect to the scores can be computed by backpropagating through these aggregates, enabling end-to-end neural network training directly for (approximated) MAP@100 (He et al., 2018, Revaud et al., 2019). This approach eliminates the need for auxiliary ranking surrogates like triplet or pairwise losses and improves retrieval performance especially at operationally relevant cutoffs.
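A forward pass of this soft-binning approximation can be sketched in numpy; this is an illustrative implementation of histogram binning with triangular kernels, not the exact formulation of the cited papers, and in practice it would be written in an autodiff framework (e.g. PyTorch) so that gradients flow through the bin aggregates.

```python
import numpy as np

def soft_ap(similarities, labels, n_bins=25):
    """Smoothed AP via triangular soft binning of similarity scores.
    similarities: array in [-1, 1]; labels: 0/1 relevance. Sketch only."""
    centers = np.linspace(1.0, -1.0, n_bins)   # bin centers, best similarity first
    delta = 2.0 / (n_bins - 1)                 # bin spacing = kernel half-width
    # Triangular kernel weight of each item in each bin: (n_bins, n_items)
    w = np.clip(1.0 - np.abs(similarities[None, :] - centers[:, None]) / delta,
                0.0, None)
    h = w.sum(axis=1)                          # soft count of all items per bin
    h_pos = (w * labels[None, :]).sum(axis=1)  # soft count of relevant items per bin
    H = np.cumsum(h)                           # cumulative counts, high sim -> low
    H_pos = np.cumsum(h_pos)
    n_pos = labels.sum()
    mask = H > 0                               # skip bins with no mass
    return float(np.sum(H_pos[mask] / H[mask] * h_pos[mask]) / n_pos)
```

When relevant and irrelevant items are perfectly separated in similarity, the approximation recovers an AP of 1; ties within a bin are smoothed rather than broken arbitrarily.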

5. Practical Considerations and Limitations

MAP@100 is favored in large-scale retrieval—such as visual search or recommender settings—where only a finite prefix of the ranking is actionable. Several practical issues arise:

  • Efficient Top-K Retrieval: In massive databases, exhaustive similarity computation is infeasible. Sublinear retrieval approaches (e.g., locality-sensitive hashing or Hamming radius expansion) are co-optimized with MAP@100 evaluation, but may introduce tie-breaking ambiguities, especially in hashing-based systems (Ding et al., 2018).
  • Tie-breaking and Hash Collisions: In regimes with numerous hash collisions (e.g., identical Hamming codes shared across many items), AP@K and hence MAP@100 can take non-unique values depending on how ties are resolved. Pushing for perfect mAP in such contexts can incentivize degenerate codebook utilization, where all relevant items are mapped to a single code, reducing retrieval diversity (Ding et al., 2018).
  • Cutoff Effects: The normalization of AP by $\min(N_q, K)$ makes scores comparable across queries, but can underrepresent the influence of queries with many more than 100 relevant items, and the AP may become insensitive to quality beyond the cutoff.
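The tie-breaking ambiguity noted above is easy to see in a toy case: two items at the same Hamming distance from the query (only one relevant) can be ordered either way, and AP@K differs depending on that arbitrary choice.

```python
def ap_at_k(ranked_rel, n_relevant, k):
    """Plain AP@K from a 0/1 relevance list of the ranked results."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(ranked_rel[:k], start=1):
        hits += rel
        if rel:
            total += hits / rank
    return total / min(n_relevant, k)

# Two items share the same hash code (a tie in score); one is relevant.
# The ranked relevance pattern depends entirely on how the tie is broken:
print(ap_at_k([1, 0], n_relevant=1, k=2))   # 1.0  (relevant item placed first)
print(ap_at_k([0, 1], n_relevant=1, k=2))   # 0.5  (relevant item placed second)
```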

6. Extensions and Alternative Metrics

MAP@100, while effective for positionally sensitive ranking assessment, does not penalize codebook collapse or quantify code utilization in hashing. The Mean Local Group Average Precision (mLGAP) augments AP with a dispersion term $\phi(S)$ that penalizes collisions and rewards uniform code use:

LGAP(q;r)=1r+1k=0r[P@dHk]ϕ(Sk(q))LGAP(q; r) = \frac{1}{r+1} \sum_{k=0}^{r} [P@\text{d}_H \leq k] \cdot \phi(S_k(q))

where $P@\mathrm{d}_H \leq k$ denotes precision within Hamming radius $k$, and $S_k(q)$ the retrieved set. Averaging across queries yields mLGAP, which is specifically designed to balance early retrieval of relevant items with effective use of the code space, addressing a major limitation of standard MAP@100 in hashing-based contexts (Ding et al., 2018).

7. Interpretation, Baselines, and Model Comparison

Observed MAP@100 values must be interpreted relative to random baselines. Statistical significance can be rigorously assessed via the expectation and variance of AP@100 under hypothesized null models (random permutations or independent Bernoulli draws), allowing for z-test construction and confidence interval estimation:

$$z = \frac{\widehat{MAP@100} - \mu_0}{\sqrt{\sigma^2 / Q}}$$

where $\mu_0$ and $\sigma^2$ are the null expectation and variance, and $Q$ is the number of queries. Confidence intervals facilitate robust comparison across models and reliable discrimination from chance performance (Manzhos et al., 4 Nov 2025).
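The z statistic is a one-liner once the null moments are known; the numbers below are placeholders, not values from the cited work.

```python
import math

def map_z_score(map_observed, null_mean, null_var, n_queries):
    """z statistic for an observed MAP@100 against a random-ranking null model."""
    se = math.sqrt(null_var / n_queries)   # standard error of the mean AP
    return (map_observed - null_mean) / se

# Hypothetical numbers: observed MAP@100 of 0.31 over 500 queries, against a
# null with mean 0.0098 and per-query AP variance 0.002 (placeholder values).
z = map_z_score(0.31, 0.0098, 0.002, 500)
print(round(z, 1))   # 150.1 -- far beyond any conventional significance threshold
```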


MAP@100 remains a standard for assessing truncated ranking quality in retrieval and recommendation systems, underpinning both evaluation and learning. Its well-characterized statistical properties and differentiable generalizations support principled benchmarking, model selection, and end-to-end task-driven optimization across diverse large-scale information retrieval domains (Ding et al., 2018, Manzhos et al., 4 Nov 2025, He et al., 2018, Revaud et al., 2019).