Mean Average Precision (MAP@100)
- MAP@100 is a metric that evaluates ranked retrieval quality by averaging precision scores for the top 100 results, considering both relevance and item position.
- It involves ranking items by similarity, computing precision at each rank, and normalizing by the smaller of the number of relevant items or 100.
- MAP@100 is used as both an evaluation benchmark and a learning objective in deep metric and hashing systems, improving performance in large-scale applications.
Mean Average Precision at 100 (MAP@100) is a statistical evaluation metric widely adopted in information retrieval, recommender systems, image retrieval, and descriptor learning to assess the quality of ranked retrieval for a set of queries when only the top 100 results are of primary interest. By accounting for both the relevance of items and their positions within the truncated ranking, MAP@100 provides a nuanced measure of retrieval effectiveness, especially for applications—such as visual search or local descriptor matching—where operational cutoffs are imposed. MAP@100 is ubiquitous as both a benchmarking tool and a learning objective in large-scale retrieval, hashing, and end-to-end deep metric learning frameworks (Ding et al., 2018, Manzhos et al., 4 Nov 2025, He et al., 2018, Revaud et al., 2019).
1. Formal Definition and Mathematical Formulation
For a query $q$, a retrieval system produces an ordered list of $N$ items, among which $R_q$ are defined as relevant (ground truth). Let $K$ denote the cutoff rank: in MAP@100, $K = 100$. Let $\mathrm{rel}_q(k)$ be an indicator function ($1$ if the $k$-th ranked item is relevant, $0$ otherwise), and let $P_q(k) = \frac{1}{k}\sum_{j=1}^{k}\mathrm{rel}_q(j)$ denote the precision at cutoff $k$. The Average Precision at $K$ for query $q$ is

$$AP@K(q) = \frac{1}{\min(R_q, K)} \sum_{k=1}^{K} P_q(k)\,\mathrm{rel}_q(k).$$

The Mean Average Precision at $K$ (MAP@K), and specifically MAP@100, is then given by averaging over a set of queries $Q$:

$$MAP@K = \frac{1}{|Q|} \sum_{q \in Q} AP@K(q).$$

This metric ensures normalization with respect to the smaller of the number of relevant items or the cutoff, yielding a value in the interval $[0, 1]$ for every query (Ding et al., 2018, Manzhos et al., 4 Nov 2025, He et al., 2018, Revaud et al., 2019).
2. Calculation Procedure and Stepwise Example
Evaluating MAP@100 involves, for each query:
- Ranking all database items by similarity to the query and selecting the top 100.
- For each position $k = 1, \dots, 100$:
- Compute precision at $k$: $P_q(k)$.
- If the item is relevant ($\mathrm{rel}_q(k) = 1$), accumulate $P_q(k)$.
- Normalize the accumulated sum by $\min(R_q, 100)$.
- Average across all queries.
For instance, if a query has 3 relevant items in the database, and the top 5 retrieved have relevance pattern [1, 0, 1, 0, 1]:
| k | rel_q(k) | P_q(k) | Contribution |
|---|---|---|---|
| 1 | 1 | 1/1 = 1.0 | 1.0 |
| 2 | 0 | 1/2 = 0.5 | 0 |
| 3 | 1 | 2/3 ≈ 0.667 | 0.667 |
| 4 | 0 | 2/4 = 0.5 | 0 |
| 5 | 1 | 3/5 = 0.6 | 0.6 |
Sum: $1.0 + 0.667 + 0.6 = 2.267$, yielding $AP@5 = 2.267 / \min(3, 5) = 2.267 / 3 \approx 0.756$. Averaging such APs across queries yields MAP@K (Ding et al., 2018).
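The stepwise procedure and worked example above can be sketched in a few lines of Python; this is a minimal illustration of the definition, not code from the cited works:

```python
def average_precision_at_k(rels, n_relevant, k=100):
    """AP@K: sum of precision values at the ranks of relevant hits,
    normalized by min(n_relevant, k)."""
    hits, score = 0, 0.0
    for rank, rel in enumerate(rels[:k], start=1):
        if rel:
            hits += 1
            score += hits / rank  # precision at this rank
    return score / min(n_relevant, k)

# Worked example from the text: 3 relevant items, top-5 pattern [1, 0, 1, 0, 1]
ap = average_precision_at_k([1, 0, 1, 0, 1], n_relevant=3, k=5)
print(round(ap, 4))  # (1.0 + 2/3 + 3/5) / 3 ≈ 0.7556
```

MAP@K is then simply the mean of this quantity over all queries.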
3. Statistical Properties and Random Ranking Baselines
Under random ranking, the expected value and variance of $AP@100$ serve as essential performance baselines. Two principal evaluation models are established:
- Offline (without replacement): given a corpus of size $N$ containing $R$ relevant items, relevance indicators follow a hypergeometric model; the expected precision at every rank is $p = R/N$, and closed-form expressions for the expectation of $AP@100$ (involving $p$ and harmonic-number sums $H_K = \sum_{k=1}^{K} 1/k$) are derived in the cited work.
- Online (with replacement/Bernoulli model): relevance labels are drawn i.i.d. with probability $p$, yielding analogous closed-form expressions.
Variances are also derived and enable construction of confidence intervals on observed MAP@100 for statistical significance evaluation.
For example, in a 1000-item corpus the null expectation and variance yield concrete numerical baselines under both models against which an observed MAP@100 can be compared (Manzhos et al., 4 Nov 2025).
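The offline (permutation) baseline can also be estimated empirically by Monte Carlo simulation. The sketch below assumes a hypothetical corpus of $N = 1000$ items with $R = 50$ relevant; under random ranking the mean $AP@100$ is small (on the order of a percent here), which is exactly why analytic baselines matter for significance testing:

```python
import random

def ap_at_k(rels, n_relevant, k):
    """AP@K as defined above: precision accumulated at relevant ranks,
    normalized by min(n_relevant, k)."""
    hits, score = 0, 0.0
    for rank, rel in enumerate(rels[:k], start=1):
        if rel:
            hits += 1
            score += hits / rank
    return score / min(n_relevant, k)

random.seed(0)
N, R, K, trials = 1000, 50, 100, 500   # hypothetical corpus parameters
labels = [1] * R + [0] * (N - R)
aps = []
for _ in range(trials):
    random.shuffle(labels)             # offline model: random permutation
    aps.append(ap_at_k(labels, R, K))
mean_ap = sum(aps) / trials
print(round(mean_ap, 4))               # far below 1.0; near the analytic null mean
```

Closed-form expectations from the cited work replace such simulation when exact significance thresholds are needed.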
4. Role as Learning Objective and Differentiable Approximations
MAP@100 serves not only as an evaluation metric but also as a direct optimization objective in neural descriptor learning, metric learning, and hashing. The key challenge is the non-differentiability of AP due to ranking operations. Recent approaches implement differentiable approximations to AP@K using histogram binning and smoothing kernels.
Letting $s \in \mathbb{R}^N$ denote the similarities between the query and the database items and $y \in \{0,1\}^N$ the relevance vector, AP@K can be approximated via "soft" bin counts over $M$ bins (triangular kernels $\delta_m$), yielding a differentiable $\widehat{AP}$:

$$\widehat{AP} = \sum_{m=1}^{M} \frac{\sum_{m' \le m} c^{+}_{m'}}{\sum_{m' \le m} c_{m'}} \cdot \frac{c^{+}_{m}}{\sum_i y_i}, \qquad c_m = \sum_{i} \delta_m(s_i), \quad c^{+}_{m} = \sum_{i} y_i\,\delta_m(s_i),$$

where the bins are ordered from highest to lowest similarity.
Gradients with respect to the scores can be computed by backpropagating through these aggregates, enabling end-to-end neural network training directly for (approximated) MAP@100 (He et al., 2018, Revaud et al., 2019). This approach eliminates the need for auxiliary ranking surrogates like triplet or pairwise losses and improves retrieval performance especially at operationally relevant cutoffs.
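A minimal NumPy sketch of the histogram-binned approximation follows; it illustrates the soft-count construction (triangular kernels over a similarity range assumed to be $[-1, 1]$, with bin count as a free parameter), not the exact implementations of the cited papers:

```python
import numpy as np

def soft_ap(scores, labels, n_bins=25):
    """Histogram-binned AP approximation with triangular kernels.
    A sketch of the soft-binning idea; differentiable w.r.t. scores."""
    # Bin centers over [-1, 1], highest similarity first
    centers = np.linspace(1.0, -1.0, n_bins)
    width = 2.0 / (n_bins - 1)
    # Soft assignment of each score to each bin (kernels sum to 1 per item)
    delta = np.clip(1.0 - np.abs(scores[None, :] - centers[:, None]) / width,
                    0.0, None)
    c = delta.sum(axis=1)                           # soft count per bin
    c_pos = (delta * labels[None, :]).sum(axis=1)   # soft relevant count per bin
    cum_c, cum_pos = np.cumsum(c), np.cumsum(c_pos)
    prec = cum_pos / np.maximum(cum_c, 1e-8)        # precision over bin prefixes
    rec_inc = c_pos / max(labels.sum(), 1.0)        # recall increment per bin
    return float((prec * rec_inc).sum())

scores = np.array([0.9, 0.7, 0.5, 0.2, -0.3])
labels = np.array([1.0, 1.0, 0.0, 1.0, 0.0])
print(round(soft_ap(scores, labels), 3))  # close to the exact AP of ~0.917
```

Because every operation is smooth in `scores` (up to the kernel kinks), automatic differentiation can backpropagate through the soft counts, which is the mechanism exploited for end-to-end training.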
5. Practical Considerations and Limitations
MAP@100 is favored in large-scale retrieval—such as visual search or recommender settings—where only a finite prefix of the ranking is actionable. Several practical issues arise:
- Efficient Top-K Retrieval: In massive databases, exhaustive similarity computation is infeasible. Sublinear retrieval approaches (e.g., locality-sensitive hashing or Hamming radius expansion) are co-optimized with MAP@100 evaluation, but may introduce tie-breaking ambiguities, especially in hashing-based systems (Ding et al., 2018).
- Tie-breaking and Hash Collisions: In regimes with numerous hash collisions (e.g., identical Hamming codes shared across many items), AP@K and hence MAP@100 can take non-unique values depending on how ties are resolved. Pushing for perfect mAP in such contexts can incentivize degenerate codebook utilization, where all relevant items are mapped to a single code, reducing retrieval diversity (Ding et al., 2018).
- Cutoff Effects: The normalization of AP by $\min(R_q, 100)$ makes scores comparable across queries, but can underrepresent the influence of queries with many more than 100 relevant items, and the AP may become insensitive to quality beyond the cutoff.
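The tie-breaking ambiguity is easy to demonstrate: when several items share one hash code (and hence one Hamming distance), AP depends entirely on the arbitrary order chosen within the tie. A small illustration, with hypothetical relevance patterns:

```python
def ap_at_k(rels, n_relevant, k):
    """Standard AP@K: precision accumulated at relevant ranks."""
    hits, score = 0, 0.0
    for rank, rel in enumerate(rels[:k], start=1):
        if rel:
            hits += 1
            score += hits / rank
    return score / min(n_relevant, k)

# Four items share a single Hamming distance; two resolutions of the tie:
favorable   = [1, 1, 0, 0]   # relevant items ordered first within the tie
unfavorable = [0, 0, 1, 1]   # relevant items ordered last within the tie
print(ap_at_k(favorable, 2, 4), ap_at_k(unfavorable, 2, 4))  # 1.0 vs ≈0.417
```

Tie-aware AP variants average over (or bound) such orderings rather than letting an arbitrary sort decide the score.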
6. Extensions and Alternative Metrics
MAP@100, while effective for positionally sensitive ranking assessment, does not penalize codebook collapse or quantify code utilization in hashing. The Mean Local Group Average Precision (mLGAP) augments AP with a dispersion term that penalizes hash collisions and rewards uniform code use: for each query, precision is evaluated within local groups of retrieved items at each Hamming radius $r$ around the query code, and these group-wise precisions are aggregated over the retrieved set. Averaging across queries yields mLGAP. mLGAP is specifically designed to balance early retrieval of relevant items with effective use of the code space, addressing a major limitation of standard MAP@100 in hashing-based contexts (Ding et al., 2018).
7. Interpretation, Baselines, and Model Comparison
Observed MAP@100 values must be interpreted relative to random baselines. Statistical significance can be rigorously assessed via the expectation and variance of AP@100 under hypothesized null models (random permutations or independent Bernoulli draws), allowing for z-test construction and confidence interval estimation:

$$z = \frac{\overline{MAP@100} - \mu_0}{\sigma_0 / \sqrt{n}},$$

where $\mu_0$ and $\sigma_0^2$ are the null expectation and variance of per-query AP@100, and $n$ is the number of queries. Confidence intervals facilitate robust comparison across models and reliable discrimination from chance performance (Manzhos et al., 4 Nov 2025).
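The z-test itself is a one-liner once the null moments are known. The sketch below uses hypothetical values for $\mu_0$, $\sigma_0^2$, and $n$ purely for illustration:

```python
import math

def map_z_test(observed_map, mu0, var0, n_queries):
    """Two-sided z-test of an observed MAP@100 against a null model with
    per-query AP expectation mu0 and variance var0."""
    z = (observed_map - mu0) / math.sqrt(var0 / n_queries)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail probability
    return z, p_value

# Hypothetical numbers for illustration only
z, p = map_z_test(observed_map=0.30, mu0=0.05, var0=0.01, n_queries=400)
print(round(z, 2), p < 0.001)  # → 50.0 True: far above chance
```

With many queries even modest gains over the null expectation become highly significant, which is why the variance terms matter for honest model comparison.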
MAP@100 remains a standard for assessing truncated ranking quality in retrieval and recommendation systems, underpinning both evaluation and learning. Its well-characterized statistical properties and differentiable generalizations support principled benchmarking, model selection, and end-to-end task-driven optimization across diverse large-scale information retrieval domains (Ding et al., 2018, Manzhos et al., 4 Nov 2025, He et al., 2018, Revaud et al., 2019).