Multiple Negatives Ranking in Dense Retrieval
- Multiple Negatives Ranking (MNR) is a batch-wise contrastive loss framework that leverages in-batch negatives to fine-tune deep encoders like Sentence-BERT, enhancing dense retrieval performance.
- MNR employs an innovative strategy where each mini-batch's non-corresponding gold answers serve as hard negatives, eliminating the need for expensive external negative mining.
- In the SPBERTQA system for Vietnamese medical QA, fine-tuning PhoBERT with MNR outperformed bag-of-words and vanilla transformer baselines, significantly improving metrics such as mAP and P@1.
Multiple Negatives Ranking (MNR) is a batch-wise contrastive loss framework for fine-tuning deep encoders, particularly Sentence-BERT (SBERT), for dense retrieval tasks such as question answering over long text passages. MNR leverages in-batch negatives, using every other positive instance in a mini-batch as “hard” negatives for each anchor, obviating the need for costly negative mining. The technique is core to SPBERTQA—a two-stage retrieval system combining statistical and neural models for medical question answering in Vietnamese, where MNR demonstrates substantial empirical benefits over both bag-of-words and vanilla transformer baselines (Nguyen et al., 2022).
1. Mathematical Definition
Given a batch of question–passage pairs $\{(q_i, p_i)\}_{i=1}^{B}$, where $q_i$ represents the $i$-th question and $p_i$ the gold answer-passage, the training framework utilizes cosine similarity between question and passage representations from a SBERT encoder. The MNR loss is defined:

$$\mathcal{L}_{\mathrm{MNR}} = -\frac{1}{N} \sum_{i=1}^{B} \log \frac{e^{\,\mathrm{sim}(q_i,\, p_i)}}{\sum_{j=1}^{B} e^{\,\mathrm{sim}(q_i,\, p_j)}}$$

Here, $N$ is the total count of training pairs (often omitted in per-batch settings, where $1/B$ is used instead), $B$ is the batch size, $\mathrm{sim}(q_i, p_j)$ is the cosine similarity of mean-pooled SBERT outputs, and $p_j$ (for $j \neq i$) acts as a negative for anchor $q_i$. This loss incentivizes the encoder to assign higher cosine similarity to the correct question–answer pairs relative to all other in-batch answers.
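The loss is equivalent to a softmax cross-entropy over the batch similarity matrix, with the gold pairs on the diagonal. Below is a minimal PyTorch sketch of this formulation (the function name `mnr_loss` is ours, and the per-batch $1/B$ averaging of `F.cross_entropy` stands in for the $1/N$ factor above):

```python
import torch
import torch.nn.functional as F

def mnr_loss(q_emb: torch.Tensor, p_emb: torch.Tensor) -> torch.Tensor:
    """Multiple Negatives Ranking loss for a batch of (question, passage) pairs.

    q_emb, p_emb: (B, d) mean-pooled encoder outputs; row i of p_emb is the
    gold passage for question i, and all other rows act as its negatives.
    """
    # Cosine similarity matrix: sim[i, j] = cos(q_i, p_j).
    sim = F.normalize(q_emb, dim=-1) @ F.normalize(p_emb, dim=-1).T  # (B, B)
    # Gold passages sit on the diagonal, so the target class for row i is i;
    # cross-entropy over each row reproduces the softmax form of the loss.
    targets = torch.arange(sim.size(0), device=sim.device)
    return F.cross_entropy(sim, targets)
```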
2. Anchor, Positive, and Negative Construction
For each element $i$ in a batch:
- Anchor $q_i$: the $i$-th question in the current mini-batch.
- Positive $p_i$: the gold answer-passage corresponding to $q_i$.
- Negatives $\{p_j\}_{j \neq i}$: the remaining $B-1$ answer-passages in the batch, each of which is the gold passage for some other batch question.
This in-batch scheme exploits naturally occurring "hard" negatives without explicit negative mining; the number of negatives per anchor grows linearly with batch size, so larger batches supply more, and typically harder, negatives at no additional labeling cost (see the sketch below).
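With the sentence-transformers library, this construction requires nothing beyond the (question, gold passage) pairs themselves; no negatives are ever listed explicitly. A minimal sketch, assuming a hypothetical `qa_pairs` list of string tuples:

```python
from torch.utils.data import DataLoader
from sentence_transformers import InputExample

# Hypothetical training data: one (question, gold passage) string pair each.
qa_pairs = [
    ("question text ...", "gold passage text ..."),
    # ...
]

# Each example stores only the anchor and its positive. For anchor i, the
# gold passages of the other B-1 pairs in its mini-batch serve as negatives.
train_examples = [InputExample(texts=[q, p]) for q, p in qa_pairs]

# shuffle=True matters: each epoch re-mixes which passages act as negatives.
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=32)
```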
3. Hyperparameters and Implementation Details
The SPBERTQA implementation employs the following parameters and choices (a configuration sketch follows the list):
- Encoder Backbone: PhoBERT (monolingual Vietnamese RoBERTa).
- Fine-tuning Configuration:
- Epochs: 15
- Batch size ($B$): 32
- Learning rate:
- Max sequence length: 256 tokens
- Regularization: AdamW with default HuggingFace settings, including weight decay.
- Similarity Metric: Cosine similarity of mean-pooled final layer embeddings.
- Temperature Scaling: None applied; cosine similarities enter the exponential terms unscaled.
- Negative Mining: Fully in-batch; no external negative example mining is performed.
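A sketch of this configuration with sentence-transformers is shown below; `train_dataloader` is the loader from the previous sketch, and the learning rate is left at the library default since the exact value is not recoverable from this section:

```python
from sentence_transformers import SentenceTransformer, models, losses

# PhoBERT backbone with mean pooling, mirroring the configuration above.
word_embedding = models.Transformer("vinai/phobert-base", max_seq_length=256)
pooling = models.Pooling(
    word_embedding.get_word_embedding_dimension(),
    pooling_mode_mean_tokens=True,
)
model = SentenceTransformer(modules=[word_embedding, pooling])

# scale=1.0 reflects the "no extra temperature scaling" note above;
# the library default applies scale=20.0 to cosine similarities.
train_loss = losses.MultipleNegativesRankingLoss(model, scale=1.0)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],  # loader from the previous sketch
    epochs=15,
    # The optimizer defaults to AdamW; the exact learning rate is not
    # recoverable from this section, so the library default is kept.
)
```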
4. Integration into Two-Stage Retrieval
SPBERTQA structures retrieval in two distinct phases:
A. BM25-Based Filtering:
- Each corpus passage is split into sentences.
- BM25 indexes all sentences and, given a question, retrieves the top-$k$ most relevant sentences per passage as candidate blocks, effectively narrowing the candidate search space.

B. SBERT Reranking with MNR Fine-Tuning:
- In training, questions and sentence blocks are encoded via SBERT, and the MNR loss is optimized to maximize correct-pair similarity relative to in-batch negatives.
- At inference, cosine similarity from the fine-tuned SBERT encoder re-ranks the candidate sentences.
- BM25 is used solely as a filter; SBERT reranking with MNR constitutes the only learned, gradient-driven step. A sketch of the full two-stage flow follows below.
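The following compact sketch uses the rank_bm25 package for Stage A and the fine-tuned `model` from the previous sketch for Stage B; `sentences`, `question`, `top_k`, and the whitespace tokenization are illustrative stand-ins for the paper's actual corpus preprocessing:

```python
from rank_bm25 import BM25Okapi
from sentence_transformers import util

# Hypothetical corpus sentences and query; `model` is the MNR-fine-tuned
# SBERT encoder from the training sketch above.
sentences = ["sentence one ...", "sentence two ...", "sentence three ..."]
question = "question text ..."
top_k = 10  # assumption: candidate sentences kept after BM25 filtering

# Stage A: BM25 filter over sentence-level units.
bm25 = BM25Okapi([s.split() for s in sentences])
scores = bm25.get_scores(question.split())
order = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
candidates = [sentences[i] for i in order[:top_k]]

# Stage B: rerank the surviving candidates with the fine-tuned encoder.
q_emb = model.encode(question, convert_to_tensor=True)
c_emb = model.encode(candidates, convert_to_tensor=True)
ranking = util.cos_sim(q_emb, c_emb).squeeze(0).argsort(descending=True)
best_sentence = candidates[int(ranking[0])]
```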
5. Empirical Impact of MNR Fine-Tuning
Ablation studies on the ViHealthQA test set demonstrate the impact of different retrieval models and MNR fine-tuning. Representative retrieval metric values (in %) are summarized below.
| Model | mAP | P@1 | P@10 |
|---|---|---|---|
| BM25 | 56.93 | 44.96 | 70.09 |
| PhoBERT (no fine-tune) | 12.45 | 6.95 | 23.10 |
| BM25 + XLM-R (MNR fine-tuned) | 53.85 | 46.05 | 79.04 |
| BM25 + mBERT (MNR fine-tuned) | 55.52 | 44.91 | 75.71 |
| BM25 + PhoBERT (MNR fine-tuned, SPBERTQA) | 62.25 | 50.92 | 83.76 |
Key findings:
- Fine-tuning PhoBERT with MNR raises mAP from 56.93 (BM25 alone) to 62.25, and P@1 from 44.96 to 50.92.
- A pure PhoBERT embedder without MNR fine-tuning or BM25 suffers from drastically degraded retrieval (e.g., P@1 ≈ 7%).
- Domain- and language-matched backbones (PhoBERT) with MNR outperform multilingual models (XLM-R, mBERT) in this applied context.
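For reference, assuming each question has exactly one gold passage (as the pair construction in Section 2 suggests), average precision reduces to the reciprocal rank of that passage, and P@k is naturally read as a hit rate over the top $k$. A minimal sketch of these metrics (function names are ours):

```python
def average_precision(ranked_ids, gold_id):
    """With a single relevant passage, AP is the reciprocal of its rank."""
    return 1.0 / (ranked_ids.index(gold_id) + 1) if gold_id in ranked_ids else 0.0

def precision_at_k(ranked_ids, gold_id, k):
    """Hit-style P@k: 1 if the gold passage appears in the top k, else 0."""
    return float(gold_id in ranked_ids[:k])

def evaluate(runs):
    """`runs` is a list of (ranked_passage_ids, gold_passage_id) per question."""
    n = len(runs)
    m_ap = 100 * sum(average_precision(r, g) for r, g in runs) / n
    p_at_1 = 100 * sum(precision_at_k(r, g, 1) for r, g in runs) / n
    p_at_10 = 100 * sum(precision_at_k(r, g, 10) for r, g in runs) / n
    return m_ap, p_at_1, p_at_10
```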
6. Practical Reproducibility in SBERT-Style Retrieval
Implementation of MNR within an SBERT-style retrieval pipeline based on the SPBERTQA design involves:
- Forming mini-batches with question–answer pairs.
- Treating the batch’s other gold answers as in-batch negatives for each anchor.
- Fine-tuning with the MNR loss over cosine similarities between question and passage representations.
- Using BM25 as a lightweight sentence-level filter for long passages before neural reranking.
With this method, an improvement of approximately 5–6 percentage points in mean average precision over BM25-only retrieval can be expected (Nguyen et al., 2022).
7. Significance in Domain-Adapted Dense Retrieval
The empirical advantage of MNR arises from its ability to supply a diverse set of semantically rich, difficult negatives at every update, accelerating the training of discriminative representations for dense retrieval. The improvement is contingent on a strong, domain-matched encoder backbone and benefits from the efficient, mining-free composition of negatives that arises naturally in question-answering batch construction. That MNR-fine-tuned multilingual encoders (XLM-R, mBERT) fail to match PhoBERT plus MNR in the ViHealthQA setting suggests a strong synergy between corpus/domain adaptation and hard-negative-driven fine-tuning (Nguyen et al., 2022).