Multiple Negatives Ranking in Dense Retrieval
- Multiple Negatives Ranking (MNR) is a batch-wise contrastive loss framework that leverages in-batch negatives to fine-tune deep encoders like Sentence-BERT, enhancing dense retrieval performance.
- MNR employs an innovative strategy where each mini-batch's non-corresponding gold answers serve as hard negatives, eliminating the need for expensive external negative mining.
- In the SPBERTQA system for Vietnamese medical QA, fine-tuning PhoBERT with MNR outperformed bag-of-words and vanilla transformer baselines, significantly improving metrics such as mAP and P@1.
Multiple Negatives Ranking (MNR) is a batch-wise contrastive loss framework for fine-tuning deep encoders, particularly Sentence-BERT (SBERT), for dense retrieval tasks such as question answering over long text passages. MNR leverages in-batch negatives, using every other positive instance in a mini-batch as “hard” negatives for each anchor, obviating the need for costly negative mining. The technique is core to SPBERTQA—a two-stage retrieval system combining statistical and neural models for medical question answering in Vietnamese, where MNR demonstrates substantial empirical benefits over both bag-of-words and vanilla transformer baselines (Nguyen et al., 2022).
1. Mathematical Definition
Given a batch of question–passage pairs $\{(q_i, p_i)\}_{i=1}^{B}$, where $q_i$ represents the $i$-th question and $p_i$ the gold answer-passage, the training framework utilizes cosine similarity between question and passage representations from a SBERT encoder. The MNR loss is defined:

$$\mathcal{L}_{\mathrm{MNR}} = -\frac{1}{N} \sum_{i=1}^{B} \log \frac{e^{\,\mathrm{sim}(q_i,\, p_i)}}{\sum_{j=1}^{B} e^{\,\mathrm{sim}(q_i,\, p_j)}}$$

Here, $N$ is the total count of training pairs (often omitted in per-batch settings, where $1/B$ is used instead), $B$ is the batch size, $\mathrm{sim}(q_i, p_j)$ is the cosine similarity of mean-pooled SBERT outputs, and $p_j$ (for $j \neq i$) acts as a negative for anchor $q_i$. This loss incentivizes the encoder to assign higher cosine similarity to the correct question–answer pairs relative to all other in-batch answers.
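The loss is equivalent to a softmax cross-entropy over the batch similarity matrix, with the gold pairs on the diagonal. Below is a minimal PyTorch sketch of this formulation (the function name `mnr_loss` is ours, and the per-batch $1/B$ averaging of `F.cross_entropy` stands in for the $1/N$ factor above):

```python
import torch
import torch.nn.functional as F

def mnr_loss(q_emb: torch.Tensor, p_emb: torch.Tensor) -> torch.Tensor:
    """Multiple Negatives Ranking loss for a batch of (question, passage) pairs.

    q_emb, p_emb: (B, d) mean-pooled encoder outputs; row i of p_emb is the
    gold passage for question i, and all other rows act as its negatives.
    """
    # Cosine similarity matrix: sim[i, j] = cos(q_i, p_j).
    sim = F.normalize(q_emb, dim=-1) @ F.normalize(p_emb, dim=-1).T  # (B, B)
    # Gold passages sit on the diagonal, so the target class for row i is i;
    # cross-entropy over each row reproduces the softmax form of the loss.
    targets = torch.arange(sim.size(0), device=sim.device)
    return F.cross_entropy(sim, targets)
```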
2. Anchor, Positive, and Negative Construction
For each element $i$ in a batch:
- Anchor $q_i$: the $i$-th question in the current mini-batch.
- Positive $p_i$: the gold answer-passage corresponding to $q_i$.
- Negatives $\{p_j\}_{j \neq i}$: the remaining $B-1$ answer-passages in the batch, each of which is the gold passage for some other batch question.
This in-batch scheme exploits naturally occurring "hard" negatives without explicit negative mining; the number of negatives per anchor grows linearly with batch size, so larger batches supply more, and typically harder, negatives at no additional labeling cost (see the sketch below).
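With the sentence-transformers library, this construction requires nothing beyond the (question, gold passage) pairs themselves; no negatives are ever listed explicitly. A minimal sketch, assuming a hypothetical `qa_pairs` list of string tuples:

```python
from torch.utils.data import DataLoader
from sentence_transformers import InputExample

# Hypothetical training data: one (question, gold passage) string pair each.
qa_pairs = [
    ("question text ...", "gold passage text ..."),
    # ...
]

# Each example stores only the anchor and its positive. For anchor i, the
# gold passages of the other B-1 pairs in its mini-batch serve as negatives.
train_examples = [InputExample(texts=[q, p]) for q, p in qa_pairs]

# shuffle=True matters: each epoch re-mixes which passages act as negatives.
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=32)
```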
3. Hyperparameters and Implementation Details
The SPBERTQA implementation employs the following parameters and choices (a configuration sketch follows the list):
- Encoder Backbone: PhoBERT (monolingual Vietnamese RoBERTa).
- Fine-tuning Configuration:
- Epochs: 15
- Batch size ($B$): 32
- Learning rate:
- Max sequence length: 256 tokens
- Regularization: AdamW with default HuggingFace settings, including weight decay.
- Similarity Metric: Cosine similarity of mean-pooled final layer embeddings.
- Temperature Scaling: None applied; cosine similarities enter the exponential terms unscaled.
- Negative Mining: Fully in-batch; no external negative example mining is performed.
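A sketch of this configuration with sentence-transformers is shown below; `train_dataloader` is the loader from the previous sketch, and the learning rate is left at the library default since the exact value is not recoverable from this section:

```python
from sentence_transformers import SentenceTransformer, models, losses

# PhoBERT backbone with mean pooling, mirroring the configuration above.
word_embedding = models.Transformer("vinai/phobert-base", max_seq_length=256)
pooling = models.Pooling(
    word_embedding.get_word_embedding_dimension(),
    pooling_mode_mean_tokens=True,
)
model = SentenceTransformer(modules=[word_embedding, pooling])

# scale=1.0 reflects the "no extra temperature scaling" note above;
# the library default applies scale=20.0 to cosine similarities.
train_loss = losses.MultipleNegativesRankingLoss(model, scale=1.0)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],  # loader from the previous sketch
    epochs=15,
    # The optimizer defaults to AdamW; the exact learning rate is not
    # recoverable from this section, so the library default is kept.
)
```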
4. Integration into Two-Stage Retrieval
SPBERTQA structures retrieval in two distinct phases:
A. BM25-Based Filtering:
- Each corpus passage is split into sentences.
- BM25 indexes all sentences and, given a question, retrieves the top-$k$ most relevant sentences per passage as candidate blocks, effectively narrowing the candidate search space.

B. SBERT Reranking with MNR Fine-Tuning:
- In training, questions and sentence blocks are encoded via SBERT, and the MNR loss is optimized to maximize correct-pair similarity relative to in-batch negatives.
- At inference, cosine similarity from the fine-tuned SBERT encoder re-ranks the candidate sentences.
- BM25 is used solely as a filter; SBERT reranking with MNR constitutes the only learned, gradient-driven step. A sketch of the full two-stage flow follows below.
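The following compact sketch uses the rank_bm25 package for Stage A and the fine-tuned `model` from the previous sketch for Stage B; `sentences`, `question`, `top_k`, and the whitespace tokenization are illustrative stand-ins for the paper's actual corpus preprocessing:

```python
from rank_bm25 import BM25Okapi
from sentence_transformers import util

# Hypothetical corpus sentences and query; `model` is the MNR-fine-tuned
# SBERT encoder from the training sketch above.
sentences = ["sentence one ...", "sentence two ...", "sentence three ..."]
question = "question text ..."
top_k = 10  # assumption: candidate sentences kept after BM25 filtering

# Stage A: BM25 filter over sentence-level units.
bm25 = BM25Okapi([s.split() for s in sentences])
scores = bm25.get_scores(question.split())
order = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
candidates = [sentences[i] for i in order[:top_k]]

# Stage B: rerank the surviving candidates with the fine-tuned encoder.
q_emb = model.encode(question, convert_to_tensor=True)
c_emb = model.encode(candidates, convert_to_tensor=True)
ranking = util.cos_sim(q_emb, c_emb).squeeze(0).argsort(descending=True)
best_sentence = candidates[int(ranking[0])]
```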
5. Empirical Impact of MNR Fine-Tuning
Ablation studies on the ViHealthQA test set demonstrate the impact of different retrieval models and MNR fine-tuning. Representative retrieval metric values (in %) are summarized below.
| Model | mAP | P@1 | P@10 |
|---|---|---|---|
| BM25 | 56.93 | 44.96 | 70.09 |
| PhoBERT (no fine-tune) | 12.45 | 6.95 | 23.10 |
| BM25 + XLM-R (MNR fine-tuned) | 53.85 | 46.05 | 79.04 |
| BM25 + mBERT (MNR fine-tuned) | 55.52 | 44.91 | 75.71 |
| BM25 + PhoBERT (MNR fine-tuned, SPBERTQA) | 62.25 | 50.92 | 83.76 |
Key findings:
- Fine-tuning PhoBERT with MNR raises mAP from 56.93 (BM25 alone) to 62.25, and P@1 from 44.96 to 50.92.
- A pure PhoBERT embedder without MNR fine-tuning or BM25 suffers from drastically degraded retrieval (e.g., P@1 ≈ 7%).
- Domain- and language-matched backbones (PhoBERT) with MNR outperform multilingual models (XLM-R, mBERT) in this applied context.
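For reference, assuming each question has exactly one gold passage (as the pair construction in Section 2 suggests), average precision reduces to the reciprocal rank of that passage, and P@k is naturally read as a hit rate over the top $k$. A minimal sketch of these metrics (function names are ours):

```python
def average_precision(ranked_ids, gold_id):
    """With a single relevant passage, AP is the reciprocal of its rank."""
    return 1.0 / (ranked_ids.index(gold_id) + 1) if gold_id in ranked_ids else 0.0

def precision_at_k(ranked_ids, gold_id, k):
    """Hit-style P@k: 1 if the gold passage appears in the top k, else 0."""
    return float(gold_id in ranked_ids[:k])

def evaluate(runs):
    """`runs` is a list of (ranked_passage_ids, gold_passage_id) per question."""
    n = len(runs)
    m_ap = 100 * sum(average_precision(r, g) for r, g in runs) / n
    p_at_1 = 100 * sum(precision_at_k(r, g, 1) for r, g in runs) / n
    p_at_10 = 100 * sum(precision_at_k(r, g, 10) for r, g in runs) / n
    return m_ap, p_at_1, p_at_10
```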
6. Practical Reproducibility in SBERT-Style Retrieval
Implementation of MNR within an SBERT-style retrieval pipeline based on the SPBERTQA design involves:
- Forming mini-batches with question–answer pairs.
- Treating the batch’s other gold answers as in-batch negatives for each anchor.
- Fine-tuning with the MNR loss over cosine similarities between question and passage representations.
- Using BM25 as a lightweight sentence-level filter for long passages before neural reranking.
With this method, an improvement of approximately 5–6 percentage points in mean average precision over BM25-only retrieval can be expected (Nguyen et al., 2022).
7. Significance in Domain-Adapted Dense Retrieval
The empirical advantage of MNR arises from its ability to supply a diverse set of semantically rich, difficult negatives at every update, accelerating the training of discriminative representations for dense retrieval. The improvement is contingent on a strong, domain-matched encoder backbone and benefits from the efficient, mining-free composition of negatives that arises naturally in question-answering batch construction. That MNR-fine-tuned multilingual encoders (XLM-R, mBERT) fail to match PhoBERT plus MNR in the ViHealthQA setting suggests a strong synergy between corpus/domain adaptation and hard-negative-driven fine-tuning (Nguyen et al., 2022).