NV-Retriever: Improving text embedding models with effective hard-negative mining (2407.15831v1)

Published 22 Jul 2024 in cs.IR and cs.AI

Abstract: Text embedding models have been popular for information retrieval applications such as semantic search and Question-Answering systems based on Retrieval-Augmented Generation (RAG). Those models are typically Transformer models fine-tuned with contrastive learning objectives. Many papers have introduced new embedding model architectures and training approaches; however, one of the key ingredients, the process of mining negative passages, remains poorly explored or described. One of the challenging aspects of fine-tuning embedding models is the selection of high-quality hard-negative passages for contrastive learning. In this paper we propose a family of positive-aware mining methods that leverage the positive relevance score for more effective removal of false negatives. We also provide a comprehensive ablation study of hard-negative mining methods and their configurations, exploring different teacher and base models. We demonstrate the efficacy of our proposed methods by introducing the NV-Retriever-v1 model, which scores 60.9 on the MTEB Retrieval (BEIR) benchmark, 0.65 points higher than previous methods. The model placed 1st when it was published to MTEB Retrieval on July 7, 2024.

NV-Retriever: Improving Text Embedding Models with Effective Hard-Negative Mining

The paper NV-Retriever: Improving Text Embedding Models with Effective Hard-Negative Mining by Moreira et al. addresses an often-overlooked yet fundamental ingredient of training text retrieval models: hard-negative mining. The work introduces novel positive-aware mining methods that make contrastive learning more effective and efficient.
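
For orientation, such fine-tuning typically optimizes an InfoNCE-style contrastive objective in which each query is pulled toward its positive passage and pushed away from mined negatives. The sketch below illustrates that objective in PyTorch; it is a minimal reconstruction, not the paper's training code, and the temperature value is an assumed default.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(q_emb, pos_emb, neg_embs, temperature=0.05):
    """InfoNCE-style loss for one batch of (query, positive, hard negatives).

    q_emb:    (B, D) query embeddings
    pos_emb:  (B, D) positive passage embeddings
    neg_embs: (B, K, D) mined hard-negative embeddings per query
    """
    q = F.normalize(q_emb, dim=-1)
    p = F.normalize(pos_emb, dim=-1)
    n = F.normalize(neg_embs, dim=-1)

    pos_scores = (q * p).sum(dim=-1, keepdim=True)  # (B, 1) cosine similarities
    neg_scores = torch.einsum("bd,bkd->bk", q, n)   # (B, K)
    logits = torch.cat([pos_scores, neg_scores], dim=1) / temperature
    labels = torch.zeros(logits.size(0), dtype=torch.long, device=logits.device)
    return F.cross_entropy(logits, labels)  # positive passage sits at index 0
```

The quality of `neg_embs` is exactly what the paper's mining methods control: negatives that are too easy teach the model little, while false negatives (passages that are actually relevant) actively harm training.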

Core Contributions and Methodologies

The primary contributions of the paper are outlined as follows:

  1. Positive-Aware Hard-Negative Mining Methods:
    • Introduction of a new class of mining techniques that leverage positive relevance scores to better eliminate false negatives.
    • Two specific methods, TopK-MarginPos and TopK-PercPos, performed best by setting the maximum allowed negative score relative to the positive passage's score (see the sketch after this list).
  2. Comprehensive Ablation Study:
    • Detailed experimentation comparing several hard-negative mining strategies and configurations across different teacher models.
    • Exploration of the effect of ensembling hard-negatives from different models on the fine-tuned models' accuracy.
  3. Release of State-of-the-Art Model: NV-Retriever-v1:
    • Presentation of a new model that achieved top rankings on MTEB Retrieval benchmarks by applying the proposed hard-negative mining methodologies.
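
To make the two positive-aware criteria concrete, here is a minimal sketch of how they filter a candidate pool that a teacher model has already scored. The function names, the top-k of 4, and the margin value are illustrative assumptions; the 95% threshold matches the TopK-PercPos configuration reported below.

```python
def topk_margin_pos(candidates, pos_score, margin=0.05, top_k=4):
    """TopK-MarginPos: discard candidates whose teacher score is not at least
    `margin` below the positive's score, then keep the top-k that remain."""
    ceiling = pos_score - margin
    kept = [(doc, score) for doc, score in candidates if score < ceiling]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)[:top_k]

def topk_perc_pos(candidates, pos_score, perc=0.95, top_k=4):
    """TopK-PercPos: discard candidates scoring above a percentage of the
    positive's score (e.g. 95%), then keep the top-k that remain."""
    ceiling = pos_score * perc
    kept = [(doc, score) for doc, score in candidates if score < ceiling]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)[:top_k]

# Example: candidates are (passage_id, teacher_relevance_score) pairs for one
# query, with the known positive excluded from the pool.
candidates = [("d1", 0.91), ("d2", 0.84), ("d3", 0.71), ("d4", 0.66)]
print(topk_perc_pos(candidates, pos_score=0.90))
# d1 (0.91 > 0.855) is filtered out as a likely false negative.
```

The intuition: a candidate that the teacher scores nearly as high as (or higher than) the labeled positive is more likely to be an unlabeled relevant passage than a genuinely hard negative, so thresholding relative to the positive's score removes it before training.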

Key Findings and Numerical Results

The experiments showed clear numerical advantages for the proposed mining methods:

  • Models trained with positive-aware mining, in particular TopK-PercPos, significantly outperformed traditional methods on NDCG@10. With its threshold set to 95% of the positive relevance score, TopK-PercPos achieved an average NDCG@10 of 0.5856.
  • The choice of teacher model produced significant variations in downstream performance. For instance, mining hard negatives with the e5-mistral-7b-instruct model yielded an NDCG@10 of 0.5810, superior to mining with the other teacher models.
  • The paper also found that ensembling hard negatives from multiple teacher models can marginally improve results, with intra-sample ensembling delivering an NDCG@10 of 0.5825 (sketched below).
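
A minimal sketch of the intra-sample ensembling idea, assuming each teacher's candidate list has already been filtered with a positive-aware method; the round-robin merge order and deduplication are assumptions rather than the paper's documented procedure.

```python
def intra_sample_ensemble(per_teacher_negs, top_k=4):
    """Merge hard negatives mined by several teachers for one query.

    per_teacher_negs: one ranked list of (doc, score) pairs per teacher,
    each already filtered with a positive-aware method. Teachers are
    visited round-robin so every teacher contributes its best candidates.
    """
    seen, merged = set(), []
    longest = max(len(negs) for negs in per_teacher_negs)
    for rank in range(longest):
        for negs in per_teacher_negs:
            if rank < len(negs):
                doc, score = negs[rank]
                if doc not in seen:
                    seen.add(doc)
                    merged.append((doc, score))
    return merged[:top_k]
```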

Implications and Future Directions

The implications of these findings are both practical and theoretical:

  • Practical Implications:
    • Improved retrieval accuracy directly translates to more effective semantic search and QA systems. By integrating positive-aware hard-negative mining methods, existing retrieval systems can gain substantial performance improvements.
    • Adoption of these methods in industry could enhance the quality of retrieval-augmented generation (RAG) tasks, such as in enterprise search or customer support applications.
  • Theoretical Implications:
    • This research advances the understanding of hard-negative mining's role in contrastive learning, suggesting that relative relevance scores are a critical factor.
    • The comprehensive ablation provides a robust framework for future research, encouraging deeper exploration of more refined negative-mining strategies.

Speculations on Future Developments

Looking forward, the research suggests several promising directions:

  • Advanced Ensembling Techniques: Future work could further refine ensembling strategies, possibly integrating more complex methods from ensemble learning to optimize hard-negative selections.
  • Dynamic Mining: Re-mining negatives on the fly, or semi-dynamically at intervals during training, would let the negative pool adapt as the model's embedding space evolves, potentially avoiding overfitting to static hard negatives (see the loop sketched after this list).
  • Cross-Task Transferability: Exploring the transferability of these methods across different tasks (e.g., from retrieval to ranking or classification) might provide insights into the generalizability of positive-aware mining techniques.
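
As an illustration of the semi-dynamic mining idea above, the loop below periodically re-mines negatives with the model currently being trained. All function names here are hypothetical stand-ins, not an API from the paper.

```python
def mine_hard_negatives(model, corpus, queries):
    """Hypothetical stand-in: score the corpus with the current model, then
    apply a positive-aware filter such as TopK-PercPos (see earlier sketch)."""
    raise NotImplementedError  # illustrative only

def train_one_epoch(model, queries, hard_negatives):
    """Hypothetical stand-in for one epoch of contrastive fine-tuning."""
    raise NotImplementedError  # illustrative only

def train_with_semi_dynamic_mining(model, corpus, queries, num_epochs,
                                   refresh_every=1):
    # Mine an initial pool, then periodically re-mine with the partially
    # trained model so negatives track its evolving embedding space.
    hard_negatives = mine_hard_negatives(model, corpus, queries)
    for epoch in range(num_epochs):
        train_one_epoch(model, queries, hard_negatives)
        if (epoch + 1) % refresh_every == 0:
            hard_negatives = mine_hard_negatives(model, corpus, queries)
```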

Conclusion

Moreira et al.'s paper offers substantial advances in text embedding models by focusing on the under-investigated area of hard-negative mining. The NV-Retriever-v1 model, trained with the proposed positive-aware mining methods, set a new state of the art on the MTEB Retrieval benchmark at publication time, demonstrating the significant potential of these approaches. The work contributes immediate practical improvements and paves the way for further exploration and innovation in text retrieval.

Authors (6)
  1. Gabriel de Souza P. Moreira (9 papers)
  2. Radek Osmulski (3 papers)
  3. Mengyao Xu (5 papers)
  4. Ronay Ak (5 papers)
  5. Benedikt Schifferer (8 papers)
  6. Even Oldridge (5 papers)