Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval (2007.00808v2)

Published 1 Jul 2020 in cs.IR, cs.CL, and cs.LG

Abstract: Conducting text retrieval in a dense learned representation space has many intriguing advantages over sparse retrieval. Yet the effectiveness of dense retrieval (DR) often requires combination with sparse retrieval. In this paper, we identify that the main bottleneck is in the training mechanisms, where the negative instances used in training are not representative of the irrelevant documents in testing. This paper presents Approximate nearest neighbor Negative Contrastive Estimation (ANCE), a training mechanism that constructs negatives from an Approximate Nearest Neighbor (ANN) index of the corpus, which is parallelly updated with the learning process to select more realistic negative training instances. This fundamentally resolves the discrepancy between the data distribution used in the training and testing of DR. In our experiments, ANCE boosts the BERT-Siamese DR model to outperform all competitive dense and sparse retrieval baselines. It nearly matches the accuracy of sparse-retrieval-and-BERT-reranking using dot-product in the ANCE-learned representation space and provides almost 100x speed-up.

Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval

Introduction

In text retrieval, the transition from sparse to dense retrieval methods has been an area of substantial interest and innovation. Sparse retrieval methods such as BM25 rely on exact matches between discrete terms, so they struggle to capture semantic similarity between differently worded queries and documents. Dense Retrieval (DR) addresses this limitation by matching queries and documents in a continuous representation space learned by neural encoders. Despite these advantages, DR alone has often lagged behind sparse retrieval in practice, or has required combination with it.
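
To make this setup concrete, here is a minimal sketch of dot-product dense retrieval with a shared (Siamese-style) encoder. The random `encode` placeholder, function names, and toy corpus are illustrative assumptions, not the paper's implementation, which fine-tunes a BERT-Siamese encoder and precomputes document embeddings offline.

```python
import numpy as np

def encode(texts, dim=768, seed=0):
    # Stand-in for a BERT-Siamese encoder: in ANCE this would be a fine-tuned
    # transformer producing one vector per text. Random vectors are used here
    # only so the retrieval logic below is runnable.
    rng = np.random.default_rng(seed)
    return rng.standard_normal((len(texts), dim)).astype(np.float32)

def dense_retrieve(query, corpus, k=3):
    q = encode([query])[0]          # query embedding
    d = encode(corpus, seed=1)      # document embeddings (precomputed offline in practice)
    scores = d @ q                  # dot-product relevance scores
    top = np.argsort(-scores)[:k]   # highest-scoring documents first
    return [(corpus[i], float(scores[i])) for i in top]

docs = ["dense retrieval paper", "sparse BM25 baseline", "question answering dataset"]
print(dense_retrieve("what is dense retrieval?", docs))
```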

Core Concept

The primary bottleneck, as identified in this paper, lies in how negative samples are constructed during training. In standard DR training, local negatives sampled from within the mini-batch are often uninformative: they yield diminishing gradient norms and high stochastic gradient variance, which leads to slow convergence. The paper proposes Approximate Nearest Neighbor Negative Contrastive Learning (ANCE), which instead selects hard global negatives from the entire corpus using an asynchronously updated ANN index, so that the negatives seen during training resemble the irrelevant documents the model must separate at test time. A minimal sketch of this kind of contrastive objective follows below.
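
The sketch below shows one common form of the contrastive objective used in dense retrieval training: a softmax cross-entropy (negative log likelihood) over one relevant document and a set of sampled negatives. The NumPy implementation and toy embeddings are simplifying assumptions; the point is only that the quality of `d_negs` is what ANCE changes, drawing them from an ANN index over the whole corpus instead of the mini-batch.

```python
import numpy as np

def contrastive_nll(q, d_pos, d_negs):
    """Negative log likelihood of the relevant document against sampled negatives.

    q:      (dim,)        query embedding
    d_pos:  (dim,)        embedding of the relevant document
    d_negs: (n_neg, dim)  embeddings of negative documents; ANCE draws these
                          from an ANN index over the whole corpus rather than
                          from the mini-batch.
    """
    scores = np.concatenate([[q @ d_pos], d_negs @ q])   # positive score first
    scores -= scores.max()                               # numerical stability
    log_probs = scores - np.log(np.exp(scores).sum())
    return -log_probs[0]                                 # -log P(d_pos | q)

dim = 8
rng = np.random.default_rng(0)
q, d_pos = rng.standard_normal(dim), rng.standard_normal(dim)
d_negs = rng.standard_normal((4, dim))
print(contrastive_nll(q, d_pos, d_negs))
```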

Main Contributions

The paper presents several notable contributions:

  1. Theoretical Analysis: The authors analyze DR training through the lens of stochastic gradient descent convergence, showing that uninformative local negatives yield diminishing gradient norms and high gradient variance, which slows convergence.
  2. ANCE Mechanism: They introduce ANCE, which constructs global negatives from an incrementally, asynchronously updated ANN index of the corpus, so that training always uses hard negatives retrieved by a recent checkpoint of the model (a toy sketch of this refresh-and-sample loop follows this list).
  3. Empirical Validation: Through experiments on web search, open-domain question answering (OpenQA), and a commercial search engine, the paper shows that ANCE nearly matches cascade sparse-retrieval-plus-BERT-reranking pipelines in accuracy while serving queries close to 100x faster.
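
To make the ANCE mechanism in contribution 2 concrete, the toy sketch below mimics the refresh-and-sample loop: the document index is periodically rebuilt with the current encoder, and top-retrieved non-relevant documents become the negatives for subsequent batches. The brute-force search, function names, and random data are assumptions for illustration; the paper maintains an approximate (ANN) index refreshed asynchronously by a separate inferencer rather than performing exact search inside the training loop.

```python
import numpy as np

def refresh_negative_index(doc_embeddings):
    # In ANCE this re-encoding and index rebuild happens asynchronously,
    # every so many training steps; here a plain matrix stands in for the
    # ANN index over the full corpus.
    return np.asarray(doc_embeddings, dtype=np.float32)

def sample_global_negatives(index, query_emb, positive_id, n_neg=2):
    # Retrieve the top-scoring documents under the *current* model and use
    # the non-relevant ones as hard negatives for the next training batches.
    scores = index @ query_emb
    ranked = np.argsort(-scores)
    return [int(i) for i in ranked if i != positive_id][:n_neg]

# Toy usage: 5 documents, document 2 is the relevant one for this query.
rng = np.random.default_rng(0)
index = refresh_negative_index(rng.standard_normal((5, 8)))
query_emb = rng.standard_normal(8).astype(np.float32)
print(sample_global_negatives(index, query_emb, positive_id=2))
```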

Experimental Results

The experimental validation covered three distinct scenarios: Web Search using TREC DL benchmarks, OpenQA with datasets such as Natural Questions (NQ) and TriviaQA (TQA), and an industrial-scale application in a commercial search engine. The key findings are:

  • Performance Metrics: ANCE (FirstP) achieved an MRR@10 of 0.33 on the MS MARCO passage development set and an NDCG@10 of 0.67 on TREC DL passage retrieval, outperforming the dense retrieval baselines and closely approaching interaction-based BERT rerankers (the sketch after this list shows how these two metrics are computed).
  • Efficiency Metrics: At inference time, dot-product retrieval in the ANCE-learned representation space was close to 100x faster than cascade BERT reranking, making it attractive for latency-sensitive applications.
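
For reference, the sketch below computes the two reported ranking metrics, MRR@10 and NDCG@10 (using a linear-gain formulation), for a single query's ranked list. It follows standard textbook definitions and is not code from the paper or from the official evaluation tools.

```python
import numpy as np

def mrr_at_k(ranked_ids, relevant_ids, k=10):
    # Reciprocal rank of the first relevant document within the top k, else 0.
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(ranked_ids, relevance, k=10):
    # relevance: dict doc_id -> graded relevance label (0 if missing).
    gains = [relevance.get(d, 0) for d in ranked_ids[:k]]
    dcg = sum(g / np.log2(i + 2) for i, g in enumerate(gains))
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(g / np.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

ranked = ["d3", "d1", "d7", "d2"]
print(mrr_at_k(ranked, {"d1"}))                  # first relevant at rank 2 -> 0.5
print(ndcg_at_k(ranked, {"d1": 3, "d2": 1}))
```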

Implications and Future Directions

The implications of this research are multifaceted:

  1. Practical Deployment: The substantial efficiency gains without significant loss in accuracy make ANCE a viable candidate for large-scale deployment in real-world systems where computational resources are a critical constraint.
  2. Theoretical Insights: The theoretical insights on the convergence properties of DR training with ANCE could guide future optimization strategies in neural retrieval models. The demonstration that global negatives are more beneficial than local negatives challenges existing paradigms and opens new avenues for improving training efficacy.

Additionally, future research could explore combining ANCE with other pretraining paradigms or expanding its applicability to other domains within information retrieval and NLP. Further exploration into hybrid models that leverage the strengths of both dense and sparse retrieval methods could also yield productive results.

Conclusion

Approximate Nearest Neighbor Negative Contrastive Learning (ANCE) represents a significant step forward in addressing the inherent limitations of dense text retrieval. By combining theoretical analysis with empirical validation, the paper both advances our understanding of DR training dynamics and paves the way for more efficient and effective retrieval systems.

The holistic approach taken—assessing both theoretical underpinnings and practical implementations—sets a benchmark for future research in the field, emphasizing the importance of tackling theoretical challenges with innovative, application-ready solutions.

Authors (8)
  1. Lee Xiong (3 papers)
  2. Chenyan Xiong (95 papers)
  3. Ye Li (155 papers)
  4. Kwok-Fung Tang (1 paper)
  5. Jialin Liu (97 papers)
  6. Paul Bennett (17 papers)
  7. Junaid Ahmed (5 papers)
  8. Arnold Overwijk (9 papers)
Citations (1,066)