
Operational Advice for Dense and Sparse Retrievers: HNSW, Flat, or Inverted Indexes? (2409.06464v1)

Published 10 Sep 2024 in cs.IR

Abstract: Practitioners working on dense retrieval today face a bewildering number of choices. Beyond selecting the embedding model, another consequential choice is the actual implementation of nearest-neighbor vector search. While best practices recommend HNSW indexes, flat vector indexes with brute-force search represent another viable option, particularly for smaller corpora and for rapid prototyping. In this paper, we provide experimental results on the BEIR dataset using the open-source Lucene search library that explicate the tradeoffs between HNSW and flat indexes (including quantized variants) from the perspectives of indexing time, query evaluation performance, and retrieval quality. With additional comparisons between dense and sparse retrievers, our results provide guidance for today's search practitioner in understanding the design space of dense and sparse retrievers. To our knowledge, we are the first to provide operational advice supported by empirical experiments in this regard.

Summary

  • The paper provides actionable guidance by comparing HNSW and flat indexes, revealing that HNSW can be 2-3x faster for medium corpora and over 10x for large datasets.
  • The study demonstrates that int8 quantization significantly accelerates query evaluation by 25-50% for flat indexes and over 100% for HNSW, with minimal quality loss.
  • The paper finds that dense and sparse retrieval models are comparably effective, while BM25 offers speed advantages for scenarios where rapid responses are critical.

Operational Advice for Dense and Sparse Retrievers: HNSW, Flat, or Inverted Indexes?

The paper "Operational Advice for Dense and Sparse Retrievers: HNSW, Flat, or Inverted Indexes?" by Jimmy Lin provides a detailed empirical study and practical guidance on the choice of vector search implementations within retrieval systems. The research focuses on three key questions related to the usage of hierarchical navigable small-world (HNSW) indexes, flat indexes, and quantization techniques, with a broader comparison between dense and sparse retrieval models.

Introduction

The quality of retrieval-augmented generation (RAG) heavily relies on the precision of the retrieval system in producing relevant documents from a corpus. Practitioners in the field must choose between dense and sparse retrieval models and then decide on the nearest-neighbor vector search implementation. The paper aims to elucidate the tradeoffs involved in these decisions by evaluating the performance of HNSW and flat indexes using the BGE dense retrieval model, SPLADE++ EnsembleDistil (ED) sparse retrieval model, and BM25 as a baseline across various dimensions including indexing time, query evaluation performance, and retrieval quality.

Primary Research Questions and Findings

  1. When to Use HNSW vs. Flat Indexes for Dense Retrieval? (RQ1)

The paper shows that for small corpora (less than 100K documents), there are negligible differences between flat and HNSW indexes. However, for medium-sized corpora (100K to 1M documents), HNSW indexes demonstrate a 2-3x improvement in query evaluation performance over flat indexes, albeit at the cost of increased indexing time. For large corpora (over 1M documents), HNSW indexes significantly outperform flat indexes in query evaluation performance, often more than an order of magnitude faster, while incurring a substantial increase in indexing time.
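The flat-index baseline in this comparison is simple enough to sketch. The following is an illustrative pure-Python brute-force search (exact cosine similarity against every corpus vector, O(n) per query); it is not Lucene's implementation, which operates over its own on-disk index structures, but it shows why flat search is attractive for small corpora and why its cost grows linearly:

```python
import math

def cosine(u, v):
    # Cosine similarity between two dense vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def flat_search(query, corpus, k=2):
    # Brute-force "flat" search: score every vector, sort, take top-k.
    # Exact results, no index build time, but O(n) work per query.
    scored = sorted(enumerate(corpus),
                    key=lambda iv: cosine(query, iv[1]),
                    reverse=True)
    return [i for i, _ in scored[:k]]

# Toy 2-d "embeddings" standing in for a real corpus.
corpus = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
print(flat_search([1.0, 0.05], corpus))  # → [0, 1]
```

An HNSW index instead builds a layered proximity graph at indexing time and greedily walks it at query time, trading exactness and build cost for sublinear search — which is exactly the tradeoff the paper quantifies.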

  2. Impact of Quantization on Index Performance (RQ2)

Quantization (int8) yields significant improvements in query evaluation performance for both flat and HNSW indexes without substantial losses in retrieval quality. For flat indexes, quantization generally improves performance by 25-50%, while for HNSW indexes, the improvements can be even higher, sometimes exceeding 100% for large corpora. The additional indexing time required by quantization is relatively minor compared to the benefits in operational efficiency.
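The int8 scheme evaluated here can be sketched as scalar quantization: each float component is mapped to an 8-bit integer via a scale factor, shrinking vectors 4x and speeding up similarity computation at a small reconstruction cost. This is an illustrative per-vector scheme, not Lucene's exact encoding:

```python
def quantize_int8(vec):
    # Map floats to int8 range [-127, 127] using a per-vector scale.
    scale = max(abs(x) for x in vec) / 127.0 or 1.0
    return [round(x / scale) for x in vec], scale

def dequantize(qvec, scale):
    # Approximate reconstruction of the original floats.
    return [q * scale for q in qvec]

v = [0.12, -0.5, 0.98]
q, s = quantize_int8(v)
approx = dequantize(q, s)
# Reconstruction error is bounded by half the scale step,
# which is why retrieval quality degrades only slightly.
```

The small, bounded rounding error per component is consistent with the paper's finding that int8 indexes lose little retrieval quality while gaining substantially in query evaluation speed.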

  3. Effectiveness-Efficiency Tradeoffs Between Dense and Sparse Retrieval (RQ3)

Dense retrieval (BGE) and sparse retrieval (SPLADE++ ED) are generally comparable in terms of effectiveness, with no clear dominance of one approach over the other. However, BM25, though weaker in retrieval quality, offers superior query evaluation speed and hence remains attractive for scenarios where retrieval latency is critical. The results suggest that the choice between dense and sparse models should weigh both effectiveness and efficiency against the application's requirements.
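For contrast with the learned dense and sparse models, BM25 itself is compact enough to write out. The following is a minimal illustrative sketch, not the paper's code; the parameter defaults k1=0.9 and b=0.4 are assumed here from common Lucene/Anserini settings:

```python
import math

def bm25_score(query_terms, doc, corpus, k1=0.9, b=0.4):
    # Minimal BM25: sum over query terms of IDF-weighted,
    # length-normalized term frequency. `doc` and corpus docs
    # are pre-tokenized lists of terms.
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N
    score = 0.0
    for t in query_terms:
        df = sum(1 for d in corpus if t in d)  # document frequency
        if df == 0:
            continue
        idf = math.log(1 + (N - df + 0.5) / (df + 0.5))
        tf = doc.count(t)
        norm = tf + k1 * (1 - b + b * len(doc) / avgdl)
        score += idf * tf * (k1 + 1) / norm
    return score

corpus = [["dense", "retrieval", "models"],
          ["sparse", "retrieval"],
          ["unrelated", "terms"]]
```

Because scoring reduces to cheap inverted-list lookups like this, BM25 needs no neural inference at query time, which underlies its speed advantage over both BGE and SPLADE++ ED.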

Discussion

The findings emphasize that while HNSW indexes offer significant performance advantages for large-scale applications, the associated costs in indexing time and potential degradation in retrieval quality need to be carefully balanced. Quantization is recommended for practitioners looking to enhance query evaluation performance, provided they can accommodate the slight reductions in retrieval quality.

Practical Implications and Future Work

This research contributes practical advice based on empirical evidence, replacing vague guidance commonly found in existing literature and industry discussions. For systems dealing with large corpora, the recommendation is to use HNSW indexes despite their longer indexing times, considering the benefits gained in query performance. For smaller corpora or rapid prototyping, flat indexes may suffice.

Additionally, while this paper focuses on particular retrieval models and an implementation-specific setup with Lucene, future work can extend to include diverse models and systems to confirm the generalizability of these findings. Furthermore, issues related to reranking, prompt engineering, and dynamic document handling in full RAG systems remain as open questions for ongoing research.

Conclusion

The paper provides an actionable guide for search practitioners navigating dense and sparse retrieval landscapes, offering nuanced insights into the tradeoffs between HNSW and flat indexes. These findings are essential for making informed design choices, optimizing both the effectiveness and efficiency of retrieval systems deployed in real-world applications.
