Scaling Laws For Dense Retrieval (2403.18684v2)

Published 27 Mar 2024 in cs.IR and cs.CL

Abstract: Scaling up neural models has yielded significant advancements in a wide array of tasks, particularly in language generation. Previous studies have found that the performance of neural models frequently adheres to predictable scaling laws, correlated with factors such as training set size and model size. This insight is invaluable, especially as large-scale experiments grow increasingly resource-intensive. Yet, such scaling law has not been fully explored in dense retrieval due to the discrete nature of retrieval metrics and complex relationships between training data and model sizes in retrieval tasks. In this study, we investigate whether the performance of dense retrieval models follows the scaling law as other neural models. We propose to use contrastive log-likelihood as the evaluation metric and conduct extensive experiments with dense retrieval models implemented with different numbers of parameters and trained with different amounts of annotated data. Results indicate that, under our settings, the performance of dense retrieval models follows a precise power-law scaling related to the model size and the number of annotations. Additionally, we examine scaling with prevalent data augmentation methods to assess the impact of annotation quality, and apply the scaling law to find the best resource allocation strategy under a budget constraint. We believe that these insights will significantly contribute to understanding the scaling effect of dense retrieval models and offer meaningful guidance for future research endeavors.

Exploring Scaling Laws for Dense Retrieval

Introduction to Scaling Laws in Dense Retrieval

Recent advances in neural network research, particularly in NLP and information retrieval (IR), have underscored the value of scaling up neural models. Across many tasks involving LLMs, performance follows predictable scaling laws: relationships that forecast model quality from factors such as model size and training data volume. These laws have proven invaluable, especially given the resource-intensive nature of large-scale experiments. Yet such scaling laws remain largely unexplored in dense retrieval, a notable gap given the pivotal role dense retrieval models play in improving semantic search over conventional retrieval methods.

The paper by Yan Fang, Jingtao Zhan, Qingyao Ai, and colleagues takes up this question, investigating whether the performance of dense retrieval models scales in the same way as that of other neural models. They examine the balance between model size and training data volume and ask whether scaling laws hold in dense retrieval at all, despite its inherently discrete evaluation metrics and the complex interplay between model and data sizes.

Key Findings and Methodology

A significant contribution of the work is the proposal to use contrastive log-likelihood as the evaluation metric. Unlike discrete ranking metrics, it varies continuously and therefore mirrors the loss-based measures under which scaling laws have been observed for LLMs. Extensive experiments across different model sizes and volumes of annotated data reveal a precise power-law relationship in dense retrieval performance. This result is not only theoretically interesting but also has practical implications for allocating resources when training dense retrieval models.
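
To make the metric concrete, here is a minimal sketch of how a contrastive log-likelihood could be computed for a dense retriever. The function name, the dot-product scoring, and the use of sampled negatives are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def contrastive_log_likelihood(query_emb, pos_doc_emb, neg_doc_embs):
    """
    Contrastive log-likelihood of the annotated relevant document for one query.

    query_emb:    (d,)   query embedding
    pos_doc_emb:  (d,)   embedding of the relevant document
    neg_doc_embs: (n, d) embeddings of sampled negative documents
    """
    # Dot-product similarities between the query and candidate documents.
    pos_score = query_emb @ pos_doc_emb                        # scalar
    neg_scores = neg_doc_embs @ query_emb                      # (n,)
    scores = torch.cat([pos_score.unsqueeze(0), neg_scores])   # (n + 1,)
    # Log-probability assigned to the positive document (index 0).
    return F.log_softmax(scores, dim=0)[0]

# Toy usage with random embeddings; a corpus-level metric averages the
# negated log-likelihood over queries, giving a continuous quantity
# analogous to perplexity.
q, p, negs = torch.randn(128), torch.randn(128), torch.randn(8, 128)
print(contrastive_log_likelihood(q, p, negs))
```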

Model and Data Size Scaling

The paper effectively disentangles the effects of model and data sizes, offering insights into their individual and combined impacts on dense retrieval:

  • Model Size: The research observes a clear, predictable relationship between model size and retrieval performance, quantified through contrastive perplexity. Larger models achieve better retrieval quality, albeit with diminishing returns.
  • Data Size: A similar power law emerges when varying the amount of annotated data: larger training sets generally improve performance, again at a diminishing rate. This highlights the critical role of data volume in training effective dense retrieval systems (a rough curve-fitting sketch follows this list).
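
The reported power-law behavior can be illustrated with a small curve-fitting exercise. The functional form below (a scale-dependent term plus an irreducible floor) follows the general neural scaling-law literature; the parameterization and the data points are made-up placeholders, not numbers from the paper.

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, alpha, c):
    # L(n) = a * n^(-alpha) + c : loss decays as a power of scale n
    # (model parameters or number of annotations), with irreducible floor c.
    return a * np.power(n, -alpha) + c

# Hypothetical (model size, contrastive-entropy) observations for illustration.
sizes = np.array([3.5e7, 1.1e8, 3.3e8, 7.5e8])
losses = np.array([1.92, 1.61, 1.38, 1.27])

params, _ = curve_fit(power_law, sizes, losses, p0=[1000.0, 0.4, 1.0], maxfev=10000)
a, alpha, c = params
print(f"fitted exponent alpha = {alpha:.3f}, irreducible loss = {c:.3f}")

# Extrapolate to a larger model size to predict performance before training it.
print("predicted loss at 3B params:", power_law(3e9, *params))
```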

Annotation Quality and Scaling Laws

Examining data annotation quality sheds light on its influence on the scaling behavior. The paper shows that annotation quality, ranging from weak supervision to high-quality labels, notably affects the scaling effect, with higher-quality annotations yielding steeper improvements in model performance. This underscores the value of high-quality annotations and the potential of using capable LLMs to generate them.

Implications and Future Directions

The identification of scaling laws in dense retrieval models extends beyond academic curiosity, offering tangible benefits for guiding future research and developments in IR. It provides a framework for predicting model performance under various configurations, enabling more efficient allocation of computational resources and informed decisions on data annotation strategies.
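
As a rough illustration of how such a framework could drive budgeting decisions, the sketch below grid-searches the split of a fixed budget between model parameters and annotated training pairs under an assumed additive power-law loss surface and a linear cost model. Every constant here (exponents, coefficients, unit costs) is a placeholder, not a value from the paper.

```python
import numpy as np

def predicted_loss(model_size, num_annotations,
                   a=120.0, alpha=0.30, b=45.0, beta=0.35, floor=1.0):
    # Assumed additive power-law surface:
    # L(N, D) = a * N^(-alpha) + b * D^(-beta) + floor
    return a * model_size**-alpha + b * num_annotations**-beta + floor

def best_allocation(budget, cost_per_param=1e-6, cost_per_label=0.05, grid=200):
    """Grid-search the split of a fixed budget between model parameters
    and annotated training pairs that minimizes the predicted loss."""
    best = None
    for f in np.linspace(0.05, 0.95, grid):
        n_params = (f * budget) / cost_per_param
        n_labels = ((1 - f) * budget) / cost_per_label
        loss = predicted_loss(n_params, n_labels)
        if best is None or loss < best[0]:
            best = (loss, n_params, n_labels)
    return best

loss, n_params, n_labels = best_allocation(budget=10_000.0)
print(f"predicted loss {loss:.3f} with ~{n_params:.2e} params and ~{n_labels:.0f} labels")
```

In practice, the exponents and the cost model would be replaced with values fitted from pilot runs before being used to guide allocation.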

Looking ahead, this work opens several avenues for further exploration:

  • Extending Model and Data Size Ranges: Investigating scaling laws over a broader spectrum of model sizes and data volumes could provide deeper insights into their limitations and applicability.
  • Diverse Architectures and Tasks: Exploring scaling laws across different neural architectures and retrieval tasks could uncover task-specific scaling behavior, enriching our understanding of dense retrieval systems.
  • Practical Applications: The paper's findings on optimal resource allocation under budget constraints have immediate practical applications in designing efficient and scalable dense retrieval systems, paving the way for more cost-effective implementations in commercial search engines.

In conclusion, this pioneering exploration of scaling laws in dense retrieval marks a significant step forward in our understanding of neural information retrieval systems. It lays the groundwork for future research aimed at optimizing the design and training of dense retrieval models, ultimately advancing the state-of-the-art in semantic search technologies.

Authors (7)
  1. Yan Fang
  2. Jingtao Zhan
  3. Qingyao Ai
  4. Jiaxin Mao
  5. Weihang Su
  6. Jia Chen
  7. Yiqun Liu