
Retrieval with Learned Similarities (2407.15462v3)

Published 22 Jul 2024 in cs.IR, cs.DB, cs.DS, and cs.LG

Abstract: Retrieval plays a fundamental role in recommendation systems, search, and NLP by efficiently finding relevant items from a large corpus given a query. Dot products have been widely used as the similarity function in such tasks, enabled by Maximum Inner Product Search (MIPS) algorithms for efficient retrieval. However, state-of-the-art retrieval algorithms have migrated to learned similarities. These advanced approaches encompass multiple query embeddings, complex neural networks, direct item ID decoding via beam search, and hybrid solutions. Unfortunately, we lack efficient solutions for retrieval in these state-of-the-art setups. Our work addresses this gap by investigating efficient retrieval techniques with expressive learned similarity functions. We establish Mixture-of-Logits (MoL) as a universal approximator of similarity functions, demonstrate that MoL's expressiveness can be realized empirically to achieve superior performance on diverse retrieval scenarios, and propose techniques to retrieve the approximate top-k results using MoL with tight error bounds. Through extensive experimentation, we show that MoL, enhanced by our proposed mutual information-based load balancing loss, sets new state-of-the-art results across heterogeneous scenarios, including sequential retrieval models in recommendation systems and finetuning LLMs for question answering; and our approximate top-$k$ algorithms outperform baselines by up to 66x in latency while achieving >.99 recall rate compared to exact algorithms.


Summary

  • The paper demonstrates the universal approximation capacity of Mixture-of-Logits for learned similarity functions, advancing retrieval theory.
  • It proposes exact and approximate retrieval algorithms that balance speed and accuracy by narrowing search spaces and leveraging heuristics.
  • Empirical results reveal significant improvements in hit rates and latency on large datasets like MovieLens and Amazon Books.

Efficient Retrieval with Learned Similarities

The paper "Efficient Retrieval with Learned Similarities" authored by Bailu Ding and Jiaqi Zhai, addresses a fundamental challenge in the domain of recommendation systems, search, and natural language processing: efficient retrieval of relevant items from vast datasets. The research pivots on the classical problem of Maximum Inner Product Search (MIPS) and extends towards more advanced learned similarity functions, which have seen increasing adoption in state-of-the-art retrieval algorithms.

Key Insights and Contributions

  1. Learned Similarities and Expressiveness: The paper makes a significant theoretical contribution by demonstrating that Mixture-of-Logits (MoL) is a universal approximator of learned similarity functions. This grounds the use of MoL in scenarios involving complex similarities that traditional dot-product-based MIPS cannot easily handle; a minimal sketch of the MoL form appears after this list.
  2. Retrieval Efficiency: A core contribution is the development of both exact and approximate retrieval algorithms for MoL. The exact algorithm employs a two-pass method that first narrows the candidate set and then refines it. The approximate algorithms rely on heuristics such as top-K per embedding and average top-K (sketched after this list), trading a small amount of accuracy for substantial speed.
  3. Empirical Validation: Rigorous evaluations on three prominent recommendation datasets (MovieLens 1M, MovieLens 20M, and Amazon Books) show that MoL-based methods significantly outperform traditional dot-product retrieval in hit rate and mean reciprocal rank (MRR). Average improvements of 21.4% in HR@1 and 13.7% in HR@10 across six settings highlight MoL's efficacy.
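
The paper characterizes MoL as an adaptively gated mixture of dot products between component embeddings. Below is a minimal sketch of that functional form; the tensor shapes, the einsum layout, and the softmax gate are our illustrative assumptions rather than the authors' exact implementation:

```python
import torch
import torch.nn.functional as F

def mol_similarity(q_embs: torch.Tensor, x_embs: torch.Tensor,
                   gate_logits: torch.Tensor) -> torch.Tensor:
    """Mixture-of-Logits score: sum_p pi_p(q, x) * <f_p(q), g_p(x)> (sketch).

    q_embs:      (P, d)    P component embeddings of the query, f_p(q).
    x_embs:      (n, P, d) component embeddings of n items, g_p(x).
    gate_logits: (n, P)    unnormalized gates pi_p(q, x); in the paper these
                           come from a learned network over query and item.
    """
    dots = torch.einsum('pd,npd->np', q_embs, x_embs)  # (n, P) component logits
    gates = F.softmax(gate_logits, dim=-1)             # adaptive mixture weights
    return (gates * dots).sum(dim=-1)                  # (n,) MoL scores
```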
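
And a sketch of the top-K-per-embedding heuristic from item 2: run cheap per-component MIPS to form a candidate union, then rerank the candidates with the full MoL score. `gate_fn` is a hypothetical callable standing in for the learned gating network, and `mol_similarity` is reused from the previous sketch; this is our reading of the heuristic, not the authors' code:

```python
import torch

def topk_per_embedding(q_embs, x_embs, gate_fn, k=100):
    """Approximate MoL retrieval via the top-K-per-embedding heuristic (sketch)."""
    k = min(k, x_embs.shape[0])
    # Pass 1: per-component dot products against all items (cheap MIPS).
    logits = torch.einsum('pd,npd->np', q_embs, x_embs)        # (n, P)
    # Union of the top-k item ids under each of the P components.
    cand = torch.unique(torch.topk(logits, k, dim=0).indices)  # candidate ids
    # Pass 2: rerank the candidate set with the full MoL score.
    scores = mol_similarity(q_embs, x_embs[cand],
                            gate_fn(q_embs, x_embs[cand]))     # (|cand|,)
    order = torch.topk(scores, min(k, cand.numel())).indices
    return cand[order], scores[order]
```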

Empirical Evaluation

Empirical evaluations focus on top-K retrieval performance. Key results show that the approximate retrieval methods recover >99% of the hit rate of exact retrieval while substantially reducing latency:

  • MovieLens 20M: TopKAvg achieves a >99% relative hit rate at HR@100 with roughly a 4× reduction in latency.
  • Amazon Books: TopKAvg performs similarly, achieving a >99% relative hit rate with a 91× reduction in latency compared to brute-force retrieval.

Implications

Practical Implications:

  • The proposed retrieval algorithms optimize for large-scale recommendation systems, reducing computational overhead while maintaining high retrieval quality.
  • Real-time systems, such as those deployed in industry, can benefit from the latency improvements, enabling faster and more efficient recommendation serving.

Theoretical Implications:

  • The universal approximator property of MoL suggests that it can be applied across a wide range of retrieval tasks beyond the datasets tested, facilitating further research into diverse applications.
  • It provides a framework for extending retrieval mechanisms in natural language processing and search systems.

Future Developments

The paper leaves room for future work, primarily in handling even larger datasets and optimizing low-level GPU kernels. More efficient implementations of the two-pass exact retrieval algorithm and optimization techniques tailored to specific hardware accelerators are potential avenues for further performance gains.

Conclusion

This paper makes a notable stride in advancing the efficiency of retrieval algorithms using learned similarities. By leveraging MoL as a universal approximator and developing corresponding retrieval algorithms, it sets a new benchmark in the field, particularly in recommendation systems. The demonstrated efficiency and accuracy improvements underline the practical and theoretical potential of the proposed methods, paving the way for further innovations in efficient large-scale retrieval.
