
Retrieval with Learned Similarities (2407.15462v3)

Published 22 Jul 2024 in cs.IR, cs.DB, cs.DS, and cs.LG

Abstract: Retrieval plays a fundamental role in recommendation systems, search, and NLP by efficiently finding relevant items from a large corpus given a query. Dot products have been widely used as the similarity function in such tasks, enabled by Maximum Inner Product Search (MIPS) algorithms for efficient retrieval. However, state-of-the-art retrieval algorithms have migrated to learned similarities. These advanced approaches encompass multiple query embeddings, complex neural networks, direct item ID decoding via beam search, and hybrid solutions. Unfortunately, we lack efficient solutions for retrieval in these state-of-the-art setups. Our work addresses this gap by investigating efficient retrieval techniques with expressive learned similarity functions. We establish Mixture-of-Logits (MoL) as a universal approximator of similarity functions, demonstrate that MoL's expressiveness can be realized empirically to achieve superior performance on diverse retrieval scenarios, and propose techniques to retrieve the approximate top-k results using MoL with tight error bounds. Through extensive experimentation, we show that MoL, enhanced by our proposed mutual information-based load balancing loss, sets new state-of-the-art results across heterogeneous scenarios, including sequential retrieval models in recommendation systems and finetuning LLMs for question answering; and our approximate top-$k$ algorithms outperform baselines by up to 66x in latency while achieving >.99 recall rate compared to exact algorithms.


Summary

  • The paper demonstrates the universal approximation capacity of Mixture-of-Logits for learned similarity functions, advancing retrieval theory.
  • It proposes exact and approximate retrieval algorithms that balance speed and accuracy by narrowing search spaces and leveraging heuristics.
  • Empirical results reveal significant improvements in hit rates and latency on large datasets like MovieLens and Amazon Books.

Efficient Retrieval with Learned Similarities

The paper "Efficient Retrieval with Learned Similarities" authored by Bailu Ding and Jiaqi Zhai, addresses a fundamental challenge in the domain of recommendation systems, search, and natural language processing: efficient retrieval of relevant items from vast datasets. The research pivots on the classical problem of Maximum Inner Product Search (MIPS) and extends towards more advanced learned similarity functions, which have seen increasing adoption in state-of-the-art retrieval algorithms.

Key Insights and Contributions

  1. Learned Similarities and Expressiveness: The paper makes a significant theoretical contribution by demonstrating that Mixture-of-Logits (MoL) is a universal approximator of learned similarity functions. This grounds the use of MoL in scenarios involving complex similarities that traditional dot-product-based MIPS cannot easily handle; a minimal sketch of the MoL form appears after this list.
  2. Retrieval Efficiency: A core contribution is the development of both exact and approximate retrieval algorithms for MoL. The exact algorithm employs a two-pass method that first narrows the candidate set and then refines it. The approximate algorithms rely on heuristics such as top-K per embedding and average top-K (sketched after this list), trading a small amount of accuracy for substantial speed.
  3. Empirical Validation: Rigorous evaluations on three prominent recommendation datasets (MovieLens 1M, MovieLens 20M, and Amazon Books) show that MoL-based methods significantly outperform traditional dot-product retrieval in hit rate and mean reciprocal rank (MRR). Average improvements of 21.4% in HR@1 and 13.7% in HR@10 across six settings highlight MoL's efficacy.
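
The paper characterizes MoL as an adaptively gated mixture of dot products between component embeddings. Below is a minimal sketch of that functional form; the tensor shapes, the einsum layout, and the softmax gate are our illustrative assumptions rather than the authors' exact implementation:

```python
import torch
import torch.nn.functional as F

def mol_similarity(q_embs: torch.Tensor, x_embs: torch.Tensor,
                   gate_logits: torch.Tensor) -> torch.Tensor:
    """Mixture-of-Logits score: sum_p pi_p(q, x) * <f_p(q), g_p(x)> (sketch).

    q_embs:      (P, d)    P component embeddings of the query, f_p(q).
    x_embs:      (n, P, d) component embeddings of n items, g_p(x).
    gate_logits: (n, P)    unnormalized gates pi_p(q, x); in the paper these
                           come from a learned network over query and item.
    """
    dots = torch.einsum('pd,npd->np', q_embs, x_embs)  # (n, P) component logits
    gates = F.softmax(gate_logits, dim=-1)             # adaptive mixture weights
    return (gates * dots).sum(dim=-1)                  # (n,) MoL scores
```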
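
And a sketch of the top-K-per-embedding heuristic from item 2: run cheap per-component MIPS to form a candidate union, then rerank the candidates with the full MoL score. `gate_fn` is a hypothetical callable standing in for the learned gating network, and `mol_similarity` is reused from the previous sketch; this is our reading of the heuristic, not the authors' code:

```python
import torch

def topk_per_embedding(q_embs, x_embs, gate_fn, k=100):
    """Approximate MoL retrieval via the top-K-per-embedding heuristic (sketch)."""
    k = min(k, x_embs.shape[0])
    # Pass 1: per-component dot products against all items (cheap MIPS).
    logits = torch.einsum('pd,npd->np', q_embs, x_embs)        # (n, P)
    # Union of the top-k item ids under each of the P components.
    cand = torch.unique(torch.topk(logits, k, dim=0).indices)  # candidate ids
    # Pass 2: rerank the candidate set with the full MoL score.
    scores = mol_similarity(q_embs, x_embs[cand],
                            gate_fn(q_embs, x_embs[cand]))     # (|cand|,)
    order = torch.topk(scores, min(k, cand.numel())).indices
    return cand[order], scores[order]
```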

Empirical Evaluation

Empirical evaluations focus on top-K retrieval performance. Key results show that the approximate retrieval methods recover >99% of the hit rate of exact retrieval while substantially reducing latency:

  • MovieLens 20M: TopKAvg achieves a >99% relative hit rate at HR@100 with roughly a 4× reduction in latency.
  • Amazon Books: TopKAvg performs similarly, achieving a >99% relative hit rate with a 91× reduction in latency compared to brute-force retrieval.

Implications

Practical Implications:

  • The proposed retrieval algorithms optimize for large-scale recommendation systems, reducing computational overhead while maintaining high retrieval quality.
  • Real-time systems, such as those deployed in industry, can benefit from the latency improvements, enabling faster and more efficient recommendation serving.

Theoretical Implications:

  • The universal approximator property of MoL suggests that it can be applied across a wide range of retrieval tasks beyond the datasets tested, facilitating further research into diverse applications.
  • It provides a framework for extending retrieval mechanisms in natural language processing and search systems.

Future Developments

The paper leaves room for future work, primarily in handling even larger datasets and optimizing low-level GPU kernels. More efficient implementations of the two-pass exact retrieval algorithm and optimization techniques tailored to specific hardware accelerators are potential avenues for further performance gains.

Conclusion

This paper makes a notable stride in advancing the efficiency of retrieval algorithms using learned similarities. By leveraging MoL as a universal approximator and developing corresponding retrieval algorithms, it sets a new benchmark in the field, particularly in recommendation systems. The demonstrated efficiency and accuracy improvements underline the practical and theoretical potential of the proposed methods, paving the way for further innovations in efficient large-scale retrieval.
