LoRANN: Low-Rank Matrix Factorization for Approximate Nearest Neighbor Search (2410.18926v1)

Published 24 Oct 2024 in cs.LG

Abstract: Approximate nearest neighbor (ANN) search is a key component in many modern machine learning pipelines; recent use cases include retrieval-augmented generation (RAG) and vector databases. Clustering-based ANN algorithms that use score computation methods based on product quantization (PQ) are often used in industrial-scale applications due to their scalability and suitability for distributed and disk-based implementations. However, they have slower query times than the leading graph-based ANN algorithms. In this work, we propose a new supervised score computation method based on the observation that inner product approximation is a multivariate (multi-output) regression problem that can be solved efficiently by reduced-rank regression. Our experiments show that on modern high-dimensional data sets, the proposed reduced-rank regression (RRR) method is superior to PQ in both query latency and memory usage. We also introduce LoRANN, a clustering-based ANN library that leverages the proposed score computation method. LoRANN is competitive with the leading graph-based algorithms and outperforms the state-of-the-art GPU ANN methods on high-dimensional data sets.
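
The regression view described in the abstract can be illustrated with a small NumPy sketch. This is not the LoRANN implementation: it is a textbook reduced-rank regression under assumed conditions, namely a single cluster of points `X`, a set of training queries `Q`, and targets `Y = Q @ X.T` (the exact inner products). The function name `reduced_rank_regression`, the synthetic Gaussian data, and all sizes are hypothetical choices made for illustration only.

```python
import numpy as np

def reduced_rank_regression(Q, Y, rank):
    """Fit a rank-constrained linear map from queries to inner-product scores.

    Q: (n, d) training queries; Y: (n, m) target scores.
    Returns A (d, rank) and B (rank, m) such that Q @ A @ B approximates Y.
    """
    # Unconstrained least-squares solution, shape (d, m).
    B_ols, *_ = np.linalg.lstsq(Q, Y, rcond=None)
    # Fitted values of the unconstrained model.
    Y_hat = Q @ B_ols
    # Top right singular vectors of the fitted values span the best rank-r subspace.
    _, _, Vt = np.linalg.svd(Y_hat, full_matrices=False)
    V_r = Vt[:rank].T            # (m, rank)
    A = B_ols @ V_r              # (d, rank)
    return A, V_r.T              # scores for a query q: (q @ A) @ V_r.T

# Synthetic data (assumption: standard Gaussian points and queries).
rng = np.random.default_rng(0)
d, m, n, r = 32, 200, 1000, 8
X = rng.standard_normal((m, d))  # points in one cluster
Q = rng.standard_normal((n, d))  # training queries
Y = Q @ X.T                      # exact inner products as regression targets

A, B = reduced_rank_regression(Q, Y, r)

# Score an unseen query with two small matrix products instead of X @ q.
q = rng.standard_normal(d)
approx = (q @ A) @ B
exact = X @ q
rel_err = np.linalg.norm(approx - exact) / np.linalg.norm(exact)
print(f"relative error at rank {r}: {rel_err:.3f}")
```

Scoring a query costs one `d × r` product followed by one `r × m` product, so a small rank trades accuracy for speed and memory. On unstructured Gaussian data like the above, the low-rank error is large; the abstract's claims concern real high-dimensional data sets, which this sketch does not reproduce.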
