LoRANN: Low-Rank Matrix Factorization for Approximate Nearest Neighbor Search (2410.18926v1)
Abstract: Approximate nearest neighbor (ANN) search is a key component in many modern machine learning pipelines; recent use cases include retrieval-augmented generation (RAG) and vector databases. Clustering-based ANN algorithms that use score computation methods based on product quantization (PQ) are often used in industrial-scale applications due to their scalability and suitability for distributed and disk-based implementations. However, they have slower query times than the leading graph-based ANN algorithms. In this work, we propose a new supervised score computation method based on the observation that inner product approximation is a multivariate (multi-output) regression problem that can be solved efficiently by reduced-rank regression. Our experiments show that on modern high-dimensional data sets, the proposed reduced-rank regression (RRR) method is superior to PQ in both query latency and memory usage. We also introduce LoRANN, a clustering-based ANN library that leverages the proposed score computation method. LoRANN is competitive with the leading graph-based algorithms and outperforms the state-of-the-art GPU ANN methods on high-dimensional data sets.
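The regression view in the abstract can be illustrated with a minimal NumPy sketch of textbook reduced-rank regression. It is an illustration under stated assumptions, not LoRANN's implementation or API: the function names `fit_rrr` and `approx_scores`, the random data, and the chosen sizes are all hypothetical. Training queries `X` are the regression inputs, their exact inner products with the points `C` of one cluster are the multi-output targets, and the rank-`r` coefficient matrix is obtained by projecting the least-squares solution onto the top `r` right singular vectors of the fitted values.

```python
import numpy as np

def fit_rrr(X, C, rank):
    """Textbook reduced-rank regression for inner product approximation.

    X: (m, d) training queries (assumed available), C: (n, d) points of one cluster.
    Returns A (d, rank) and B (rank, n) such that X @ A @ B approximates X @ C.T.
    """
    Y = X @ C.T                                      # exact inner products = regression targets
    B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)    # full-rank least-squares fit, shape (d, n)
    Y_hat = X @ B_ols                                # fitted values
    _, _, Vt = np.linalg.svd(Y_hat, full_matrices=False)
    V_r = Vt[:rank].T                                # top right singular vectors, shape (n, rank)
    A = B_ols @ V_r                                  # maps a query to `rank` coordinates
    B = V_r.T                                        # maps those coordinates to n scores
    return A, B

def approx_scores(q, A, B):
    """Approximate <q, c_j> for every point c_j using two small matrix products."""
    return (q @ A) @ B

# Hypothetical sizes and synthetic data, for illustration only.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 32))   # training queries
C = rng.standard_normal((100, 32))   # cluster points
A, B = fit_rrr(X, C, rank=8)
q = rng.standard_normal(32)
scores = approx_scores(q, A, B)      # shape (100,), one approximate score per point
```

Scoring a query this way costs roughly `d * r + r * n` multiplications instead of `d * n` for the exact products, which is where a low rank `r` pays off when the dimension `d` is large.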