Characterizing the Dilemma of Performance and Index Size in Billion-Scale Vector Search and Breaking It with Second-Tier Memory (2405.03267v2)

Published 6 May 2024 in cs.DC, cs.DB, and cs.IR

Abstract: Vector search on large-scale datasets is critical to modern online services such as web search and RAG, which necessitates storing the datasets and their indexes on secondary storage such as SSDs. In this paper, we are the first to characterize the trade-off between performance and index size in existing SSD-based graph and cluster indexes: to improve throughput by 5.7$\times$ and 1.7$\times$, these indexes have to pay a storage amplification of 5.8$\times$ and 7.7$\times$ with respect to the dataset size, respectively. The root cause is that the coarse-grained access of SSDs mismatches the fine-grained random reads required by vector indexes with small amplification. This paper argues that second-tier memory, such as remote DRAM/NVM connected via RDMA or CXL, is a powerful storage medium for addressing the problem from a systems perspective, thanks to its fine-grained access granularity. However, putting existing indexes, which are primarily designed for SSDs, directly on second-tier memory cannot fully utilize its power. Meanwhile, second-tier memory still behaves more like storage, so using it as DRAM is also inefficient. To this end, we build a graph index and a cluster index that center around the performance features of second-tier memory. With careful execution engine and index layout designs, we show that vector indexes can achieve optimal performance with orders of magnitude smaller index amplification on a variety of second-tier memory devices. Based on our improved graph and cluster indexes on second-tier memory, we further conduct a systematic comparison between them to help developers choose the right index for their workloads. Interestingly, the findings on second-tier memory contradict those on SSDs.
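
The granularity mismatch named in the abstract can be illustrated with simple arithmetic. The following is a minimal sketch, not the paper's method or measurements: the 128-byte vector, the 4 KiB SSD block, and the 64-byte fine-grained access unit are illustrative assumptions, and the helper names are hypothetical.

```python
import math

def bytes_transferred(payload_bytes: int, granularity_bytes: int) -> int:
    """Bytes a device moves to serve one aligned random read of payload_bytes,
    given that it can only transfer whole units of granularity_bytes."""
    return math.ceil(payload_bytes / granularity_bytes) * granularity_bytes

def read_amplification(payload_bytes: int, granularity_bytes: int) -> float:
    """Ratio of bytes moved to bytes actually needed."""
    return bytes_transferred(payload_bytes, granularity_bytes) / payload_bytes

# Assumed sizes, chosen only for illustration (not taken from the paper).
VECTOR_BYTES = 128        # e.g. one 128-dimensional vector of 1-byte components
SSD_BLOCK_BYTES = 4096    # a typical SSD block size
FINE_GRAINED_BYTES = 64   # a cache-line-sized access unit

ssd_amp = read_amplification(VECTOR_BYTES, SSD_BLOCK_BYTES)
fine_amp = read_amplification(VECTOR_BYTES, FINE_GRAINED_BYTES)

print(f"coarse-grained read amplification: {ssd_amp:.1f}x")
print(f"fine-grained read amplification:   {fine_amp:.1f}x")
```

Under these assumptions, a random read of one vector moves a full 4 KiB block on the coarse-grained device, while the fine-grained device moves only the bytes it needs. An index on a coarse-grained device can avoid wasting that bandwidth by packing more useful data into each block, which tends to enlarge the index; this is the trade-off the abstract characterizes.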
