
Enhancing HNSW Index for Real-Time Updates: Addressing Unreachable Points and Performance Degradation (2407.07871v2)

Published 10 Jul 2024 in cs.IR

Abstract: Approximate nearest neighbor search (ANNS) is a fundamental component of data mining and information retrieval, and graph-based methods demonstrate superior performance compared to alternative approaches. Extensive research has been dedicated to improving search efficiency by developing graph-based indices such as HNSW (Hierarchical Navigable Small World). However, the performance of HNSW and most graph-based indices becomes unacceptable when faced with a large number of real-time deletions, insertions, and updates. Furthermore, update operations in HNSW can leave some data points unreachable, a situation we refer to as the "unreachable points phenomenon". This phenomenon can significantly affect the search accuracy of the graph in certain situations. To address these issues, we present efficient measures to overcome the shortcomings of HNSW, specifically its poor performance over long periods of delete and update operations and the problems caused by the unreachable points phenomenon. Our proposed MN-RU algorithm improves update efficiency and suppresses the growth rate of unreachable points, ensuring better overall performance and maintaining the integrity of the graph. Our results demonstrate that our methods outperform existing approaches. Because our methods are based on HNSW, they can be easily integrated with existing indices widely used in industry, making them practical for real-world applications. Code is available at https://github.com/xwt1/MN-RU.git
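
To make the "unreachable points phenomenon" concrete, the following is a minimal sketch on a toy single-layer directed graph. It is not the MN-RU algorithm and not the update procedure of any HNSW library; the function names `unreachable_points` and `naive_delete`, the adjacency-list representation, and the toy graph are all assumptions made for illustration. A point counts as unreachable when no chain of directed edges leads to it from the entry point, so a search starting there can never return it.

```python
from collections import deque


def unreachable_points(graph, entry_point):
    """Return the nodes that a breadth-first walk from entry_point never visits.

    graph is a dict mapping each node id to a list of out-neighbors
    (a hypothetical stand-in for one layer of a graph index).
    """
    seen = {entry_point}
    queue = deque([entry_point])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return set(graph) - seen


def naive_delete(graph, victim):
    """Remove victim and every edge pointing to it, with no neighbor repair.

    This is a deliberately simplistic deletion, used only to show how
    an update can strand other points.
    """
    return {
        node: [n for n in neighbors if n != victim]
        for node, neighbors in graph.items()
        if node != victim
    }


# Toy graph: node 3 is reachable only through node 2.
graph = {0: [1, 2], 1: [2], 2: [0, 3], 3: [2]}

print(unreachable_points(graph, 0))                   # set()
print(unreachable_points(naive_delete(graph, 2), 0))  # {3}
```

In this toy case, removing node 2 without reconnecting its neighbors leaves node 3 with no incoming edges, so it still exists in the index but can no longer be found from the entry point.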

