Practice with Graph-based ANN Algorithms on Sparse Data: Chi-square Two-tower model, HNSW, Sign Cauchy Projections (2306.07607v1)

Published 13 Jun 2023 in cs.IR and stat.ML

Abstract: Sparse data are common. Traditional "handcrafted" features are often sparse. Embedding vectors from trained models can also be very sparse, for example, embeddings trained via the "ReLU" activation function. In this paper, we report our exploration of efficient search in sparse data with graph-based ANN algorithms (e.g., HNSW, or SONG, the GPU version of HNSW), which are popular in industrial practice, e.g., search and ads (advertising). We experiment with a proprietary ads-targeting application as well as benchmark public datasets. For ads targeting, we train embeddings with the standard "cosine two-tower" model, and we also develop the "chi-square two-tower" model. Both models produce (highly) sparse embeddings when they are integrated with the "ReLU" activation function. In EBR (embedding-based retrieval) applications, after the embeddings are trained, the next crucial task is the approximate near neighbor (ANN) search for serving. While there are many ANN algorithms to choose from, in this study we focus on graph-based ANN algorithms (e.g., HNSW-type). Sparse embeddings should help improve the efficiency of EBR. One benefit is the reduced memory cost for the embeddings. The other obvious benefit is the reduced computational time for evaluating similarities, because, for graph-based ANN algorithms such as HNSW, computing similarities is often the dominating cost. In addition to leveraging data sparsity for storage and computation, we also integrate "sign Cauchy random projections" (SignCRP) to hash vectors to bits, to further reduce the memory cost and speed up the ANN search. In NIPS'13, SignCRP was proposed to hash the chi-square similarity, which is a well-adopted nonlinear kernel in NLP and computer vision. Therefore, the chi-square two-tower model, SignCRP, and HNSW are now tightly integrated.
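To make the two hashing ingredients concrete, below is a minimal sketch (not the paper's code; function names and toy data are illustrative assumptions). Following the NIPS'13 reference, the chi-square similarity of two non-negative, L1-normalized vectors x and y is rho = sum_i 2*x_i*y_i / (x_i + y_i), and the probability that a single sign Cauchy projection bit collides is approximately 1 - arccos(rho)/pi.

```python
import numpy as np

def chi2_similarity(x, y):
    # rho = sum_i 2*x_i*y_i / (x_i + y_i) for non-negative,
    # L1-normalized vectors; coordinates with x_i + y_i == 0 contribute 0.
    num = 2.0 * x * y
    den = x + y
    mask = den > 0
    return float(np.sum(num[mask] / den[mask]))

def sign_cauchy_bits(X, k, seed=0):
    # One bit per projection: b_j(x) = 1[ sum_i c_ij * x_i >= 0 ],
    # with c_ij drawn i.i.d. from the standard Cauchy distribution.
    rng = np.random.default_rng(seed)
    C = rng.standard_cauchy(size=(X.shape[1], k))
    return (X @ C) >= 0  # boolean bit matrix, shape (n, k)

# Toy sparse non-negative vectors, mimicking ReLU-activated embeddings.
rng = np.random.default_rng(1)
x = np.maximum(rng.standard_normal(1024), 0.0)
y = np.maximum(rng.standard_normal(1024), 0.0)
x, y = x / x.sum(), y / y.sum()  # L1-normalize

bits = sign_cauchy_bits(np.stack([x, y]), k=8192)
rho = chi2_similarity(x, y)
collision = float(np.mean(bits[0] == bits[1]))
print(f"chi-square similarity rho   = {rho:.4f}")
print(f"empirical bit collision     = {collision:.4f}")
# The NIPS'13 approximation (not exact) for the collision probability:
print(f"approx 1 - arccos(rho)/pi   = {1 - np.arccos(rho) / np.pi:.4f}")
```

For the serving side, here is a hedged sketch of graph-based ANN retrieval with an off-the-shelf HNSW library. The paper's HNSW implementation (with chi-square similarity and sparsity-aware distance evaluation) is proprietary; the open-source hnswlib package only exposes l2, inner-product, and cosine spaces, so this example indexes cosine two-tower embeddings, and all dimensions, parameters, and data are placeholders.

```python
import numpy as np
import hnswlib  # pip install hnswlib

dim, num_items = 128, 10_000
# Sparse ReLU-style item embeddings (placeholder data).
items = np.maximum(np.random.randn(num_items, dim), 0).astype(np.float32)

index = hnswlib.Index(space="cosine", dim=dim)  # chi-square is not built in
index.init_index(max_elements=num_items, ef_construction=200, M=16)
index.add_items(items, np.arange(num_items))

index.set_ef(64)  # query-time search breadth (ef >= k)
query = np.maximum(np.random.randn(1, dim), 0).astype(np.float32)
labels, distances = index.knn_query(query, k=10)  # top-10 approximate neighbors
```

During graph traversal, each hop evaluates similarities against a candidate's neighbors; replacing full floating-point similarity computations with comparisons over stored SignCRP bits is where the memory and speed savings described in the abstract come from.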

References (47)
  1. What is an object? In Proceedings of the Twenty-Third IEEE Conference on Computer Vision and Pattern Recognition (CVPR). San Francisco, CA, 73–80.
  2. Franz Aurenhammer. 1991. Voronoi diagrams—a survey of a fundamental geometric data structure. ACM Computing Surveys (CSUR) 23, 3 (1991), 345–405.
  3. Language Models are Few-Shot Learners. In Advances in Neural Information Processing Systems (NeurIPS). virtual, 1877–1901.
  4. Pre-training Tasks for Embedding-based Large-scale Retrieval. In Proceedings of the 8th International Conference on Learning Representations (ICLR). Addis Ababa, Ethiopia.
  5. Support vector machines for histogram-based image classification. IEEE Trans. Neural Networks 10, 5 (1999), 1055–1064.
  6. Deep Neural Networks for YouTube Recommendations. In Proceedings of the 10th ACM Conference on Recommender Systems (RecSys). Boston, MA, 191–198.
  7. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT). Minneapolis, MN, 4171–4186.
  8. MOBIUS: Towards the Next Generation of Query-Ad Matching in Baidu’s Sponsored Search. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD). Anchorage, AK, 2509–2517.
  9. Fast Approximate Nearest Neighbor Search With The Navigating Spreading-out Graph. Proc. VLDB Endow. 12, 5 (2019), 461–474.
  10. Kunihiko Fukushima. 1975. Cognitron: A self-organizing multilayered neural network. Biological Cybernetics 20, 3-4 (1975), 121–136.
  11. Deep Retrieval: Learning A Retrievable Structure for Large-Scale Recommendations. arXiv preprint arXiv:2007.07203 (2020).
  12. DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, ACL/IJCNLP. Virtual Event, 879–895.
  13. Deep Sparse Rectifier Neural Networks. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics (AISTATS). Fort Lauderdale, FL, 315–323.
  14. A formal basis for the heuristic determination of minimum cost paths. IEEE Transactions on Systems Science and Cybernetics 4, 2 (1968), 100–107.
  15. Matthias Hein and Olivier Bousquet. 2005. Hilbertian Metrics and Positive Definite Kernels on Probability Measures. In Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics (AISTATS). Bridgetown, Barbados.
  16. Learning deep structured semantic models for web search using clickthrough data. In Proceedings of the 22nd ACM International Conference on Information and Knowledge Management (CIKM). San Francisco, CA, 2333–2338.
  17. Knowledge Graph Embedding Based Question Answering. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining (WSDM). Melbourne, Australia, 105–113.
  18. Towards optimal bag-of-features for object categorization and semantic video retrieval. In Proceedings of the 6th ACM International Conference on Image and Video Retrieval (CIVR). Amsterdam, The Netherlands, 494–501.
  19. Deep Fragment Embeddings for Bidirectional Image Sentence Mapping. In Advances in Neural Information Processing Systems (NIPS). Montreal, Canada, 1889–1897.
  20. General Multi-Label Image Classification With Transformers. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). virtual, 16478–16488.
  21. Sign Cauchy Projections and Chi-Square Kernel. In Advances in Neural Information Processing Systems (NIPS). Lake Tahoe, NV, 2571–2579.
  22. Anchor & Transform: Learning Sparse Embeddings for Large Vocabularies. In Proceedings of the 9th International Conference on Learning Representations, (ICLR). Virtual Event.
  23. Approximate nearest neighbor algorithm based on navigable small world graphs. Information Systems 45 (2014), 61–68.
  24. Yury A. Malkov and Dmitry A. Yashunin. 2020. Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs. IEEE Trans. Pattern Anal. Mach. Intell. 42, 4 (2020), 824–836.
  25. Stanislav Morozov and Artem Babenko. 2018. Non-metric Similarity Graphs for Maximum Inner Product Search. In Advances in Neural Information Processing Systems (NeurIPS). Montreal, Canada, 4726–4735.
  26. Vinod Nair and Geoffrey E. Hinton. 2010. Rectified Linear Units Improve Restricted Boltzmann Machines. In Proceedings of the 27th International Conference on Machine Learning (ICML). Haifa, Israel, 807–814.
  27. Text and code embeddings by contrastive pre-training. arXiv preprint arXiv:2201.10005 (2022).
  28. Glove: Global Vectors for Word Representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Doha, Qatar, 1532–1543.
  29. Knowledge-aware Recommendations Based on Neuro-Symbolic Graph Embeddings and First-Order Logical Rules. In Proceedings of the Sixteenth ACM Conference on Recommender Systems (RecSys). Seattle, WA, 616–621.
  30. Norm Adjusted Proximity Graph for Fast Inner Product Retrieval. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD). Virtual Event, Singapore, 1552–1560.
  31. Fast Neural Ranking on Bipartite Graph Indices. Proc. VLDB Endow. 15, 4 (2021), 794–803.
  32. Fast Item Ranking under Neural Network based Measures. In Proceedings of the Thirteenth ACM International Conference on Web Search and Data Mining (WSDM). Houston, TX, 591–599.
  33. Robert Tibshirani. 1996. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological) 58, 1 (1996), 267–288.
  34. Learning and evaluating sparse interpretable sentence embeddings. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP. Brussels, Belgium, 200–210.
  35. Andrea Vedaldi and Andrew Zisserman. 2012. Efficient Additive Kernels via Explicit Feature Maps. IEEE Trans. Pattern Anal. Mach. Intell. 34, 3 (2012), 480–492.
  36. Generalized RBF feature maps for Efficient Detection. In Proceedings of the British Machine Vision Conference (BMVC). Aberystwyth, UK, 1–11.
  37. Building text features for object image classification. In Proceedings of the 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR). Miami, FL, 1367–1374.
  38. Proximity Graph Maintenance for Fast Online Nearest Neighbor Search. arXiv preprint arXiv:2206.10839 (2022).
  39. Hierarchical Bilinear Pooling for Fine-Grained Visual Recognition. In Proceedings of the 15th European Conference on Computer Vision (ECCV), Part XVI. Munich, Germany, 595–610.
  40. EGM: Enhanced Graph-based Model for Large-scale Video Advertisement Search. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD). Washington, DC, 4443–4451.
  41. SONG: Approximate Nearest Neighbor Search on GPU. In Proceedings of the 36th IEEE International Conference on Data Engineering (ICDE). Dallas, TX, 1033–1044.
  42. Constrained Approximate Similarity Search on Proximity Graph. arXiv preprint arXiv:2210.14958 (2022).
  43. GUITAR: Gradient Pruning toward Fast Neural Ranking. arXiv preprint arXiv (2022).
  44. Möbius Transformation for Fast Inner Product Search on Graph. In Advances in Neural Information Processing Systems (NeurIPS). Vancouver, Canada, 8216–8227.
  45. Joint Optimization of Tree-based Index and Deep Model for Recommender Systems. In Advances in Neural Information Processing Systems (NeurIPS). Vancouver, Canada, 3973–3982.
  46. Learning Tree-based Deep Model for Recommender Systems. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD). London, UK, 1079–1088.
  47. Learning Optimal Tree Models under Beam Search. In Proceedings of the 37th International Conference on Machine Learning (ICML). Virtual Event, 11650–11659.
Authors (6)
  1. Ping Li (421 papers)
  2. Weijie Zhao (44 papers)
  3. Chao Wang (555 papers)
  4. Qi Xia (9 papers)
  5. Alice Wu (7 papers)
  6. Lijun Peng (3 papers)
Citations (1)
