SAH: Shifting-aware Asymmetric Hashing for Reverse $k$-Maximum Inner Product Search (2211.12751v1)
Abstract: This paper investigates a new and challenging problem called Reverse $k$-Maximum Inner Product Search (R$k$MIPS). Given a query (item) vector, a set of item vectors, and a set of user vectors, R$k$MIPS asks for the set of user vectors for which the inner product with the query vector is among the $k$ largest inner products over the query and item vectors. We propose the first subquadratic-time algorithm, Shifting-aware Asymmetric Hashing (SAH), to tackle the R$k$MIPS problem. To speed up Maximum Inner Product Search (MIPS) on item vectors, we design a shifting-invariant asymmetric transformation and develop a novel sublinear-time Shifting-Aware Asymmetric Locality Sensitive Hashing (SA-ALSH) scheme. Furthermore, we devise a new blocking strategy based on the Cone-Tree to effectively prune user vectors in batches. We prove that SAH achieves a theoretical guarantee for solving the RMIPS problem. Experimental results on five real-world datasets show that SAH runs 4$\sim$8$\times$ faster than the state-of-the-art methods for R$k$MIPS while achieving F1-scores of over 90\%. The code is available at \url{https://github.com/HuangQiang/SAH}.
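To make the problem statement concrete, the following is a minimal brute-force sketch of R$k$MIPS as defined in the abstract. It is not SAH itself and uses no hashing or Cone-Tree pruning; it is the quadratic-time baseline that a subquadratic method would aim to beat. The function name `rkmips_brute_force`, the helper `inner`, the toy vectors, and the tie-breaking rule (ties count in favor of the query) are all assumptions made for illustration.

```python
from typing import List, Sequence

Vector = Sequence[float]


def inner(a: Vector, b: Vector) -> float:
    """Plain inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def rkmips_brute_force(
    query: Vector, items: List[Vector], users: List[Vector], k: int
) -> List[int]:
    """Return indices of users for which `query` ranks in the top k.

    A user u is reported when fewer than k item vectors have a strictly
    larger inner product with u than the query does. Treating ties in
    favor of the query is an assumption of this sketch.
    """
    result = []
    for idx, u in enumerate(users):
        score_q = inner(u, query)
        # Count items that beat the query for this user.
        better = sum(1 for p in items if inner(u, p) > score_q)
        if better < k:
            result.append(idx)
    return result


# Hypothetical toy data: three items, three users, one query item.
items = [(1.0, 0.0), (0.0, 1.0), (2.0, 2.0)]
users = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
query = (3.0, 0.0)

top1 = rkmips_brute_force(query, items, users, k=1)
top2 = rkmips_brute_force(query, items, users, k=2)
```

With $n$ items, $m$ users, and dimension $d$, this baseline costs $O(nmd)$ time per query, which is the cost the abstract's "subquadratic-time" claim is measured against.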