On the adversarial robustness of Locality-Sensitive Hashing in Hamming space (2402.09707v3)
Abstract: Locality-sensitive hashing [Indyk, Motwani '98] is a classical data structure for approximate nearest neighbor search. After preprocessing the input dataset in close to linear time, it finds an approximate nearest neighbor of any fixed query in time sublinear in the dataset size. The resulting data structure is randomized and succeeds with high probability for every fixed query. In many modern applications of nearest neighbor search, the queries are chosen adaptively. In this paper, we study the robustness of locality-sensitive hashing to adaptive queries in Hamming space. We present a simple adversary that can, under mild assumptions on the initial point set, provably find a query on which the approximate near neighbor search data structure fails. Crucially, our adaptive algorithm finds the hard query exponentially faster than random sampling.
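For readers unfamiliar with the data structure, the following is a minimal sketch of the textbook bit-sampling LSH scheme for Hamming space, not the paper's construction or its adversary. The class name `BitSamplingLSH` and the parameters `num_tables`, `bits_per_table`, and `seed` are hypothetical names chosen for illustration. The sketch shows why a query can fail: a near point is reported only if it agrees with the query on every sampled coordinate of at least one table.

```python
import random
from collections import defaultdict


def hamming(a, b):
    """Number of coordinates where two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


class BitSamplingLSH:
    """Toy bit-sampling LSH index over {0,1}^d (illustrative assumption, not the paper's code)."""

    def __init__(self, points, dim, num_tables, bits_per_table, seed=0):
        rng = random.Random(seed)
        self.points = points
        # Each table keys a point by its bits on a random multiset of coordinates.
        self.coords = [
            [rng.randrange(dim) for _ in range(bits_per_table)]
            for _ in range(num_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(num_tables)]
        for idx, p in enumerate(points):
            for table, cs in zip(self.tables, self.coords):
                table[tuple(p[c] for c in cs)].append(idx)

    def query(self, q, radius):
        """Return the index of a colliding point within `radius` of q, or None."""
        for table, cs in zip(self.tables, self.coords):
            for idx in table.get(tuple(q[c] for c in cs), []):
                if hamming(self.points[idx], q) <= radius:
                    return idx
        return None


# A stored point always collides with itself, so this lookup succeeds.
data = [(0, 0, 0, 0, 0, 0, 0, 0), (1, 1, 1, 1, 0, 0, 0, 0)]
index = BitSamplingLSH(data, dim=8, num_tables=2, bits_per_table=3)
hit = index.query(data[1], radius=0)

# A query that differs from the stored point on every coordinate never
# collides with it, so the lookup returns None even with a generous radius.
miss = BitSamplingLSH([data[0]], dim=8, num_tables=2, bits_per_table=3).query(
    (1, 1, 1, 1, 1, 1, 1, 1), radius=8
)
```

The second lookup is a deliberately extreme false negative. The abstract concerns the harder setting in which the adversary must locate such a failing query adaptively, without knowing which coordinates were sampled.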
- Approximate kNN classification for biomedical data. In 2020 IEEE International Conference on Big Data (Big Data), pages 3602–3607, 2020.
- Distance-sensitive hashing. In Jan Van den Bussche and Marcelo Arenas, editors, Proceedings of the 37th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, Houston, TX, USA, June 10-15, 2018, pages 89–104. ACM, 2018.
- A framework for adversarial streaming via differential privacy and difference estimators. In Yael Tauman Kalai, editor, 14th Innovations in Theoretical Computer Science Conference, ITCS 2023, January 10-13, 2023, MIT, Cambridge, Massachusetts, USA, volume 251 of LIPIcs, pages 8:1–8:19. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2023.
- Thomas Dybdahl Ahle. Optimal Las Vegas locality sensitive data structures. In Chris Umans, editor, 58th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2017, Berkeley, CA, USA, October 15-17, 2017, pages 938–949. IEEE Computer Society, 2017.
- Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. In 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2006), 21-24 October 2006, Berkeley, California, USA, Proceedings, pages 459–468. IEEE Computer Society, 2006.
- Practical and optimal LSH for angular distance. In Corinna Cortes, Neil D. Lawrence, Daniel D. Lee, Masashi Sugiyama, and Roman Garnett, editors, Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, December 7-12, 2015, Montreal, Quebec, Canada, pages 1225–1233, 2015.
- Beyond locality-sensitive hashing. In Chandra Chekuri, editor, Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2014, Portland, Oregon, USA, January 5-7, 2014, pages 1018–1028. SIAM, 2014.
- Optimal hashing-based time-space trade-offs for approximate near neighbors. In Philip N. Klein, editor, Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2017, Barcelona, Spain, Hotel Porta Fira, January 16-19, pages 47–66. SIAM, 2017.
- Anonymous Microsoft Web Data. UCI Machine Learning Repository, 1998. DOI: https://doi.org/10.24432/C5VS3Q.
- A framework for adversarially robust streaming algorithms. J. ACM, 69(2):17:1–17:33, 2022.
- Dynamic algorithms against an adaptive adversary: generic constructions and lower bounds. In Stefano Leonardi and Anupam Gupta, editors, STOC ’22: 54th Annual ACM SIGACT Symposium on Theory of Computing, Rome, Italy, June 20 - 24, 2022, pages 1671–1684. ACM, 2022.
- Moses Charikar. Similarity estimation techniques from rounding algorithms. In John H. Reif, editor, Proceedings on 34th Annual ACM Symposium on Theory of Computing, May 19-21, 2002, Montréal, Québec, Canada, pages 380–388. ACM, 2002.
- Tobias Christiani. A framework for similarity search with space-time tradeoffs using locality-sensitive filtering. In Philip N. Klein, editor, Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2017, Barcelona, Spain, Hotel Porta Fira, January 16-19, pages 31–46. SIAM, 2017.
- On the robustness of CountSketch to adaptive inputs. In Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesvári, Gang Niu, and Sivan Sabato, editors, International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA, volume 162 of Proceedings of Machine Learning Research, pages 4112–4140. PMLR, 2022.
- On adaptive distance estimation. In Hugo Larochelle, Marc’Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin, editors, Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, 2020.
- Tricking the hashing trick: A tight lower bound on the robustness of CountSketch to adaptive inputs. In Brian Williams, Yiling Chen, and Jennifer Neville, editors, Thirty-Seventh AAAI Conference on Artificial Intelligence, AAAI 2023, Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence, IAAI 2023, Thirteenth Symposium on Educational Advances in Artificial Intelligence, EAAI 2023, Washington, DC, USA, February 7-14, 2023, pages 7235–7243. AAAI Press, 2023.
- Locality-sensitive hashing scheme based on p-stable distributions. In Proceedings of the twentieth annual symposium on Computational geometry, pages 253–262, 2004.
- Similarity search in high dimensions via hashing. In VLDB, volume 99, pages 518–529, 1999.
- Adversarially robust streaming algorithms via differential privacy. J. ACM, 69(6):42:1–42:14, 2022.
- How robust are linear sketches to adaptive inputs? In Dan Boneh, Tim Roughgarden, and Joan Feigenbaum, editors, Symposium on Theory of Computing Conference, STOC’13, Palo Alto, CA, USA, June 1-4, 2013, pages 121–130. ACM, 2013.
- Approximate nearest neighbors: Towards removing the curse of dimensionality. In Jeffrey Scott Vitter, editor, Proceedings of the Thirtieth Annual ACM Symposium on the Theory of Computing, Dallas, Texas, USA, May 23-26, 1998, pages 604–613. ACM, 1998.
- A survey on locality sensitive hashing algorithms and their applications. arXiv preprint arXiv:2102.08942, 2021.
- Michael Kapralov. Smooth tradeoffs between insert and query complexity in nearest neighbor search. In Tova Milo and Diego Calvanese, editors, Proceedings of the 34th ACM Symposium on Principles of Database Systems, PODS 2015, Melbourne, Victoria, Australia, May 31 - June 4, 2015, pages 329–342. ACM, 2015.
- Fast agglomerative hierarchical clustering algorithm using locality-sensitive hashing. Knowledge and Information Systems, 12:25–53, 2007.
- Efficient search for approximate nearest neighbor in high dimensional spaces. In Proceedings of the thirtieth annual ACM symposium on Theory of computing, pages 614–623, 1998.
- Yann LeCun. The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/, 1998.
- On deterministic sketching and streaming for sparse recovery and norm estimation. In Anupam Gupta, Klaus Jansen, José D. P. Rolim, and Rocco A. Servedio, editors, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques - 15th International Workshop, APPROX 2012, and 16th International Workshop, RANDOM 2012, Cambridge, MA, USA, August 15-17, 2012. Proceedings, volume 7408 of Lecture Notes in Computer Science, pages 627–638. Springer, 2012.
- Rasmus Pagh. Locality-sensitive hashing without false negatives. In Robert Krauthgamer, editor, Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2016, Arlington, VA, USA, January 10-12, 2016, pages 1–9. SIAM, 2016.
- Rasmus Pagh. CoveringLSH: Locality-sensitive hashing without false negatives. ACM Trans. Algorithms, 14(3), June 2018.
- Rina Panigrahy. Entropy based nearest neighbor search in high dimensions. In Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2006, Miami, Florida, USA, January 22-26, 2006, pages 1186–1195. ACM Press, 2006.
- Query by humming of MIDI and audio using locality sensitive hashing. In 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, pages 2249–2252, 2008.
- Mushroom. UCI Machine Learning Repository, 1987. DOI: https://doi.org/10.24432/C5959T.
- Alexander Wei. Optimal Las Vegas approximate near neighbors in $\ell_p$. ACM Trans. Algorithms, 18(1):7:1–7:27, 2022.
- A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In Ashish Gupta, Oded Shmueli, and Jennifer Widom, editors, VLDB’98, Proceedings of 24th International Conference on Very Large Data Bases, August 24-27, 1998, New York City, New York, USA, pages 194–205. Morgan Kaufmann, 1998.
- Barry L Wulff. The Audubon Society field guide to North American mushrooms, by Gary Lincoff, 1982.
- Tight bounds for adversarially robust streams and sliding windows via difference estimators. In 62nd IEEE Annual Symposium on Foundations of Computer Science, FOCS 2021, Denver, CO, USA, February 7-10, 2022, pages 1183–1196. IEEE, 2021.
- Video anomaly detection based on locality sensitive hashing filters. Pattern Recognition, 59:302–311, 2016.