Navigable Graphs for High-Dimensional Nearest Neighbor Search: Constructions and Limits (2405.18680v3)
Abstract: There has been significant recent interest in graph-based nearest neighbor search methods, many of which are centered on the construction of navigable graphs over high-dimensional point sets. A graph is navigable if we can successfully move from any starting node to any target node using a greedy routing strategy where we always move to the neighbor that is closest to the destination according to a given distance function. The complete graph is navigable for any point set, but the important question for applications is whether sparser graphs can be constructed. While this question is fairly well understood in low dimensions, we establish some of the first upper and lower bounds for high-dimensional point sets. First, we give a simple and efficient way to construct a navigable graph with average degree $O(\sqrt{n \log n})$ for any set of $n$ points, in any dimension, for any distance function. We complement this result with a nearly matching lower bound: even under the Euclidean metric in $O(\log n)$ dimensions, a random point set has no navigable graph with average degree $O(n^{\alpha})$ for any $\alpha < 1/2$. Our lower bound relies on sharp anti-concentration bounds for binomial random variables, which we use to show that the near-neighborhoods of a set of random points do not overlap significantly, forcing any navigable graph to have many edges.
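The navigability definition in the abstract can be illustrated with a minimal Python sketch. It is not the paper's construction: the function names `greedy_route` and `is_navigable`, the three example points, and the choice to stop when no neighbor is strictly closer to the target are all assumptions made here for illustration.

```python
import math
from itertools import permutations


def greedy_route(points, adj, start, target, dist=math.dist):
    """Greedy routing: repeatedly move to the neighbor closest to the target.

    Assumption for this sketch: routing stops when no neighbor is strictly
    closer to the target than the current node.
    """
    path = [start]
    current = start
    while current != target:
        neighbors = adj[current]
        if not neighbors:
            break
        # Neighbor of the current node that is closest to the target.
        best = min(neighbors, key=lambda v: dist(points[v], points[target]))
        if dist(points[best], points[target]) >= dist(points[current], points[target]):
            break  # stuck: no neighbor makes progress
        current = best
        path.append(current)
    return path


def is_navigable(points, adj, dist=math.dist):
    """True if greedy routing reaches the target for every ordered pair of nodes."""
    n = len(points)
    return all(
        greedy_route(points, adj, s, t, dist)[-1] == t
        for s, t in permutations(range(n), 2)
    )


# Hypothetical toy data: two nearby points and one far away.
points = [(0.0, 0.0), (1.0, 0.0), (100.0, 0.0)]

# The complete graph: every node is adjacent to every other node.
complete = {i: [j for j in range(len(points)) if j != i] for i in range(len(points))}

# A sparser star graph whose hub is the far-away point.
star = {0: [2], 1: [2], 2: [0, 1]}

print(is_navigable(points, complete))  # complete graph
print(is_navigable(points, star))      # star through the distant hub
```

On this toy input the complete graph passes the check, while the star fails: routing from node 0 to node 1 would have to pass through the distant hub, which is farther from the target than the start. The check runs greedy routing for all ordered pairs, so it is only practical for small point sets.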