Rigorous log-linear time bound for NNDescent

Establish that the NNDescent algorithm of Dong et al. (2011), used for approximate k-nearest neighbor graph construction on N elements with an arbitrary symmetric distance function (not necessarily a metric), converges in at most 2⌈log_{2k} N⌉ iterations. Such a bound would imply an overall time complexity of O(k^2 N log N) without relying on range-query implementations or homogeneous Poisson-process assumptions.
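The step from the iteration bound to the time bound can be spelled out as a short calculation. It assumes that one full NNDescent iteration costs O(k^2 N) distance evaluations (each of the N nodes is compared against at most k^2 second neighbors); that per-iteration cost is an assumption stated here for the derivation, not part of the conjecture itself.

```latex
\underbrace{2\,\lceil \log_{2k} N \rceil}_{\text{iterations}}
\;\times\;
\underbrace{O(k^{2} N)}_{\text{cost per iteration}}
\;=\;
O\!\left(\frac{k^{2} N \log N}{\log 2k}\right)
\;\subseteq\;
O(k^{2} N \log N),
\qquad \text{since } \log 2k \ge \log 2 > 0 \text{ for } k \ge 1 .
```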

Background

The paper’s greedy coordinate descent (GCD) reconstruction algorithm depends critically on repeatedly solving a k-nearest neighbor (KNN) problem via NNDescent to identify the best O(N) edge-update candidates in subquadratic time. The claimed subquadratic overall complexity is predicated on NNDescent converging in a logarithmic number of iterations, which leads to log-linear O(k^2 N log N) time for the KNN step.

The authors note that, while NNDescent performs well empirically and is widely used (e.g., in UMAP), a detailed theoretical analysis of its complexity is lacking. Existing results rigorously prove the conjectured bound only for a modified setting (range queries) under homogeneous Poisson-process data, conditions that do not apply to their general reconstruction setting. A general proof would solidify the theoretical foundations of the paper’s subquadratic complexity claims.
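For orientation, the kind of iteration whose count the conjecture bounds can be sketched as follows. This is a simplified illustration, not the implementation of Dong et al. or of the paper: it omits reverse neighbors and sampling, the function name `nn_descent` and its parameters are hypothetical, and `dist` is assumed to be any symmetric distance function supplied by the caller.

```python
import random


def nn_descent(points, k, dist, seed=0, max_iter=100):
    """Simplified NNDescent-style sketch; returns (neighbors, passes)."""
    rng = random.Random(seed)
    n = len(points)
    # Start from k distinct random neighbors per node, stored as {neighbor: distance}.
    nbrs = []
    for i in range(n):
        others = rng.sample([j for j in range(n) if j != i], k)
        nbrs.append({j: dist(points[i], points[j]) for j in others})

    for it in range(1, max_iter + 1):
        updates = 0
        for i in range(n):
            # Candidates: neighbors of current neighbors, excluding i and its neighbors.
            cands = {w for u in list(nbrs[i]) for w in nbrs[u]} - {i} - set(nbrs[i])
            for w in cands:
                d = dist(points[i], points[w])
                worst = max(nbrs[i], key=nbrs[i].get)
                if d < nbrs[i][worst]:
                    # Replace the current farthest neighbor with the closer candidate.
                    del nbrs[i][worst]
                    nbrs[i][w] = d
                    updates += 1
        if updates == 0:
            # Converged: this pass (counted in `it`) made no improvement.
            return nbrs, it
    return nbrs, max_iter
```

Calling `nn_descent(points, k, math.dist)` on a list of coordinate tuples returns the neighbor dictionaries and the number of passes performed, including the final pass that found no update. Comparing that count with 2⌈log_{2k} N⌉ on test data gives only empirical evidence; it does not settle the conjecture.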

References

This conjectured bound on the convergence results in an overall O(k^2 N log N) complexity. However, this conjecture can only be rigorously proven on a version of the algorithm where the second neighbor search is replaced by a range query, and the data is generated by a homogeneous Poisson process.

Scalable network reconstruction in subquadratic time (2401.01404 - Peixoto, 2 Jan 2024) in Section 3 (Subquadratic network reconstruction), paragraphs discussing NNDescent complexity near Algorithm 4