Spectral Approaches to Nearest Neighbor Search (1408.0751v1)
Abstract: We study spectral algorithms for the high-dimensional Nearest Neighbor Search problem (NNS). In particular, we consider a semi-random setting where a dataset $P$ in $\mathbb{R}d$ is chosen arbitrarily from an unknown subspace of low dimension $k\ll d$, and then perturbed by fully $d$-dimensional Gaussian noise. We design spectral NNS algorithms whose query time depends polynomially on $d$ and $\log n$ (where $n=|P|$) for large ranges of $k$, $d$ and $n$. Our algorithms use a repeated computation of the top PCA vector/subspace, and are effective even when the random-noise magnitude is {\em much larger} than the interpoint distances in $P$. Our motivation is that in practice, a number of spectral NNS algorithms outperform the random-projection methods that seem otherwise theoretically optimal on worst case datasets. In this paper we aim to provide theoretical justification for this disparity.