
Lower Bounds on Performance of Metric Tree Indexing Schemes for Exact Similarity Search in High Dimensions (0812.0146v4)

Published 30 Nov 2008 in cs.DS

Abstract: Within a mathematically rigorous model, we analyse the curse of dimensionality for deterministic exact similarity search in the context of popular indexing schemes: metric trees. The datasets $X$ are sampled randomly from a domain $\Omega$, equipped with a distance, $\rho$, and an underlying probability distribution, $\mu$. While performing an asymptotic analysis, we send the intrinsic dimension $d$ of $\Omega$ to infinity, and assume that the size of a dataset, $n$, grows superpolynomially yet subexponentially in $d$. Exact similarity search refers to finding the nearest neighbour in the dataset $X$ to a query point $\omega\in\Omega$, where the query points are subject to the same probability distribution $\mu$ as datapoints. Let $\mathscr F$ denote a class of all 1-Lipschitz functions on $\Omega$ that can be used as decision functions in constructing a hierarchical metric tree indexing scheme. Suppose the VC dimension of the class of all sets $\{\omega\colon f(\omega)\geq a\}$, $a\in\mathbb{R}$, is $o(n^{1/4}/\log^2 n)$. (In view of a 1995 result of Goldberg and Jerrum, even a stronger complexity assumption $d^{O(1)}$ is reasonable.) We deduce the $\Omega(n^{1/4})$ lower bound on the expected average case performance of hierarchical metric-tree based indexing schemes for exact similarity search in $(\Omega,X)$. In particular, this bound is superpolynomial in $d$.
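To make the setting in the abstract concrete, here is a minimal sketch of one kind of metric tree: a vantage-point-style tree, where each inner node uses the decision function $f(\omega)=\rho(\omega,p)$ for a pivot $p$. This function is 1-Lipschitz by the triangle inequality, and each node splits the data by a set of the form $\{\omega\colon f(\omega)\geq a\}$. This is an illustration only, not the construction analysed in the paper. The names `euclid`, `build` and `nearest`, the Euclidean distance, the median threshold and the leaf size are all assumptions made for the example.

```python
import math
import random


def euclid(a, b):
    """Euclidean distance; stands in for the abstract's metric rho (assumption)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build(points, leaf_size=4, rng=None):
    """Build a vantage-point-style tree (hypothetical helper).

    Each inner node stores a pivot p and a threshold a, and splits the data
    by the set {w : f(w) >= a} with f(w) = rho(w, p), a 1-Lipschitz function.
    """
    rng = rng or random.Random(0)
    if len(points) <= leaf_size:
        return ("leaf", points)
    pivot = points[rng.randrange(len(points))]
    dists = sorted(euclid(pivot, p) for p in points)
    threshold = dists[len(dists) // 2]  # median distance to the pivot
    inner = [p for p in points if euclid(pivot, p) < threshold]
    outer = [p for p in points if euclid(pivot, p) >= threshold]
    if not inner or not outer:  # degenerate split, e.g. many duplicates
        return ("leaf", points)
    return ("node", pivot, threshold,
            build(inner, leaf_size, rng), build(outer, leaf_size, rng))


def nearest(tree, query):
    """Exact nearest-neighbour search.

    Returns (point, distance, evaluations), where evaluations counts the
    distance computations performed during the search.
    """
    best = [None, math.inf]
    evaluations = [0]

    def visit(node):
        if node[0] == "leaf":
            for p in node[1]:
                evaluations[0] += 1
                d = euclid(query, p)
                if d < best[1]:
                    best[0], best[1] = p, d
            return
        _, pivot, threshold, inner, outer = node
        evaluations[0] += 1
        margin = euclid(query, pivot) - threshold
        near, far = (outer, inner) if margin >= 0 else (inner, outer)
        visit(near)
        # 1-Lipschitz pruning: every point on the far side is at distance
        # at least |margin| from the query, so skip it if that cannot
        # improve on the current best distance.
        if abs(margin) < best[1]:
            visit(far)

    visit(tree)
    return best[0], best[1], evaluations[0]


if __name__ == "__main__":
    rng = random.Random(1)
    dim, n = 16, 2000
    data = [tuple(rng.random() for _ in range(dim)) for _ in range(n)]
    query = tuple(rng.random() for _ in range(dim))
    point, dist, evaluations = nearest(build(data), query)
    print(f"distance {dist:.4f} after {evaluations} of {n} possible evaluations")
```

Increasing `dim` while keeping `n` fixed is a simple way to see how the pruning test succeeds less often as dimension grows. That is only an informal experiment and says nothing about the asymptotic bound stated in the abstract.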

Citations (15)
