Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
153 tokens/sec
GPT-4o
7 tokens/sec
Gemini 2.5 Pro Pro
45 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
38 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Fast Distance Oracles for Any Symmetric Norm (2205.14816v1)

Published 30 May 2022 in cs.DS

Abstract: In the Distance Oracle problem, the goal is to preprocess $n$ vectors $x_1, x_2, \cdots, x_n$ in a $d$-dimensional metric space $(\mathbb{X}d, | \cdot |l)$ into a cheap data structure, so that given a query vector $q \in \mathbb{X}d$ and a subset $S\subseteq [n]$ of the input data points, all distances $| q - x_i |_l$ for $x_i\in S$ can be quickly approximated (faster than the trivial $\sim d|S|$ query time). This primitive is a basic subroutine in machine learning, data mining and similarity search applications. In the case of $\ell_p$ norms, the problem is well understood, and optimal data structures are known for most values of $p$. Our main contribution is a fast $(1+\varepsilon)$ distance oracle for any symmetric norm $|\cdot|_l$. This class includes $\ell_p$ norms and Orlicz norms as special cases, as well as other norms used in practice, e.g. top-$k$ norms, max-mixture and sum-mixture of $\ell_p$ norms, small-support norms and the box-norm. We propose a novel data structure with $\tilde{O}(n (d + \mathrm{mmc}(l)2 ) )$ preprocessing time and space, and $t_q = \tilde{O}(d + |S| \cdot \mathrm{mmc}(l)2)$ query time, for computing distances to a subset $S$ of data points, where $\mathrm{mmc}(l)$ is a complexity-measure (concentration modulus) of the symmetric norm. When $l = \ell{p}$ , this runtime matches the aforementioned state-of-art oracles.

Citations (7)

Summary

We haven't generated a summary for this paper yet.