Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
139 tokens/sec
GPT-4o
7 tokens/sec
Gemini 2.5 Pro Pro
46 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
38 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Chromatic $k$-Nearest Neighbor Queries (2205.00277v1)

Published 30 Apr 2022 in cs.CG

Abstract: Let $P$ be a set of $n$ colored points. We develop efficient data structures that store $P$ and can answer chromatic $k$-nearest neighbor ($k$-NN) queries. Such a query consists of a query point $q$ and a number $k$, and asks for the color that appears most frequently among the $k$ points in $P$ closest to $q$. Answering such queries efficiently is the key to obtain fast $k$-NN classifiers. Our main aim is to obtain query times that are independent of $k$ while using near-linear space. We show that this is possible using a combination of two data structures. The first data structure allow us to compute a region containing exactly the $k$-nearest neighbors of a query point $q$, and the second data structure can then report the most frequent color in such a region. This leads to linear space data structures with query times of $O(n{1 / 2} \log n)$ for points in $\mathbb{R}1$, and with query times varying between $O(n{2/3}\log{2/3} n)$ and $O(n{5/6} {\rm polylog} n)$, depending on the distance measure used, for points in $\mathbb{R}2$. Since these query times are still fairly large we also consider approximations. If we are allowed to report a color that appears at least $(1-\varepsilon)f*$ times, where $f*$ is the frequency of the most frequent color, we obtain a query time of $O(\log n + \log\log_{\frac{1}{1-\varepsilon}} n)$ in $\mathbb{R}1$ and expected query times ranging between $\tilde{O}(n{1/2}\varepsilon{-3/2})$ and $\tilde{O}(n{1/2}\varepsilon{-5/2})$ in $\mathbb{R}2$ using near-linear space (ignoring polylogarithmic factors).

Citations (1)

Summary

We haven't generated a summary for this paper yet.