A Data-Dependent Algorithm for Querying Earth Mover's Distance with Low Doubling Dimensions (2002.12354v2)

Published 27 Feb 2020 in cs.CG and cs.DS

Abstract: In this paper, we consider the following query problem: given two weighted point sets $A$ and $B$ in the Euclidean space $\mathbb{R}^d$, we want to quickly determine whether their earth mover's distance (EMD) is larger or smaller than a pre-specified threshold $T\geq 0$. The problem has a number of important applications in machine learning and data mining. In particular, we assume that the dimensionality $d$ is not fixed and that the sizes $|A|$ and $|B|$ are large. Most existing EMD algorithms are therefore inefficient for this problem because of their high complexity. Here, we consider the problem under the assumption that $A$ and $B$ have low doubling dimensions, which is common for high-dimensional real-world data. Inspired by the geometric net tree method, we propose a novel "data-dependent" algorithm that avoids directly computing the EMD between $A$ and $B$, and so solves the query problem more efficiently. We also study the performance of our method on synthetic and real datasets. The experimental results suggest that our method can save a large amount of running time compared with existing EMD algorithms.
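
To make the query problem concrete, the following is a minimal Python sketch of the threshold question "is EMD(A, B) larger than T?". It is not the paper's net-tree algorithm. It assumes a simplified setting that the abstract does not require: both sets have the same number of points and every point carries weight 1/n, so the EMD reduces to the cheapest one-to-one matching. The function names (`centroid`, `emd_uniform_bruteforce`, `emd_exceeds`) are hypothetical. The sketch first tries a standard cheap lower bound (the distance between the two centroids never exceeds the EMD when total masses are equal) and only falls back to an exact brute-force computation when that bound is inconclusive.

```python
import itertools
import math


def centroid(points):
    """Mean of a list of equal-weight points in R^d."""
    n = len(points)
    d = len(points[0])
    return [sum(p[k] for p in points) / n for k in range(d)]


def emd_uniform_bruteforce(A, B):
    """Exact EMD for two equal-size point sets with uniform weights 1/n.

    Assumption for this sketch: with uniform weights an optimal transport
    plan is a permutation, so we try all n! matchings. Only usable for
    very small n; it stands in for any exact EMD solver.
    """
    if len(A) != len(B):
        raise ValueError("this sketch requires |A| == |B|")
    n = len(A)
    best = min(
        sum(math.dist(A[i], B[j]) for i, j in enumerate(perm))
        for perm in itertools.permutations(range(n))
    )
    return best / n


def emd_exceeds(A, B, T):
    """Return True if EMD(A, B) > T, False otherwise."""
    # Cheap test: the centroid distance is a lower bound on the EMD,
    # so if it already exceeds T the answer is known without solving EMD.
    if math.dist(centroid(A), centroid(B)) > T:
        return True
    # Otherwise fall back to the exact (expensive) computation.
    return emd_uniform_bruteforce(A, B) > T


if __name__ == "__main__":
    A = [(0.0, 0.0), (1.0, 0.0)]
    B = [(0.0, 1.0), (1.0, 1.0)]
    print(emd_exceeds(A, B, 0.5))
    print(emd_exceeds(A, B, 1.5))
```

The early exit illustrates the general idea of answering the threshold query without always computing the EMD itself; the brute-force fallback is exponential in n and is only there to keep the example self-contained.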

Authors (4)
  1. Hu Ding (34 papers)
  2. Tan Chen (17 papers)
  3. Fan Yang (878 papers)
  4. Mingyue Wang (5 papers)
Citations (1)
