
Approximating Large Frequency Moments with Pick-and-Drop Sampling (1212.0202v2)

Published 2 Dec 2012 in cs.DS

Abstract: Given a data stream $D = \{p_1, p_2, \ldots, p_m\}$ of size $m$ of numbers from $\{1, \ldots, n\}$, the frequency of $i$ is defined as $f_i = |\{j : p_j = i\}|$. The $k$-th \emph{frequency moment} of $D$ is defined as $F_k = \sum_{i=1}^{n} f_i^k$. We consider the problem of approximating frequency moments in insertion-only streams for $k \ge 3$. For any constant $c$ we show an $O(n^{1-2/k}\log(n)\log^{(c)}(n))$ upper bound on the space complexity of the problem. Here $\log^{(c)}(n)$ is the iterative $\log$ function. To simplify the presentation, we make the following assumptions: $n$ and $m$ are polynomially far; approximation error $\epsilon$ and parameter $k$ are constants. We observe a natural bijection between streams and special matrices. Our main technical contribution is a non-uniform sampling method on matrices. We call our method \emph{pick-and-drop sampling}; it samples a heavy element (i.e., element $i$ with frequency $\Omega(F_k)$) with probability $\Omega(1/n^{1-2/k})$ and gives approximation $\tilde{f_i} \ge (1-\epsilon)f_i$. In addition, the estimations never exceed the real values, that is, $\tilde{f_j} \le f_j$ for all $j$. As a result, we reduce the space complexity of finding a heavy element to $O(n^{1-2/k}\log(n))$ bits. We apply our method of recursive sketches and resolve the problem with $O(n^{1-2/k}\log(n)\log^{(c)}(n))$ bits.
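
To make the definitions in the abstract concrete, the following minimal Python sketch computes $F_k$ exactly and evaluates the iterated logarithm $\log^{(c)}(n)$. It is a linear-space baseline for checking the definitions, not the pick-and-drop sampling method itself. The function names `frequency_moment` and `iterated_log` and the choice of base-2 logarithms are assumptions made for illustration; the abstract does not specify a base.

```python
import math
from collections import Counter


def frequency_moment(stream, k):
    """Exact F_k = sum_i f_i^k.

    Stores every distinct element, so it uses linear space. This is a
    reference baseline only, not a small-space streaming algorithm.
    """
    freqs = Counter(stream)  # f_i = number of occurrences of i
    return sum(f ** k for f in freqs.values())


def iterated_log(n, c):
    """Apply log base 2 (assumed base) c times: log^{(c)}(n)."""
    value = float(n)
    for _ in range(c):
        value = math.log2(value)
    return value


if __name__ == "__main__":
    stream = [1, 2, 2, 3, 3, 3]  # frequencies: f_1 = 1, f_2 = 2, f_3 = 3
    print(frequency_moment(stream, 3))  # 1^3 + 2^3 + 3^3 = 36
    print(iterated_log(65536, 2))       # log2(log2(65536)) = log2(16) = 4.0
```

In the example stream, element 3 alone contributes 27 of the 36 units of $F_3$, which illustrates why locating a heavy element is the core of the approximation problem.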

Citations (23)
