Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
156 tokens/sec
GPT-4o
7 tokens/sec
Gemini 2.5 Pro Pro
45 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
38 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Approximately Minwise Independence with Twisted Tabulation (1404.6724v2)

Published 27 Apr 2014 in cs.DS

Abstract: A random hash function $h$ is $\varepsilon$-minwise if for any set $S$, $|S|=n$, and element $x\in S$, $\Pr[h(x)=\min h(S)]=(1\pm\varepsilon)/n$. Minwise hash functions with low bias $\varepsilon$ have widespread applications within similarity estimation. Hashing from a universe $[u]$, the twisted tabulation hashing of P\v{a}tra\c{s}cu and Thorup [SODA'13] makes $c=O(1)$ lookups in tables of size $u{1/c}$. Twisted tabulation was invented to get good concentration for hashing based sampling. Here we show that twisted tabulation yields $\tilde O(1/u{1/c})$-minwise hashing. In the classic independence paradigm of Wegman and Carter [FOCS'79] $\tilde O(1/u{1/c})$-minwise hashing requires $\Omega(\log u)$-independence [Indyk SODA'99]. P\v{a}tra\c{s}cu and Thorup [STOC'11] had shown that simple tabulation, using same space and lookups yields $\tilde O(1/n{1/c})$-minwise independence, which is good for large sets, but useless for small sets. Our analysis uses some of the same methods, but is much cleaner bypassing a complicated induction argument.

Citations (15)

Summary

We haven't generated a summary for this paper yet.