Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
184 tokens/sec
GPT-4o
7 tokens/sec
Gemini 2.5 Pro Pro
45 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
38 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Back to the Future: an Even More Nearly Optimal Cardinality Estimation Algorithm (1708.06839v1)

Published 22 Aug 2017 in cs.DS

Abstract: We describe a new cardinality estimation algorithm that is extremely space-efficient. It applies one of three novel estimators to the compressed state of the Flajolet-Martin-85 coupon collection process. In an apples-to-apples empirical comparison against compressed HyperLogLog sketches, the new algorithm simultaneously wins on all three dimensions of the time/space/accuracy tradeoff. Our prototype uses the zstd compression library, and produces sketches that are smaller than the entropy of HLL, so no possible implementation of compressed HLL can match its space efficiency. The paper's technical contributions include analyses and simulations of the three new estimators, accurate values for the entropies of FM85 and HLL, and a non-trivial method for estimating a double asymptotic limit via simulation.

Citations (20)

Summary

We haven't generated a summary for this paper yet.