
Partitioned Learned Bloom Filter (2006.03176v2)

Published 5 Jun 2020 in cs.DS, cs.DB, and cs.LG

Abstract: Bloom filters are space-efficient probabilistic data structures that are used to test whether an element is a member of a set, and may return false positives. Recently, variations referred to as learned Bloom filters were developed that can provide improved performance in terms of the rate of false positives, by using a learned model for the represented set. However, previous methods for learned Bloom filters do not take full advantage of the learned model. Here we show how to frame the problem of optimal model utilization as an optimization problem, and using our framework derive algorithms that can achieve near-optimal performance in many cases. Experimental results from both simulated and real-world datasets show significant performance improvements from our optimization approach over both the original learned Bloom filter constructions and previously proposed heuristic improvements.
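As a point of reference for the abstract, the following is a minimal sketch of the original learned Bloom filter construction that the paper sets out to improve: a model score with a single threshold, plus a backup Bloom filter that holds the keys the model would otherwise miss. It is not the partitioned method proposed in the paper. The names `BloomFilter`, `LearnedBloomFilter`, and `toy_score`, the SHA-256 double-hashing scheme, and the 0.5 threshold are all assumptions made for illustration.

```python
import hashlib
import math


class BloomFilter:
    """Standard Bloom filter: no false negatives, tunable false positive rate."""

    def __init__(self, capacity: int, fpr: float):
        capacity = max(1, capacity)
        # Textbook sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.m = max(1, math.ceil(-capacity * math.log(fpr) / (math.log(2) ** 2)))
        self.k = max(1, round(self.m / capacity * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive k bit positions from two halves of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


class LearnedBloomFilter:
    """Single-threshold learned Bloom filter with a backup filter (illustrative)."""

    def __init__(self, keys, score, threshold: float, backup_fpr: float = 0.01):
        self.score = score          # hypothetical model: item -> score in [0, 1]
        self.threshold = threshold
        # Keys scored below the threshold would be false negatives of the model,
        # so they are stored in a conventional backup Bloom filter.
        missed = [k for k in keys if score(k) < threshold]
        self.backup = BloomFilter(len(missed), backup_fpr)
        for k in missed:
            self.backup.add(k)

    def __contains__(self, item: str) -> bool:
        # Positive if the model is confident, otherwise defer to the backup filter.
        return self.score(item) >= self.threshold or item in self.backup


def toy_score(item: str) -> float:
    # Stand-in for a trained classifier (assumption, not from the paper).
    return 0.9 if item.startswith("evil") else 0.1


keys = ["evil.example", "evil2.example", "sneaky.example"]
lbf = LearnedBloomFilter(keys, toy_score, threshold=0.5)
```

In this sketch every inserted key is reported as present, because keys the model scores below the threshold are caught by the backup filter. Non-keys may still be reported as present, either by the model or by the backup filter. The single fixed threshold uses the model score only as a yes/no signal, which is the kind of under-utilization of the learned model that the abstract describes.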

Authors (4)
  1. Kapil Vaidya (8 papers)
  2. Eric Knorr (2 papers)
  3. Tim Kraska (78 papers)
  4. Michael Mitzenmacher (99 papers)
Citations (32)