Time-limited Bloom Filter (2306.06742v1)
Abstract: A Bloom Filter is a probabilistic data structure designed to check, rapidly and memory-efficiently, whether an element is present in a set. It has been widely used in many areas of computing, and several variants have surfaced over the years that allow deletions, support dynamic sets, or work with sliding windows. When summarizing data streams, it becomes relevant to identify the more recent elements in the stream. However, most sliding window schemes define recency by the number of items received, without considering time as a factor. While this allows, e.g., storing the most recent 10000 elements, it does not easily translate into storing the elements received in the last 60 seconds, unless the insertion rate is stable and known in advance. In this paper, we present the Time-limited Bloom Filter, a new BF-based approach that stores the elements inserted during a given time period and correctly identifies them as present when queried, while also retiring data when it becomes stale. The approach supports variable insertion rates while striving to keep a target false positive rate. We also make available a reference implementation of the data structure as a Redis module.
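To make the difference between "the last N elements" and "the last T seconds" concrete, here is a minimal Python sketch of one generic way to bound a Bloom filter by time: rotating a small ring of filters, each covering a fixed slice of the window. This is an illustrative assumption, not the algorithm from the paper and not the API of its Redis module. The names `TimeSlicedBloom`, `FakeClock`, `num_slices`, and `bits_per_slice` are hypothetical. The sketch also uses fixed-size slices, so unlike the approach described in the abstract it makes no attempt to hold a target false positive rate when the insertion rate varies.

```python
import hashlib
import time
from collections import deque


class TimeSlicedBloom:
    """Hypothetical sketch: a ring of small Bloom filters, one per time slice."""

    def __init__(self, window_seconds, num_slices=4, bits_per_slice=1 << 16,
                 num_hashes=4, clock=time.monotonic):
        self.slice_seconds = window_seconds / num_slices
        self.bits = bits_per_slice
        self.num_hashes = num_hashes
        self.clock = clock
        # Each slice is a Python int used as a bitmask; index 0 is the newest.
        # One extra slice guarantees an element survives the full window.
        self.slices = deque([0] * (num_slices + 1), maxlen=num_slices + 1)
        self.current_index = int(clock() // self.slice_seconds)

    def _mask(self, item):
        # Double hashing: derive num_hashes bit positions from one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        mask = 0
        for i in range(self.num_hashes):
            mask |= 1 << ((h1 + i * h2) % self.bits)
        return mask

    def _rotate(self):
        # Open one empty slice per elapsed slice period; maxlen evicts the oldest.
        now_index = int(self.clock() // self.slice_seconds)
        steps = min(now_index - self.current_index, len(self.slices))
        for _ in range(steps):
            self.slices.appendleft(0)
        self.current_index = max(self.current_index, now_index)

    def add(self, item):
        self._rotate()
        self.slices[0] |= self._mask(item)

    def __contains__(self, item):
        self._rotate()
        mask = self._mask(item)
        return any((s & mask) == mask for s in self.slices)


class FakeClock:
    """Controllable clock so the demo does not need to sleep."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now


if __name__ == "__main__":
    clock = FakeClock()
    recent = TimeSlicedBloom(window_seconds=60, num_slices=4, clock=clock)
    recent.add("10.0.0.1")
    clock.now = 59.0
    print("10.0.0.1" in recent)  # still inside the 60 second window
    clock.now = 75.0
    print("10.0.0.1" in recent)  # its slice has been retired
```

With this layout an element is retained for at least `window_seconds` and at most one extra slice period, regardless of how many other elements arrive in the meantime. The cost of the simplification is that a burst of insertions fills the current slice and raises its false positive rate, which is the problem the abstract says the Time-limited Bloom Filter is designed to address.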
- Ana Rodrigues
- Ariel Shtul
- Carlos Baquero
- Paulo Sérgio Almeida