The Chonkers Algorithm: Content-Defined Chunking with Strict Guarantees on Size and Locality (2509.11121v1)
Published 14 Sep 2025 in cs.DS
Abstract: This paper presents the Chonkers algorithm, a novel content-defined chunking method providing simultaneous strict guarantees on chunk size and edit locality. Unlike existing algorithms such as Rabin fingerprinting and anchor-based methods, Chonkers achieves bounded propagation of edits and precise control over chunk sizes. I describe the algorithm's layered structure, theoretical guarantees, and implementation considerations, and introduce the Yarn datatype, a deduplicated, merge-tree-based string representation that benefits from Chonkers' strict guarantees.