Locally consistent decomposition of strings with applications to edit distance sketching (2302.04475v2)
Abstract: In this paper we provide a new locally consistent decomposition of strings. Each string $x$ is decomposed, using some amount of randomness, into blocks that can be described by grammars of size $\widetilde{O}(k)$. If two strings $x$ and $y$ have edit distance at most $k$, then their block decompositions use the same number of grammars, and the $i$-th grammar of $x$ is the same as the $i$-th grammar of $y$ for all but at most $k$ indices $i$. The edit distance of $x$ and $y$ equals the sum of the edit distances of the pairs of blocks in which $x$ and $y$ differ. Our decomposition can be used to design a sketch of size $\widetilde{O}(k^2)$ for edit distance, as well as a rolling sketch for edit distance of size $\widetilde{O}(k^2)$. The rolling sketch allows the sketched string to be updated by appending a symbol or by removing a symbol from the beginning of the string.
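The block-wise property stated in the abstract can be illustrated with a minimal Python sketch. It does not implement the paper's decomposition: the blocks below are chosen by hand, raw block strings are compared in place of grammars, and the names `edit_distance` and `blockwise_distance` are hypothetical helpers introduced only for this illustration. Under those assumptions, summing the edit distances of the differing block pairs gives the same value as the edit distance of the full strings.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard row-by-row dynamic program."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = cur
    return prev[-1]


def blockwise_distance(blocks_x: list[str], blocks_y: list[str]) -> int:
    """Sum edit distances over aligned block pairs that differ.

    Assumption: both lists come from a decomposition that aligns the
    i-th block of x with the i-th block of y, as the abstract describes.
    """
    if len(blocks_x) != len(blocks_y):
        raise ValueError("decompositions must have the same number of blocks")
    return sum(
        edit_distance(bx, by)
        for bx, by in zip(blocks_x, blocks_y)
        if bx != by  # identical blocks contribute nothing
    )


# Hand-picked toy decomposition (an assumption, not the paper's algorithm).
blocks_x = ["abcd", "efg", "hij"]
blocks_y = ["abcX", "efg", "hj"]

total = blockwise_distance(blocks_x, blocks_y)
direct = edit_distance("".join(blocks_x), "".join(blocks_y))
print(total, direct)
```

For an arbitrary split into blocks the block-wise sum is only an upper bound on the edit distance; the equality is the guarantee the abstract attributes to the paper's decomposition.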