Improved Circular $k$-Mismatch Sketches (2006.13673v1)
Abstract: The shift distance $\mathsf{sh}(S_1,S_2)$ between two strings $S_1$ and $S_2$ of the same length is defined as the minimum Hamming distance between $S_1$ and any rotation (cyclic shift) of $S_2$. We study the problem of sketching the shift distance, which is the following communication complexity problem: Strings $S_1$ and $S_2$ of length $n$ are given to two identical players (encoders), who independently compute sketches (summaries) $\mathtt{sk}(S_1)$ and $\mathtt{sk}(S_2)$, respectively, so that upon receiving the two sketches, a third player (decoder) is able to compute (or approximate) $\mathsf{sh}(S_1,S_2)$ with high probability. This paper primarily focuses on the more general $k$-mismatch version of the problem, where the decoder is allowed to declare a failure if $\mathsf{sh}(S_1,S_2)>k$, where $k$ is a parameter known to all parties. Andoni et al. (STOC'13) introduced exact circular $k$-mismatch sketches of size $\widetilde{O}(k+D(n))$, where $D(n)$ is the number of divisors of $n$. Andoni et al. also showed that their sketch size is optimal in the class of linear homomorphic sketches. We circumvent this lower bound by designing a (non-linear) exact circular $k$-mismatch sketch of size $\widetilde{O}(k)$; this size matches communication-complexity lower bounds. We also design $(1\pm \varepsilon)$-approximate circular $k$-mismatch sketch of size $\widetilde{O}(\min(\varepsilon{-2}\sqrt{k}, \varepsilon{-1.5}\sqrt{n}))$, which improves upon an $\widetilde{O}(\varepsilon{-2}\sqrt{n})$-size sketch of Crouch and McGregor (APPROX'11).