Approximate Hamming distance in a stream (1602.07241v1)
Abstract: We consider the problem of computing a $(1+\epsilon)$-approximation of the Hamming distance between a pattern of length $n$ and successive substrings of a stream. We first look at the one-way randomised communication complexity of this problem, giving Alice the first half of the stream and Bob the second half. We show the following: (1) If Alice and Bob both share the pattern then there is an $O(\epsilon{-4} \log2 n)$ bit randomised one-way communication protocol. (2) If only Alice has the pattern then there is an $O(\epsilon{-2}\sqrt{n}\log n)$ bit randomised one-way communication protocol. We then go on to develop small space streaming algorithms for $(1+\epsilon)$-approximate Hamming distance which give worst case running time guarantees per arriving symbol. (1) For binary input alphabets there is an $O(\epsilon{-3} \sqrt{n} \log{2} n)$ space and $O(\epsilon{-2} \log{n})$ time streaming $(1+\epsilon)$-approximate Hamming distance algorithm. (2) For general input alphabets there is an $O(\epsilon{-5} \sqrt{n} \log{4} n)$ space and $O(\epsilon{-4} \log3 {n})$ time streaming $(1+\epsilon)$-approximate Hamming distance algorithm.