From LZ77 to the Run-Length Encoded Burrows-Wheeler Transform, and Back (1702.01340v1)
Abstract: The Lempel-Ziv factorization (LZ77) and the Run-Length encoded Burrows-Wheeler Transform (RLBWT) are two important tools in text compression and indexing, being their sizes $z$ and $r$ closely related to the amount of text self-repetitiveness. In this paper we consider the problem of converting the two representations into each other within a working space proportional to the input and the output. Let $n$ be the text length. We show that $RLBWT$ can be converted to $LZ77$ in $\mathcal{O}(n\log r)$ time and $\mathcal{O}(r)$ words of working space. Conversely, we provide an algorithm to convert $LZ77$ to $RLBWT$ in $\mathcal{O}\big(n(\log r + \log z)\big)$ time and $\mathcal{O}(r+z)$ words of working space. Note that $r$ and $z$ can be \emph{constant} if the text is highly repetitive, and our algorithms can operate with (up to) \emph{exponentially} less space than naive solutions based on full decompression.