Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
102 tokens/sec
GPT-4o
59 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
6 tokens/sec
GPT-4.1 Pro
50 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Local Decodability of the Burrows-Wheeler Transform (1808.03978v2)

Published 12 Aug 2018 in cs.DS, cs.CC, and cs.IR

Abstract: The Burrows-Wheeler Transform (BWT) is among the most influential discoveries in text compression and DNA storage. It is a reversible preprocessing step that rearranges an $n$-letter string into runs of identical characters (by exploiting context regularities), resulting in highly compressible strings, and is the basis of the \texttt{bzip} compression program. Alas, the decoding process of BWT is inherently sequential and requires $\Omega(n)$ time even to retrieve a \emph{single} character. We study the succinct data structure problem of locally decoding short substrings of a given text under its \emph{compressed} BWT, i.e., with small additive redundancy $r$ over the \emph{Move-To-Front} (\texttt{bzip}) compression. The celebrated BWT-based FM-index (FOCS '00), as well as other related literature, yield a trade-off of $r=\tilde{O}(n/\sqrt{t})$ bits, when a single character is to be decoded in $O(t)$ time. We give a near-quadratic improvement $r=\tilde{O}(n\lg(t)/t)$. As a by-product, we obtain an \emph{exponential} (in $t$) improvement on the redundancy of the FM-index for counting pattern-matches on compressed text. In the interesting regime where the text compresses to $n{1-o(1)}$ bits, these results provide an $\exp(t)$ \emph{overall} space reduction. For the local decoding problem of BWT, we also prove an $\Omega(n/t2)$ cell-probe lower bound for "symmetric" data structures. We achieve our main result by designing a compressed partial-sums (Rank) data structure over BWT. The key component is a \emph{locally-decodable} Move-to-Front (MTF) code: with only $O(1)$ extra bits per block of length $n{\Omega(1)}$, the decoding time of a single character can be decreased from $\Omega(n)$ to $O(\lg n)$. This result is of independent interest in algorithmic information theory.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (2)
  1. Sandip Sinha (8 papers)
  2. Omri Weinstein (38 papers)
Citations (2)