Compressed Data Structures for Dynamic Sequences (1507.06866v1)
Abstract: We consider the problem of storing a dynamic string $S$ over an alphabet $\Sigma={\,1,\ldots,\sigma\,}$ in compressed form. Our representation supports insertions and deletions of symbols and answers three fundamental queries: $\mathrm{access}(i,S)$ returns the $i$-th symbol in $S$, $\mathrm{rank}a(i,S)$ counts how many times a symbol $a$ occurs among the first $i$ positions in $S$, and $\mathrm{select}_a(i,S)$ finds the position where a symbol $a$ occurs for the $i$-th time. We present the first fully-dynamic data structure for arbitrarily large alphabets that achieves optimal query times for all three operations and supports updates with worst-case time guarantees. Ours is also the first fully-dynamic data structure that needs only $nH_k+o(n\log\sigma)$ bits, where $H_k$ is the $k$-th order entropy and $n$ is the string length. Moreover our representation supports extraction of a substring $S[i..i+\ell]$ in optimal $O(\log n/\log\log n + \ell/\log{\sigma}n)$ time.