Papers
Topics
Authors
Recent
Search
2000 character limit reached

Efficient LZ78 factorization of grammar compressed text

Published 19 Jul 2012 in cs.DS | (1207.4607v1)

Abstract: We present an efficient algorithm for computing the LZ78 factorization of a text, where the text is represented as a straight line program (SLP), which is a context free grammar in the Chomsky normal form that generates a single string. Given an SLP of size $n$ representing a text $S$ of length $N$, our algorithm computes the LZ78 factorization of $T$ in $O(n\sqrt{N}+m\log N)$ time and $O(n\sqrt{N}+m)$ space, where $m$ is the number of resulting LZ78 factors. We also show how to improve the algorithm so that the $n\sqrt{N}$ term in the time and space complexities becomes either $nL$, where $L$ is the length of the longest LZ78 factor, or $(N - \alpha)$ where $\alpha \geq 0$ is a quantity which depends on the amount of redundancy that the SLP captures with respect to substrings of $S$ of a certain length. Since $m = O(N/\log_\sigma N)$ where $\sigma$ is the alphabet size, the latter is asymptotically at least as fast as a linear time algorithm which runs on the uncompressed string when $\sigma$ is constant, and can be more efficient when the text is compressible, i.e. when $m$ and $n$ are small.

Citations (19)

Summary

Paper to Video (Beta)

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.