Re-Pair In Small Space
Abstract: Re-Pair is a grammar compression scheme with good compression rates. Computing Re-Pair comes at the cost of maintaining large frequency tables, which makes it hard to compute on large-scale data sets. As a solution to this problem we present, given a text of length $n$ whose characters are drawn from an integer alphabet, an $O(n^2) \cap O(n^2 \lg \log_\tau n \lg \lg \lg n / \log_\tau n)$ time algorithm computing Re-Pair in $n \lg \max(n,\tau)$ bits of space including the text space, where $\tau$ is the number of terminals and non-terminals. The algorithm works in the restore model, supporting the recovery of the original input within the time of the Re-Pair computation and with $O(\lg n)$ additional bits of working space. We give variants of our solution working in parallel or in the external memory model.
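For readers unfamiliar with the scheme, the following is a minimal sketch of plain Re-Pair: repeatedly replace the most frequent pair of adjacent symbols with a fresh non-terminal until no pair occurs at least twice. It is not the algorithm of the paper. It recounts all pairs from scratch in every round rather than maintaining a persistent frequency table, but it still builds a temporary dictionary per round and therefore does not meet the space bound stated above. The function names `bigram_counts`, `replace_pair`, `repair`, and `expand` are illustrative assumptions, as is the choice to break ties between equally frequent pairs arbitrarily.

```python
from collections import Counter


def bigram_counts(seq):
    """Count non-overlapping occurrences of each adjacent pair in seq."""
    counts = Counter()
    last_equal = -2  # start index of the last counted pair of two equal symbols
    for i in range(len(seq) - 1):
        a, b = seq[i], seq[i + 1]
        if a == b and last_equal == i - 1:
            continue  # overlaps the previous occurrence, e.g. the middle of "aaa"
        counts[(a, b)] += 1
        if a == b:
            last_equal = i
    return counts


def replace_pair(seq, pair, symbol):
    """Replace occurrences of pair by symbol, scanning left to right."""
    out = []
    i = 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(symbol)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def repair(text):
    """Naive Re-Pair (illustrative only): returns (start sequence, rules)."""
    seq = [ord(c) for c in text]
    next_symbol = max(seq, default=-1) + 1  # first unused integer symbol
    rules = {}
    while True:
        counts = bigram_counts(seq)
        if not counts:
            break
        pair, freq = max(counts.items(), key=lambda kv: kv[1])
        if freq < 2:
            break  # no pair repeats, so the grammar is final
        rules[next_symbol] = pair
        seq = replace_pair(seq, pair, next_symbol)
        next_symbol += 1
    return seq, rules


def expand(seq, rules):
    """Recover the original symbol sequence from the grammar."""
    out = []
    stack = list(reversed(seq))
    while stack:
        s = stack.pop()
        if s in rules:
            left, right = rules[s]
            stack.append(right)
            stack.append(left)
        else:
            out.append(s)
    return out
```

Calling `repair("abracadabra")` yields a shortened start sequence together with the rules that were created, and `expand` on that result reproduces the code points of the input. Each round costs linear time and there can be linearly many rounds, which gives the quadratic behaviour that the first term of the stated time bound reflects.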