The smallest grammar problem revisited (1908.06428v1)
Abstract: In a seminal paper of Charikar et al. on the smallest grammar problem, the authors derive upper and lower bounds on the approximation ratios for several grammar-based compressors, but in all cases there is a gap between the lower and upper bound. Here the gaps for $\mathsf{LZ78}$ and $\mathsf{BISECTION}$ are closed by showing that the approximation ratio of $\mathsf{LZ78}$ is $\Theta( (n/\log n){2/3})$, whereas the approximation ratio of $\mathsf{BISECTION}$ is $\Theta(\sqrt{n/\log n})$. In addition, the lower bound for $\mathsf{RePair}$ is improved from $\Omega(\sqrt{\log n})$ to $\Omega(\log n/\log\log n)$. Finally, results of Arpe and Reischuk relating grammar-based compression for arbitrary alphabets and binary alphabets are improved.