Bounds for Compression in Streaming Models (0711.3338v2)
Abstract: Compression algorithms and streaming algorithms are both powerful tools for dealing with massive data sets, but many of the best compression algorithms -- e.g., those based on the Burrows-Wheeler Transform -- at first seem incompatible with streaming. In this paper we consider several popular streaming models and ask in which, if any, we can compress as well as we can with the BWT. We first prove a nearly tight tradeoff between memory and redundancy for the Standard, Multipass and W-Streams models, demonstrating a bound that is achievable with the BWT but unachievable in those models. We then show we can compute the related Schindler Transform in the StreamSort model and the BWT in the Read-Write model and, thus, achieve that bound.