Papers
Topics
Authors
Recent
2000 character limit reached

Decomposing Words for Enhanced Compression: Exploring the Number of Runs in the Extended Burrows-Wheeler Transform

Published 5 Jun 2025 in cs.DS, cs.DM, and cs.FL | (2506.04926v1)

Abstract: The Burrows-Wheeler Transform (BWT) is a fundamental component in many data structures for text indexing and compression, widely used in areas such as bioinformatics and information retrieval. The extended BWT (eBWT) generalizes the classical BWT to multisets of strings, providing a flexible framework that captures many BWT-like constructions. Several known variants of the BWT can be viewed as instances of the eBWT applied to specific decompositions of a word. A central property of the BWT, essential for its compressibility, is the number of maximal ranges of equal letters, named runs. In this article, we explore how different decompositions of a word impact the number of runs in the resulting eBWT. First, we show that the number of decompositions of a word is exponential, even under minimal constraints on the size of the subsets in the decomposition. Second, we present an infinite family of words for which the ratio of the number of runs between the worst and best decompositions is unbounded, under the same minimal constraints. These results illustrate the potential cost of decomposition choices in eBWT-based compression and underline the challenges in optimizing run-length encoding in generalized BWT frameworks.

Summary

We haven't generated a summary for this paper yet.

Whiteboard

Paper to Video (Beta)

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.

Tweets

Sign up for free to view the 1 tweet with 0 likes about this paper.