Selective Run-Length Encoding (2312.17024v1)
Abstract: Run-Length Encoding (RLE) is one of the most fundamental tools in data compression. However, its compression power drops significantly when the sequence contains few runs of consecutive identical elements. In extreme cases, the encoder's output may require more space than the input (known as size inflation). To alleviate this issue, we use combinatorics to quantify RLE's space savings for a given input distribution. With this insight, we develop the first algorithm that automatically identifies suitable symbols, then selectively encodes those symbols with RLE while storing the others directly without RLE. Through experiments on real-world datasets of various modalities, we empirically validate that our method, which maintains RLE's efficiency advantage, can effectively mitigate the size inflation dilemma.
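To make the idea of selective encoding concrete, here is a minimal Python sketch. It is not the paper's algorithm: the function names, the token-based cost model (a run costs two tokens, a literal costs one), and the selection rule in `pick_symbols` are all assumptions made for illustration. The paper derives its selection from a combinatorial analysis of the input distribution, whereas this sketch simply counts runs in the data. It also assumes the input symbols are not themselves tuples, since tuples mark RLE-encoded runs in the output.

```python
from collections import Counter
from itertools import groupby


def pick_symbols(data):
    """Hypothetical selection rule: keep a symbol for RLE only if encoding
    its runs as (symbol, length) pairs uses fewer tokens than storing every
    occurrence directly, i.e. 2 * number_of_runs < number_of_occurrences."""
    occurrences = Counter(data)
    runs = Counter(symbol for symbol, _ in groupby(data))
    return {s for s in occurrences if 2 * runs[s] < occurrences[s]}


def selective_rle_encode(data, selected):
    """Encode runs of selected symbols as (symbol, run_length) tuples and
    store all other symbols directly, one literal per occurrence."""
    out = []
    for symbol, group in groupby(data):
        run_length = len(list(group))
        if symbol in selected:
            out.append((symbol, run_length))
        else:
            out.extend([symbol] * run_length)
    return out


def selective_rle_decode(encoded):
    """Invert selective_rle_encode: expand tuples, copy literals."""
    out = []
    for item in encoded:
        if isinstance(item, tuple):
            symbol, run_length = item
            out.extend([symbol] * run_length)
        else:
            out.append(item)
    return out


def token_count(encoded):
    """Assumed cost model: a run costs 2 tokens, a literal costs 1."""
    return sum(2 if isinstance(item, tuple) else 1 for item in encoded)


data = list("aaaabcbcbcaaaa")
selected = pick_symbols(data)
selective = selective_rle_encode(data, selected)
plain = selective_rle_encode(data, set(data))  # ordinary RLE on every symbol

print(len(data), token_count(plain), token_count(selective))
```

On this 14-symbol input, ordinary RLE produces 8 runs and therefore 16 tokens under the assumed cost model, which is an instance of size inflation. Restricting RLE to the symbol `a` brings the cost down to 10 tokens, and decoding recovers the original sequence.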