Selective Run-Length Encoding (2312.17024v1)

Published 28 Dec 2023 in cs.DS, cs.IT, eess.IV, eess.SP, and math.IT

Abstract: Run-Length Encoding (RLE) is one of the most fundamental tools in data compression. However, its compression power drops significantly when the sequence lacks runs of consecutive identical elements. In extreme cases, the encoder's output may require more space than the input (a.k.a. size inflation). To alleviate this issue, we use combinatorics to quantify RLE's space savings for a given input distribution. With this insight, we develop the first algorithm that automatically identifies suitable symbols, then selectively encodes these symbols with RLE while storing the others directly without RLE. Through experiments on real-world datasets of various modalities, we empirically validate that our method, which maintains RLE's efficiency advantage, can effectively mitigate the size inflation dilemma.
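The idea of encoding only some symbols with RLE can be illustrated with a minimal Python sketch. This is not the authors' algorithm or file format: it assumes a toy cost model in which every symbol and every run length occupies one storage unit, and it replaces the paper's combinatorial analysis with a simple per-symbol rule computed from observed counts. All names (`run_stats`, `select_symbols`, `encode`, `decode`, `expected_runs_iid`) are hypothetical. Under this model a selected symbol costs 2 units per maximal run, an unselected one costs 1 unit per occurrence, and the decoder only needs to know the selected set.

```python
from collections import Counter
from itertools import groupby

# Toy cost model (assumption, not the paper's format): every symbol and
# every run length occupies one storage unit.


def run_stats(seq):
    """Return (occurrences, maximal runs) per symbol."""
    occurrences = Counter(seq)
    runs = Counter(sym for sym, _ in groupby(seq))
    return occurrences, runs


def select_symbols(seq):
    """Hypothetical selection rule: run-length encode a symbol only if its
    runs cost less (2 units each) than storing every occurrence raw (1 unit)."""
    occurrences, runs = run_stats(seq)
    return {s for s in occurrences if 2 * runs[s] < occurrences[s]}


def encode(seq, selected):
    """Emit (symbol, run_length) for selected symbols, literals for the rest."""
    out = []
    for sym, group in groupby(seq):
        length = sum(1 for _ in group)
        if sym in selected:
            out.extend([sym, length])
        else:
            out.extend([sym] * length)
    return out


def decode(stream, selected):
    """Invert encode(): a selected symbol is always followed by its run length,
    so the decoder only needs the selected set (e.g. stored in a header)."""
    out = []
    i = 0
    while i < len(stream):
        sym = stream[i]
        if sym in selected:
            out.extend([sym] * stream[i + 1])
            i += 2
        else:
            out.append(sym)
            i += 1
    return out


def expected_runs_iid(n, p):
    """Expected number of maximal runs of a symbol with probability p in n
    i.i.d. draws: a run starts at position 0 with probability p and at each
    later position with probability p * (1 - p)."""
    if n == 0:
        return 0.0
    return p + (n - 1) * p * (1 - p)


if __name__ == "__main__":
    data = list("aaaaaaabcaaaaadaaaab")
    chosen = select_symbols(data)
    packed = encode(data, chosen)
    assert decode(packed, chosen) == data
    # raw size, plain RLE (every symbol selected), selective RLE
    print(len(data), len(encode(data, set(data))), len(packed))  # 20 14 10

    noisy = list("abcdabcd")  # no repeats: plain RLE doubles the size
    print(len(noisy), len(encode(noisy, set(noisy))),
          len(encode(noisy, select_symbols(noisy))))  # 8 16 8
```

Because the cost in this toy model is additive over symbols, choosing each symbol independently minimizes the encoded size of the observed data (ignoring the header that stores the selected set). Under an i.i.d. assumption, a symbol with probability p contributes about n·p·(1−p) runs against n·p occurrences, so in this sketch RLE pays off for that symbol roughly when p > 1/2. The paper instead derives the savings from the input distribution combinatorially.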
