AlphaZip: Neural Network-Enhanced Lossless Text Compression (2409.15046v1)
Published 23 Sep 2024 in cs.IT, cs.AI, cs.LG, and math.IT
Abstract: Data compression continues to evolve, with traditional information theory methods being widely used for compressing text, images, and videos. Recently, there has been growing interest in leveraging Generative AI for predictive compression techniques. This paper introduces a lossless text compression approach using a large language model (LLM). The method involves two key steps: first, prediction using a dense neural network architecture, such as a transformer block; second, compressing the predicted ranks with standard compression algorithms like Adaptive Huffman, LZ77, or Gzip. Extensive analysis and benchmarking against conventional information-theoretic baselines demonstrate that neural compression offers improved performance.
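The two-step pipeline described in the abstract (predict, then compress the resulting ranks) can be illustrated with a minimal sketch. This is not the paper's implementation: the LLM is replaced here by a toy adaptive bigram counter over bytes (the hypothetical `BigramPredictor` class), chosen only so the example runs without model weights. The function names `encode_ranks`, `decode_ranks`, `compress`, and `decompress` are likewise assumptions made for illustration. The idea shown is that a good predictor turns most symbols into small ranks, which a standard compressor such as Gzip can then encode compactly, and that the decoder can invert the process by running the same predictor.

```python
import gzip
from collections import defaultdict


class BigramPredictor:
    """Toy stand-in for the LLM (an assumption, not the paper's model).

    Ranks candidate next bytes by how often each has followed the previous byte.
    """

    def __init__(self):
        self.counts = defaultdict(lambda: [0] * 256)

    def ranking(self, prev):
        c = self.counts[prev]
        # Most frequent first; ties broken by byte value so that the
        # encoder and decoder always derive the same ordering.
        return sorted(range(256), key=lambda b: (-c[b], b))

    def update(self, prev, cur):
        self.counts[prev][cur] += 1


def encode_ranks(data: bytes) -> bytes:
    """Step 1: replace each byte with its rank under the predictor."""
    model = BigramPredictor()
    prev = 0  # fixed start-of-stream context
    ranks = bytearray()
    for cur in data:
        ranks.append(model.ranking(prev).index(cur))
        model.update(prev, cur)
        prev = cur
    return bytes(ranks)


def decode_ranks(ranks: bytes) -> bytes:
    """Invert step 1 by replaying the same predictor on the decoded output."""
    model = BigramPredictor()
    prev = 0
    out = bytearray()
    for r in ranks:
        cur = model.ranking(prev)[r]
        out.append(cur)
        model.update(prev, cur)
        prev = cur
    return bytes(out)


def compress(text: str) -> bytes:
    """Step 2: compress the rank stream with a standard compressor (Gzip here)."""
    return gzip.compress(encode_ranks(text.encode("utf-8")), compresslevel=9)


def decompress(blob: bytes) -> str:
    return decode_ranks(gzip.decompress(blob)).decode("utf-8")


if __name__ == "__main__":
    sample = "the cat sat on the mat. " * 20
    blob = compress(sample)
    assert decompress(blob) == sample  # lossless round trip
    print(len(sample.encode("utf-8")), "bytes in,", len(blob), "bytes out")
```

Because the predictor is updated identically on both sides, no model state needs to be transmitted; only the compressed rank stream is stored. With a stronger predictor, more ranks would be expected to fall at or near zero, which is what makes the second-stage compressor effective.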