Lossless data compression by large models (2407.07723v3)

Published 24 Jun 2024 in cs.IT, cs.AI, and math.IT

Abstract: Modern data compression methods are slowly reaching their limits after 80 years of research, millions of papers, and a wide range of applications. Yet the extravagant speed requirements of 6G communication raise a major open question calling for revolutionary new ideas in data compression. We have previously shown that, under reasonable assumptions, all understanding or learning is compression. LLMs understand data better than ever before. Can they help us compress data? LLMs may be seen as approximating the uncomputable Solomonoff induction; under this new, uncomputable paradigm, we present LMCompress. LMCompress shatters all previous lossless compression algorithms, doubling the lossless compression ratios of JPEG-XL for images, FLAC for audio, and H.264 for video, and quadrupling the compression ratio of bz2 for text. The better a large model understands the data, the better LMCompress compresses.
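
The mechanism behind the abstract's claim is worth making concrete. The standard way to turn a probabilistic model into a lossless compressor is arithmetic coding: each symbol narrows a real interval in proportion to the probability the model assigns it, and the final code is a binary fraction inside the last interval, costing about −log2 P(data) bits. The sketch below illustrates that recipe only; it is not the authors' LMCompress implementation. The `AdaptiveByteModel` class and the `encode`/`decode` names are invented for illustration, and the exact-fraction coder is a pedagogical simplification standing in for a production arithmetic coder driven by an LLM's next-token distribution.

```python
# A toy sketch of prediction-based lossless compression (the principle behind
# LMCompress, not the paper's code): exact arithmetic coding driven by a
# predictive model. The Laplace-smoothed byte model below is a hypothetical
# stand-in for an LLM's next-token distribution.
import math
from fractions import Fraction

ALPHABET = [chr(i) for i in range(256)]

class AdaptiveByteModel:
    """Adaptive symbol frequencies; plays the role of the large model."""
    def __init__(self):
        self.counts = dict.fromkeys(ALPHABET, 1)  # Laplace smoothing
        self.total = len(ALPHABET)

    def interval(self, symbol):
        """Exact (cumulative_low, probability) of `symbol` under the model."""
        acc = 0
        for c in ALPHABET:
            if c == symbol:
                return Fraction(acc, self.total), Fraction(self.counts[c], self.total)
            acc += self.counts[c]
        raise KeyError(symbol)

    def locate(self, point):
        """Inverse lookup: the symbol whose interval contains `point` in [0, 1)."""
        acc = 0
        for c in ALPHABET:
            acc += self.counts[c]
            if point < Fraction(acc, self.total):
                low = Fraction(acc - self.counts[c], self.total)
                return c, low, Fraction(self.counts[c], self.total)

    def update(self, symbol):
        self.counts[symbol] += 1
        self.total += 1

def encode(text):
    """Narrow [low, low + width) by each symbol's model interval, then emit
    the shortest dyadic fraction inside the final interval as the code bits."""
    model, low, width = AdaptiveByteModel(), Fraction(0), Fraction(1)
    for ch in text:
        c_low, p = model.interval(ch)
        low, width = low + c_low * width, p * width
        model.update(ch)
    k = 1
    while True:  # smallest k such that some n / 2^k lies in [low, low + width)
        n = math.ceil(low * 2 ** k)
        if Fraction(n, 2 ** k) < low + width:
            return format(n, f"0{k}b"), len(text)
        k += 1

def decode(bits, n_symbols):
    """Replay the same model, mapping the code point back to symbols."""
    model, low, width = AdaptiveByteModel(), Fraction(0), Fraction(1)
    point, out = Fraction(int(bits, 2), 2 ** len(bits)), []
    for _ in range(n_symbols):
        c, c_low, p = model.locate((point - low) / width)
        out.append(c)
        low, width = low + c_low * width, p * width
        model.update(c)
    return "".join(out)

if __name__ == "__main__":
    msg = "abracadabra " * 8
    bits, n = encode(msg)
    assert decode(bits, n) == msg
    print(f"raw: {8 * len(msg)} bits, coded: {len(bits)} bits")
```

Exact fractions keep the coder visibly correct but are far too slow for real data; practical coders renormalize fixed-precision integer ranges instead. The code length is within a couple of bits of −log2 P(data), the model's cross-entropy on the input, so substituting a large model whose predictions are far sharper than the toy frequency counts is what produces the doubled and quadrupled ratios the abstract reports: better understanding means higher assigned probability, which means fewer bits.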

HackerNews

  1. Understanding Is Compression (3 points, 1 comment)
  2. Lossless data compression by large models (2 points, 0 comments)