
CMC-Bench: Towards a New Paradigm of Visual Signal Compression (2406.09356v1)

Published 13 Jun 2024 in cs.CV and eess.IV

Abstract: Ultra-low bitrate image compression is a challenging and demanding topic. With the development of Large Multimodal Models (LMMs), a Cross Modality Compression (CMC) paradigm of Image-Text-Image has emerged. Compared with traditional codecs, this semantic-level compression can reduce image data size to 0.1% or even lower, which has strong potential applications. However, CMC has certain defects in consistency with the original image and perceptual quality. To address this problem, we introduce CMC-Bench, a benchmark of the cooperative performance of Image-to-Text (I2T) and Text-to-Image (T2I) models for image compression. This benchmark covers 18,000 and 40,000 images respectively to verify 6 mainstream I2T and 12 T2I models, including 160,000 subjective preference scores annotated by human experts. At ultra-low bitrates, this paper proves that the combination of some I2T and T2I models has surpassed the most advanced visual signal codecs; meanwhile, it highlights where LMMs can be further optimized toward the compression task. We encourage LMM developers to participate in this test to promote the evolution of visual signal codec protocols.
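
To put the 0.1% figure in concrete terms, the following is a minimal arithmetic sketch, not code from the paper. It assumes an uncompressed 8-bit RGB image as the baseline and treats the compressed payload as a plain byte count (for example, a UTF-8 caption). The function name `compression_stats`, the 1024x1024 image size, and the sample caption are all illustrative assumptions.

```python
def compression_stats(width, height, payload_bytes, channels=3, bits_per_channel=8):
    """Return (raw_bytes, size_ratio, bits_per_pixel) for a compressed payload.

    Assumption: the baseline is the uncompressed raster, i.e.
    width * height * channels samples at bits_per_channel bits each.
    """
    raw_bytes = width * height * channels * bits_per_channel // 8
    size_ratio = payload_bytes / raw_bytes           # fraction of the raw size
    bits_per_pixel = payload_bytes * 8 / (width * height)
    return raw_bytes, size_ratio, bits_per_pixel


if __name__ == "__main__":
    # Hypothetical caption standing in for the "text" stage of Image-Text-Image.
    caption = "A golden retriever lying on a wooden porch at sunset, warm light."
    payload = len(caption.encode("utf-8"))

    raw, ratio, bpp = compression_stats(1024, 1024, payload)
    print(f"raw image: {raw} bytes, payload: {payload} bytes")
    print(f"size ratio: {ratio:.6%}, bitrate: {bpp:.6f} bpp")
```

Under these assumptions a 1024x1024 RGB image occupies 3,145,728 bytes uncompressed, so 0.1% of that is roughly 3.1 KB, or about 0.024 bits per pixel. A short text description is far smaller than that budget.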

Authors (10)
  1. Chunyi Li (66 papers)
  2. Xiele Wu (4 papers)
  3. Haoning Wu (68 papers)
  4. Donghui Feng (6 papers)
  5. Zicheng Zhang (124 papers)
  6. Guo Lu (39 papers)
  7. Xiongkuo Min (138 papers)
  8. Xiaohong Liu (117 papers)
  9. Guangtao Zhai (230 papers)
  10. Weisi Lin (118 papers)
Citations (4)