Unifying Generation and Compression: Ultra-low bitrate Image Coding Via Multi-stage Transformer (2403.03736v1)
Abstract: Recent progress in generative compression technology has significantly improved the perceptual quality of compressed data. However, these advances focus primarily on producing high-frequency details and often overlook the ability of generative models to capture the prior distribution of image content, which impedes further bitrate reduction in extreme compression scenarios (<0.05 bpp). Motivated by the capability of predictive large language models (LLMs) for lossless compression, this paper introduces a novel Unified Image Generation-Compression (UIGC) paradigm that merges the processes of generation and compression. A key feature of the UIGC framework is the adoption of vector-quantized (VQ) image models for tokenization, alongside a multi-stage transformer designed to exploit spatial contextual information for modeling the prior distribution. The resulting dual-purpose framework uses the learned prior both for entropy estimation and for regenerating lost tokens. Extensive experiments demonstrate the superiority of the proposed UIGC framework over existing codecs in perceptual quality and human perception, particularly in ultra-low bitrate scenarios (≤0.03 bpp), pioneering a new direction in generative compression.
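The three ingredients named in the abstract (VQ tokenization, a learned prior used for entropy estimation, and prior-driven regeneration of untransmitted tokens) can be illustrated with a small sketch. This is not the authors' implementation: the multi-stage transformer is replaced by a hypothetical left-neighbor (bigram) frequency model, and every name here (`vq_tokenize`, `BigramPrior`, `estimated_bits`, `regenerate`) is illustrative. The bit count is the ideal arithmetic-coding cost, the sum of -log2 p(token | context), rather than the output of a real entropy coder.

```python
import math
from collections import Counter, defaultdict


def vq_tokenize(vectors, codebook):
    """Map each feature vector to the index of its nearest codeword (squared L2 distance)."""
    tokens = []
    for v in vectors:
        dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in codebook]
        tokens.append(min(range(len(codebook)), key=dists.__getitem__))
    return tokens


class BigramPrior:
    """Hypothetical stand-in for the learned transformer prior.

    Estimates p(token | previous token) from training token sequences with
    add-one (Laplace) smoothing, so every token has nonzero probability.
    """

    def __init__(self, vocab_size, training_seqs):
        self.vocab_size = vocab_size
        self.counts = defaultdict(Counter)
        for seq in training_seqs:
            for prev, cur in zip(seq, seq[1:]):
                self.counts[prev][cur] += 1

    def prob(self, token, prev):
        row = self.counts[prev]
        return (row[token] + 1) / (sum(row.values()) + self.vocab_size)

    def most_likely(self, prev):
        return max(range(self.vocab_size), key=lambda t: self.prob(t, prev))


def estimated_bits(tokens, prior):
    """Ideal entropy-coded length: log2(V) for the first token, then -log2 p per token."""
    bits = math.log2(prior.vocab_size)  # first token has no left context here
    for prev, cur in zip(tokens, tokens[1:]):
        bits -= math.log2(prior.prob(cur, prev))
    return bits


def bits_per_pixel(bits, height, width):
    return bits / (height * width)


def regenerate(tokens, dropped, prior):
    """Fill dropped (untransmitted) positions left to right with the most likely token."""
    out = list(tokens)
    for i in sorted(dropped):
        if i == 0:
            raise ValueError("position 0 has no left context in this toy model")
        out[i] = prior.most_likely(out[i - 1])
    return out


# Usage with a toy 3-entry codebook of 2-D features.
codebook = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
vectors = [(0.1, -0.1), (0.9, 1.2), (0.2, 0.1), (1.1, 0.8)]
tokens = vq_tokenize(vectors, codebook)  # [0, 1, 0, 1]
prior = BigramPrior(3, [[0, 1, 0, 1, 0, 1], [0, 1, 0, 1]])
bits = estimated_bits(tokens, prior)  # about 3.0 bits
bpp = bits_per_pixel(bits, 32, 32)  # hypothetical 32x32 image, 16x downsampling
restored = regenerate([0, 1, None, 1], {2}, prior)  # [0, 1, 0, 1]
```

With this toy prior the four tokens cost about 3 bits, or 3/1024 ≈ 0.0029 bpp for the hypothetical 32×32 image. For scale, 0.03 bpp on a 768×512 image allows about 11,796 bits (roughly 1.5 KB). A prior that assigns higher probability to the tokens actually observed spends fewer bits on them, which is why modeling the prior distribution matters for pushing bitrates down.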
- Naifu Xue
- Qi Mao
- Zijian Wang
- Yuan Zhang
- Siwei Ma