Text + Sketch: Image Compression at Ultra Low Rates (2307.01944v1)

Published 4 Jul 2023 in cs.LG, cs.CV, cs.IT, and math.IT

Abstract: Recent advances in text-to-image generative models provide the ability to generate high-quality images from short text descriptions. These foundation models, when pre-trained on billion-scale datasets, are effective for various downstream tasks with little or no further training. A natural question to ask is how such models may be adapted for image compression. We investigate several techniques in which the pre-trained models can be directly used to implement compression schemes targeting novel low rate regimes. We show how text descriptions can be used in conjunction with side information to generate high-fidelity reconstructions that preserve both semantics and spatial structure of the original. We demonstrate that at very low bit-rates, our method can significantly improve upon learned compressors in terms of perceptual and semantic fidelity, despite no end-to-end training.
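
The scheme described in the abstract (transmit a short text description together with structural side information, then let a pre-trained text-to-image model synthesize the reconstruction) can be illustrated with off-the-shelf components. The snippet below is a minimal sketch rather than the authors' implementation: the choice of BLIP as the captioner, the HED edge map used as side information, the Stable Diffusion and ControlNet checkpoints, the file names, and the zlib-based rate estimate are all illustrative assumptions.

```python
# Hypothetical sketch of a "text + sketch" codec built from pre-trained models.
# Assumed components (not confirmed by the abstract): BLIP for captioning,
# an HED edge map as the spatial side information, and an edge-conditioned
# Stable Diffusion (ControlNet) model as the decoder.
import zlib

import torch
from PIL import Image
from controlnet_aux import HEDdetector
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from transformers import BlipForConditionalGeneration, BlipProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"

# "Encoder" side: turn the image into a short caption plus an edge sketch.
blip_processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
blip = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-base"
).to(device)
hed = HEDdetector.from_pretrained("lllyasviel/Annotators")

image = Image.open("input.png").convert("RGB")  # hypothetical input path

inputs = blip_processor(image, return_tensors="pt").to(device)
caption_ids = blip.generate(**inputs, max_new_tokens=32)
caption = blip_processor.decode(caption_ids[0], skip_special_tokens=True)

sketch = hed(image)  # spatial side information, transmitted alongside the text

# Crude rate estimate for the text part: losslessly compress the caption bytes.
text_bits = 8 * len(zlib.compress(caption.encode("utf-8")))
print(f"caption: {caption!r} (~{text_bits} bits, excluding the sketch)")

# "Decoder" side: regenerate the image from caption + sketch; the pre-trained
# generative model is used as-is, with no end-to-end training.
controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-hed")
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet
).to(device)

reconstruction = pipe(prompt=caption, image=sketch, num_inference_steps=30).images[0]
reconstruction.save("reconstruction.png")
```

In this setup only the caption and a compressed version of the sketch would need to be sent; the generative model sits at the decoder and is used off the shelf, consistent with the abstract's point that no end-to-end training is required.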

Authors (4)
  1. Eric Lei (14 papers)
  2. Yiğit Berkay Uslu (3 papers)
  3. Hamed Hassani (120 papers)
  4. Shirin Saeedi Bidokhti (31 papers)
Citations (21)