
Neural Image Compression with Text-guided Encoding for both Pixel-level and Perceptual Fidelity (2403.02944v2)

Published 5 Mar 2024 in cs.CV and cs.LG

Abstract: Recent advances in text-guided image compression have shown great potential to enhance the perceptual quality of reconstructed images. These methods, however, tend to have significantly degraded pixel-wise fidelity, which limits their practicality. To fill this gap, we develop a new text-guided image compression algorithm that achieves both high perceptual and pixel-wise fidelity. In particular, we propose a compression framework that leverages text information mainly through text-adaptive encoding and training with a joint image-text loss. By doing so, we avoid decoding based on text-guided generative models -- known for high generative diversity -- and effectively utilize the semantic information of text at a global level. Experimental results on various datasets show that our method can achieve high pixel-level and perceptual quality with either human- or machine-generated captions. In particular, our method outperforms all baselines in terms of LPIPS, with room for further improvement when more carefully generated captions are used.
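
To make the idea of "training with a joint image-text loss" more concrete, the following is a minimal sketch of how a rate term, a pixel-level distortion term, and an image-text term might be combined into one training objective. It is not the authors' formulation: the function names, the use of cosine distance between embeddings, and the weights `lam` and `beta` are all assumptions made for illustration, and the embeddings are stand-in vectors, not the output of any real model.

```python
import math

def mse(x, y):
    """Mean squared error between two equal-length pixel sequences."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)

def cosine_distance(u, v):
    """1 - cosine similarity; 0 when the two vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def joint_loss(bits_per_pixel, original, reconstruction,
               recon_embedding, text_embedding, lam=1.0, beta=0.1):
    """Hypothetical objective: rate + lam * distortion + beta * image-text term.

    `recon_embedding` and `text_embedding` are assumed to live in a shared
    embedding space; both are placeholders here.
    """
    distortion = mse(original, reconstruction)
    text_term = cosine_distance(recon_embedding, text_embedding)
    return bits_per_pixel + lam * distortion + beta * text_term

# Toy usage with made-up numbers.
loss = joint_loss(
    bits_per_pixel=0.5,
    original=[0.0, 1.0],
    reconstruction=[0.0, 1.0],
    recon_embedding=[1.0, 0.0],
    text_embedding=[0.0, 1.0],
)
print(loss)
```

In this sketch the text only influences the training signal; decoding does not depend on a text-conditioned generative model, which is the property the abstract emphasizes.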

Authors (6)
  1. Hagyeong Lee
  2. Minkyu Kim
  3. Jun-Hyuk Kim
  4. Seungeon Kim
  5. Dokwan Oh
  6. Jaeho Lee