
Controlling Rate, Distortion, and Realism: Towards a Single Comprehensive Neural Image Compression Model (2405.16817v1)

Published 27 May 2024 in cs.CV and eess.IV

Abstract: In recent years, neural network-driven image compression (NIC) has gained significant attention. Some works adopt deep generative models such as GANs and diffusion models to enhance perceptual quality (realism). A critical obstacle of these generative NIC methods is that each model is optimized for a single bit rate. Consequently, multiple models are required to compress images to different bit rates, which is impractical for real-world applications. To tackle this issue, we propose a variable-rate generative NIC model. Specifically, we explore several discriminator designs tailored for the variable-rate approach and introduce a novel adversarial loss. Moreover, by incorporating the newly proposed multi-realism technique, our method allows the users to adjust the bit rate, distortion, and realism with a single model, achieving ultra-controllability. Unlike existing variable-rate generative NIC models, our method matches or surpasses the performance of state-of-the-art single-rate generative NIC models while covering a wide range of bit rates using just one model. Code will be available at https://github.com/iwa-shi/CRDR

Authors (3)
  1. Shoma Iwai
  2. Tomo Miyazaki
  3. Shinichiro Omachi