Controlling Rate, Distortion, and Realism: Towards a Single Comprehensive Neural Image Compression Model (2405.16817v1)
Abstract: In recent years, neural network-driven image compression (NIC) has gained significant attention. Some works adopt deep generative models such as GANs and diffusion models to enhance perceptual quality (realism). A critical limitation of these generative NIC methods is that each model is optimized for a single bit rate. Consequently, multiple models are required to compress images at different bit rates, which is impractical for real-world applications. To tackle this issue, we propose a variable-rate generative NIC model. Specifically, we explore several discriminator designs tailored to the variable-rate approach and introduce a novel adversarial loss. Moreover, by incorporating the newly proposed multi-realism technique, our method allows users to adjust the bit rate, distortion, and realism with a single model, achieving ultra-controllability. Unlike existing variable-rate generative NIC models, our method matches or surpasses the performance of state-of-the-art single-rate generative NIC models while covering a wide range of bit rates with just one model. Code will be available at https://github.com/iwa-shi/CRDR
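To make the three-way trade-off named in the abstract concrete, the following is a minimal sketch of a generic weighted objective over rate, distortion, and a realism term, in the style common to the rate-distortion-perception literature. It is not the loss proposed in the paper: the function name `rdr_objective`, the weights `lam` and `beta`, and the numeric inputs are all hypothetical and chosen only for illustration.

```python
def rdr_objective(rate_bpp, distortion, realism_penalty, lam, beta):
    """Generic weighted sum of rate, distortion, and a realism term.

    Hypothetical illustration only, not the paper's loss:
      rate_bpp        -- estimated bit rate in bits per pixel
      distortion      -- pixel-level error such as MSE
      realism_penalty -- adversarial or perceptual penalty (lower is more realistic)
      lam             -- weight on rate (larger favours smaller files)
      beta            -- weight on realism (0 ignores the realism term)
    """
    if lam < 0 or beta < 0:
        raise ValueError("weights must be non-negative")
    return lam * rate_bpp + distortion + beta * realism_penalty


# The same (made-up) reconstruction statistics scored under two settings.
distortion_oriented = rdr_objective(0.2, 0.01, 0.5, lam=1.0, beta=0.0)
realism_oriented = rdr_objective(0.2, 0.01, 0.5, lam=1.0, beta=2.0)
```

In a single-rate model such weights are fixed at training time, which is why one model per operating point is needed. The abstract's claim is that a single model can instead expose these controls to the user.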
Authors: Shoma Iwai, Tomo Miyazaki, Shinichiro Omachi