Laplacian-guided Entropy Model in Neural Codec with Blur-dissipated Synthesis (2403.16258v1)

Published 24 Mar 2024 in eess.IV, cs.CV, cs.IT, cs.LG, and math.IT

Abstract: While replacing Gaussian decoders with a conditional diffusion model enhances the perceptual quality of reconstructions in neural image compression, its lack of an inductive bias for image data restricts its ability to reach state-of-the-art perceptual quality. To address this limitation, we adopt a non-isotropic diffusion model at the decoder side. This model imposes an inductive bias that distinguishes between frequency contents, thereby facilitating the generation of high-quality images. Moreover, our framework is equipped with a novel entropy model that accurately captures the probability distribution of the latent representation by exploiting spatio-channel correlations in latent space, while also accelerating the entropy decoding step. This channel-wise entropy model leverages both local and global spatial contexts within each channel chunk. The global spatial context is built upon a Transformer designed specifically for image compression; it employs a Laplacian-shaped positional encoding whose learnable parameters are adapted for each channel cluster. Our experiments demonstrate that the proposed framework yields better perceptual quality than cutting-edge generative codecs, and that the proposed entropy model contributes notable bitrate savings.
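
The abstract names two components worth unpacking: a Transformer-based entropy model with a Laplacian-shaped positional encoding, and a non-isotropic, blur-dissipated diffusion decoder. The sketches below are one plausible reading of those descriptions, not the authors' implementation; every class, function, and constant name is an assumption.

For the entropy model, a natural realization of a "Laplacian-shaped positional encoding" is a learnable relative-position bias of the form a * exp(-|i - j| / b) added to the attention logits before the softmax. The abstract says the learnable parameters are adapted per channel cluster; the per-head parameterization here is a simplification.

```python
import torch
import torch.nn as nn

class LaplacianPositionalBias(nn.Module):
    """Adds a learnable Laplacian-shaped bias a * exp(-|i-j| / b) to attention logits."""

    def __init__(self, num_heads: int):
        super().__init__()
        # One amplitude a and decay b per head (assumption; the paper adapts
        # these per channel cluster). b is kept positive via exp().
        self.amplitude = nn.Parameter(torch.ones(num_heads))
        self.log_decay = nn.Parameter(torch.zeros(num_heads))

    def forward(self, attn_logits: torch.Tensor) -> torch.Tensor:
        # attn_logits: (batch, heads, seq, seq)
        seq_len = attn_logits.size(-1)
        pos = torch.arange(seq_len, device=attn_logits.device)
        dist = (pos[:, None] - pos[None, :]).abs().float()   # |i - j|
        a = self.amplitude[:, None, None]                    # (heads, 1, 1)
        b = self.log_decay.exp()[:, None, None]
        bias = a * torch.exp(-dist / b)                      # Laplacian shape
        return attn_logits + bias                            # broadcasts over batch
```

A small decay b concentrates attention on nearby positions (local context), while a large b flattens the bias toward uniform global attention, matching the local-plus-global description in the abstract.

"Blur-dissipated synthesis" points to non-isotropic diffusion in the family of heat-dissipation and blurring diffusion models (Rissanen et al., 2022; Hoogeboom and Salimans, 2022), where the forward process attenuates high frequencies faster than low ones, so the reverse model must deblur as well as denoise. Below is a minimal NumPy sketch of one forward step on the DCT basis; the constants `sigma_b` and `noise_std` are illustrative, not from the paper.

```python
import numpy as np
from scipy.fft import dctn, idctn

def blurring_forward_step(x: np.ndarray, t: float,
                          sigma_b: float = 1.0, noise_std: float = 0.05) -> np.ndarray:
    """Damp each DCT frequency of a 2-D image as the heat equation would
    after time t, then add isotropic Gaussian noise."""
    h, w = x.shape
    fy = np.pi * np.arange(h) / h
    fx = np.pi * np.arange(w) / w
    lam = fy[:, None] ** 2 + fx[None, :] ** 2    # heat-equation eigenvalue per frequency
    decay = np.exp(-lam * sigma_b * t)           # high frequencies dissipate fastest
    x_blur = idctn(dctn(x, norm="ortho") * decay, norm="ortho")
    return x_blur + noise_std * np.random.randn(h, w)
```

Because the blur is deterministic and only the added noise is stochastic, this frequency-dependent forward process is exactly the kind of inductive bias the abstract credits with distinguishing between frequency contents.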

Authors (4)
  1. Atefeh Khoshkhahtinat
  2. Ali Zafari
  3. Piyush M. Mehta
  4. Nasser M. Nasrabadi