Fine color guidance in diffusion models and its application to image compression at extremely low bitrates (2404.06865v1)
Abstract: This study addresses the challenge of controlling the global color aspect of images generated with a diffusion model, without any training or fine-tuning. We rewrite the guidance equations so that the outputs stay close to a known color map without hindering the quality of the generation, which leads to new guidance equations. In the color guidance context, we show that the guidance scale should not decrease but instead remain high throughout the diffusion process. As a second contribution, we apply our guidance in a compression framework: we combine semantic and general color information about the image to decode it at low cost. We show that our method improves the fidelity and realism of compressed images at extremely low bitrates, when compared to classical or more semantic-oriented approaches.
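To make the mechanism concrete, here is a minimal sketch of the kind of gradient-based color guidance the abstract describes: the current noise prediction is shifted by the gradient of a color-matching loss, with a guidance scale held constant (high) across all timesteps rather than decayed. Everything here is an assumption for illustration, not the paper's exact formulation: `eps_model`, `color_map` (a heavily downsampled RGB target), `alpha_bar_t`, the pooling-based color loss, and the constant `scale` are all hypothetical.

```python
import torch
import torch.nn.functional as F

def color_guided_eps(x_t, t, eps_model, color_map, alpha_bar_t, scale=2.0):
    """One guided step: steer the sample toward a target color map.

    Hypothetical sketch (not the paper's exact equations). The key point
    from the abstract is that `scale` stays constant and high for every
    timestep, instead of shrinking as denoising progresses.
    """
    x_t = x_t.detach().requires_grad_(True)
    eps = eps_model(x_t, t)  # predicted noise (assumed denoiser interface)

    # Tweedie estimate of the clean image x0 from x_t and the noise estimate
    x0_hat = (x_t - torch.sqrt(1.0 - alpha_bar_t) * eps) / torch.sqrt(alpha_bar_t)

    # Compare the low-resolution color content of x0_hat with the target map
    x0_small = F.adaptive_avg_pool2d(x0_hat, color_map.shape[-2:])
    loss = F.mse_loss(x0_small, color_map)
    grad = torch.autograd.grad(loss, x_t)[0]

    # Classifier-guidance-style shift of the noise prediction; note the
    # constant `scale`, applied identically at every timestep
    return eps + scale * torch.sqrt(1.0 - alpha_bar_t) * grad
```

Under these assumptions, the returned noise estimate can be plugged into an unmodified DDIM/DDPM update, so the sampler itself is untouched; only the noise prediction is biased toward the color map.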