DriftRec: Adapting diffusion models to blind JPEG restoration

Published 12 Nov 2022 in eess.IV, cs.CV, and cs.LG | arXiv:2211.06757v3

Abstract: In this work, we utilize the high-fidelity generation abilities of diffusion models to solve blind JPEG restoration at high compression levels. We propose an elegant modification of the forward stochastic differential equation of diffusion models to adapt them to this restoration task and name our method DriftRec. Comparing DriftRec against an $L_2$ regression baseline with the same network architecture and against state-of-the-art techniques for JPEG restoration, we show that our approach escapes the tendency of other methods to generate blurry images, and recovers the distribution of clean images significantly more faithfully. DriftRec requires only a dataset of clean/corrupted image pairs and no knowledge of the corruption operation, enabling wider applicability to other restoration tasks. In contrast to other conditional and unconditional diffusion models, we exploit the observation that the distributions of clean and corrupted images are much closer to each other than either is to the usual Gaussian prior of the reverse process in diffusion models. Our approach therefore requires only low levels of added noise and needs comparatively few sampling steps even without further optimizations. We show that DriftRec naturally generalizes to realistic and difficult scenarios such as unaligned double JPEG compression and blind restoration of JPEGs found online, without having encountered such examples during training.
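The core idea of the abstract — a forward SDE whose drift pulls the clean image toward its corrupted counterpart, so that only low noise levels are needed — can be illustrated with a small simulation. The sketch below is an assumption-laden toy, not the paper's exact SDE: it uses a simple mean-reverting (Ornstein–Uhlenbeck-style) drift `gamma * (y - x)` with constant diffusion `sigma`, integrated by Euler–Maruyama; the parameter names `gamma` and `sigma` and the specific drift form are illustrative choices, not taken from the paper.

```python
import numpy as np

def forward_drift_sde(x0, y, gamma=2.0, sigma=0.05, n_steps=100, T=1.0, seed=None):
    """Euler-Maruyama simulation of a toy forward SDE

        dx = gamma * (y - x) dt + sigma dW,

    whose drift moves the clean sample x0 toward the corrupted sample y,
    so the terminal distribution concentrates near y with a small amount
    of Gaussian noise (illustrative sketch, not the paper's exact SDE)."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = np.array(x0, dtype=float, copy=True)
    for _ in range(n_steps):
        drift = gamma * (y - x) * dt
        noise = sigma * np.sqrt(dt) * rng.standard_normal(x.shape)
        x += drift + noise
    return x

# Example: a "clean" signal drifts toward its "corrupted" version.
x0 = np.zeros(8)          # stand-in for a clean image
y = np.ones(8)            # stand-in for its JPEG-corrupted version
xT = forward_drift_sde(x0, y, seed=0)
print(np.abs(xT - y).mean())  # small: xT has drifted close to y
```

With `sigma = 0` the recursion is deterministic and converges geometrically toward `y`, which is the intuition behind needing only a narrow noise bridge (and hence few reverse sampling steps) when the clean and corrupted distributions are already close.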
