
Photo-Realistic Image Restoration in the Wild with Controlled Vision-Language Models (2404.09732v1)

Published 15 Apr 2024 in cs.CV

Abstract: Though diffusion models have been successfully applied to various image restoration (IR) tasks, their performance is sensitive to the choice of training datasets. Typically, diffusion models trained on specific datasets fail to recover images that have out-of-distribution degradations. To address this problem, this work leverages a capable vision-language model and a synthetic degradation pipeline to learn image restoration in the wild (wild IR). More specifically, all low-quality images are simulated with a synthetic degradation pipeline that contains multiple common degradations such as blur, resize, noise, and JPEG compression. Then we introduce robust training for a degradation-aware CLIP model to extract enriched image content features to assist high-quality image restoration. Our base diffusion model is the image restoration SDE (IR-SDE). Built upon it, we further present a posterior sampling strategy for fast noise-free image generation. We evaluate our model on both synthetic and real-world degradation datasets. Moreover, experiments on the unified image restoration task illustrate that the proposed posterior sampling improves image generation quality for various degradations.

Enhancing Real-World Image Restoration Using Vision-Language Models and Synthetic Degradation Pipelines

Overview of Research

This paper tackles real-world image restoration by combining a degradation-aware vision-language model with a synthetic degradation pipeline. The goal is to improve the photo-realistic restoration quality of diffusion models, particularly on out-of-distribution degradations. The main components are a base diffusion model, IR-SDE, extended with a posterior sampling strategy; a robust training strategy for the vision-language model (DA-CLIP); and a synthetic degradation pipeline that generates training data mimicking real-world imperfections.
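To make the synthetic degradation pipeline concrete, here is a minimal sketch under stated assumptions. It applies blur, resize, noise, and a compression step to a grayscale image in a freshly shuffled order per sample, which is one plausible reading of the random shuffle strategy listed under Key Contributions. All function names, parameter ranges, and the quantization stand-in for JPEG are hypothetical and are not taken from the paper.

```python
import numpy as np

def blur(img, rng):
    # 3x3 box blur with edge padding (stand-in for the paper's blur kernels).
    h, w = img.shape
    p = np.pad(img, 1, mode="edge")
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0

def resize(img, rng):
    # Nearest-neighbour downsample by 2, then upsample back to the input size.
    h, w = img.shape
    small = img[::2, ::2]
    return np.repeat(np.repeat(small, 2, axis=0), 2, axis=1)[:h, :w]

def noise(img, rng):
    # Additive Gaussian noise with a randomly drawn strength (assumed range).
    sigma = rng.uniform(0.01, 0.1)
    return img + rng.normal(0.0, sigma, size=img.shape)

def jpeg_like(img, rng):
    # Coarse quantization as a crude stand-in for JPEG compression;
    # a real pipeline would round-trip the image through an actual codec.
    levels = int(rng.integers(8, 33))
    return np.round(img * (levels - 1)) / (levels - 1)

OPS = [blur, resize, noise, jpeg_like]

def degrade(img, rng):
    # Shuffle the order of degradations for every sample.
    order = rng.permutation(len(OPS))
    out = img.astype(np.float64)
    for k in order:
        out = OPS[k](out, rng)
    return np.clip(out, 0.0, 1.0), [OPS[k].__name__ for k in order]

rng = np.random.default_rng(0)
hq = rng.random((32, 32))          # hypothetical high-quality image in [0, 1]
lq, order = degrade(hq, rng)       # paired low-quality training input
```

Each call produces a (low-quality, high-quality) training pair, and shuffling the order means the model sees many different compositions of the same four degradations.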

Key Contributions

  1. Synthetic Degradation Pipeline:
    • This pipeline incorporates various common image degradations like blur, noise, resizing, and JPEG compression to create challenging training data.
    • A novel random shuffle strategy is employed, enhancing the model's ability to generalize across real-world degradations.
  2. Vision-Language Model Integration:
    • The DA-CLIP model is trained to recognize degraded image content, so that the features it extracts support more accurate restoration.
    • Modifications to DA-CLIP minimize the embedding distance between low-quality and high-quality images, improving the quality of features extracted from degraded inputs.
  3. Posterior Sampling in IR-SDE:
    • A posterior sampling strategy is introduced for IR-SDE that improves the reverse-time path of the diffusion process, targeting faster and higher-quality image restoration.
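The posterior sampling idea in the last item can be illustrated with a small scalar sketch. It assumes the mean-reverting SDE form dx = θ_t(μ − x)dt + σ_t dw with σ_t² = 2λ²θ_t, where μ is the low-quality input, and it samples each reverse step from the Gaussian posterior p(x_{t−1} | x_t, x_0). The formulas below come from standard Gaussian conditioning for this process rather than being copied from the paper. The linear θ schedule and the oracle that returns the clean value (standing in for a trained network's estimate of x_0) are hypothetical.

```python
import math
import random

def make_schedule(T=100, theta_max=0.05):
    # Hypothetical linear theta schedule; cum[t] approximates the integral of theta up to t.
    thetas = [0.0] + [theta_max * i / T for i in range(1, T + 1)]
    cum = [0.0]
    for th in thetas[1:]:
        cum.append(cum[-1] + th)
    return thetas, cum

def posterior_params(x_t, x0, mu, t, thetas, cum, lam):
    # Mean and variance of p(x_{t-1} | x_t, x0) for the mean-reverting process.
    a_prev = math.exp(-cum[t - 1])   # decay from time 0 to t-1
    a_t = math.exp(-cum[t])          # decay from time 0 to t
    c = math.exp(-thetas[t])         # decay over the single step t-1 -> t
    denom = 1.0 - a_t ** 2
    mean = mu + ((1.0 - a_prev ** 2) * c * (x_t - mu)
                 + (1.0 - c ** 2) * a_prev * (x0 - mu)) / denom
    var = lam ** 2 * (1.0 - a_prev ** 2) * (1.0 - c ** 2) / denom
    return mean, var

def sample(x0_estimator, mu, T, thetas, cum, lam, rng):
    # Start from the low-quality value plus noise, then step backwards in time.
    x = mu + lam * rng.gauss(0.0, 1.0)
    for t in range(T, 0, -1):
        x0_hat = x0_estimator(x, t)  # a trained network would supply this estimate
        mean, var = posterior_params(x, x0_hat, mu, t, thetas, cum, lam)
        x = mean + math.sqrt(var) * rng.gauss(0.0, 1.0)
    return x

thetas, cum = make_schedule()
clean, degraded = 0.8, 0.2           # hypothetical pixel values (x0 and mu)
restored = sample(lambda x, t: clean, degraded, 100, thetas, cum,
                  lam=0.1, rng=random.Random(0))
```

At the final step (t = 1) the posterior variance is zero and the mean equals the current estimate of x_0, so the output carries no residual sampling noise. That property is consistent with the "noise-free" generation described in the abstract, although the actual sampler in the paper may differ in its details.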

Experimental Validation

The effectiveness of these methods is confirmed through testing on both synthetic and real-world datasets. The results indicate that the combined approach not only improves image quality but also remains robust across a variety of real-world degradations.

Implications and Future Directions

  • Theoretical Implications:
    • This work extends the theoretical understanding of diffusion models in complex, real-world scenarios, demonstrating that a combination of synthetic data and enhanced feature extraction models leads to significant improvements in restoration quality.
  • Practical Applications:
    • Practical applications abound in digital forensics, media restoration, and any field requiring the recovery or enhancement of visual information from degraded imagery. This system offers a more robust way of handling diverse and previously unseen image degradations in the wild.
  • Future Research Directions:
    • Further research could apply these models to video restoration or extend them to other image-related tasks, such as object detection in degraded environments. Integrating more complex LLMs or more diverse degradation types could also improve model performance.

Conclusion

By combining a degradation-aware vision-language model with a carefully designed synthetic degradation pipeline, this research advances the capabilities of diffusion-based image restoration systems. The posterior sampling technique for the IR-SDE model, in particular, shows the potential of such integrated approaches for complex, real-world image restoration.

Authors (5)
  1. Ziwei Luo (19 papers)
  2. Fredrik K. Gustafsson (17 papers)
  3. Zheng Zhao (69 papers)
  4. Jens Sjölund (42 papers)
  5. Thomas B. Schön (132 papers)