InstructIR: High-Quality Image Restoration Following Human Instructions (2401.16468v5)

Published 29 Jan 2024 in cs.CV, cs.LG, and eess.IV

Abstract: Image restoration is a fundamental problem that involves recovering a high-quality clean image from its degraded observation. All-In-One image restoration models can effectively restore images from various types and levels of degradation using degradation-specific information as prompts to guide the restoration model. In this work, we present the first approach that uses human-written instructions to guide the image restoration model. Given natural language prompts, our model can recover high-quality images from their degraded counterparts, considering multiple degradation types. Our method, InstructIR, achieves state-of-the-art results on several restoration tasks including image denoising, deraining, deblurring, dehazing, and (low-light) image enhancement. InstructIR improves +1dB over previous all-in-one restoration methods. Moreover, our dataset and results represent a novel benchmark for new research on text-guided image restoration and enhancement. Our code, datasets and models are available at: https://github.com/mv-lab/InstructIR


Summary

  • The paper introduces InstructIR, a novel system that uses human-written instructions to guide an all-in-one image restoration model.
  • It pairs a sentence-transformer text encoder with over 10,000 GPT-4-generated prompts to handle degradations such as noise, rain, blur, haze, and low light.
  • Empirical results show a +1 dB improvement over previous all-in-one restoration methods, along with flexible, user-guided restoration.

Overview

This paper introduces InstructIR, an approach to image restoration that uses human-written instructions as its guiding mechanism. Unlike traditional models that either target a single degradation type or handle multiple degradations through pre-defined guidance vectors, InstructIR interprets restoration tasks described in free-form natural language. Through an extensive set of experiments, the authors validate the efficacy of text guidance for image restoration, with InstructIR setting new benchmarks across multiple restoration tasks.

Methodology

The work sits at the intersection of image restoration and instruction-based guidance. The authors propose a language-informed method that interprets human-written instructions to perform restoration on degraded images. At the core of InstructIR is a text encoder—a sentence transformer—that captures the semantics of the user's prompt and maps it into an embedding space the image restoration model can condition on.
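The following is a minimal sketch of this conditioning step, assuming the sentence-transformers library with the all-MiniLM-L6-v2 model and a hypothetical InstructionProjector head; the released InstructIR code defines the actual encoder, embedding dimensions, and how the resulting vector modulates the restoration network.

```python
import torch
import torch.nn as nn
from sentence_transformers import SentenceTransformer

# Frozen sentence encoder: maps a free-form instruction to a fixed-size embedding.
text_encoder = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dim output


class InstructionProjector(nn.Module):
    """Hypothetical learned head that adapts the text embedding to the
    conditioning space consumed by the restoration backbone."""

    def __init__(self, in_dim: int = 384, cond_dim: int = 256):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(in_dim, cond_dim),
            nn.GELU(),
            nn.Linear(cond_dim, cond_dim),
        )

    def forward(self, text_emb: torch.Tensor) -> torch.Tensor:
        return self.proj(text_emb)


projector = InstructionProjector()

instruction = "Please remove the rain from this photo"
with torch.no_grad():
    emb = torch.from_numpy(text_encoder.encode(instruction)).float()  # shape (384,)
cond = projector(emb.unsqueeze(0))  # shape (1, 256): conditioning vector for the image model
```

In this sketch only the projector would be trained; the sentence encoder stays frozen, so new phrasings of an instruction do not require retraining the text model.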

A key contribution is demonstrating that a single InstructIR model, built on NAFNet's efficient architecture, can address multiple restoration tasks at once: denoising, deraining, deblurring, dehazing, and low-light enhancement. The method treats instruction-based image restoration as a supervised learning problem: over 10,000 diverse prompts are generated with GPT-4 and paired with corresponding degraded and clean images to form the training dataset.
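To make the supervised setup concrete, the sketch below pairs a degraded input and its clean target with an instruction sampled from a per-task prompt pool. The Sample structure, file layout, and placeholder prompt dictionary are illustrative assumptions, not the paper's released data pipeline.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Tuple

import torch
from torch.utils.data import Dataset
from torchvision.io import read_image


@dataclass
class Sample:
    degraded_path: str
    clean_path: str
    task: str  # e.g. "denoise", "derain", "deblur", "dehaze", "lowlight"


class InstructRestorationDataset(Dataset):
    """Yields (degraded image, instruction, clean target) triplets.

    In the paper the instruction pool is the set of GPT-4 generated prompts;
    here a dictionary of placeholder strings stands in for it."""

    def __init__(self, samples: List[Sample], prompts_by_task: Dict[str, List[str]]):
        self.samples = samples
        self.prompts_by_task = prompts_by_task

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, idx: int) -> Tuple[torch.Tensor, str, torch.Tensor]:
        s = self.samples[idx]
        degraded = read_image(s.degraded_path).float() / 255.0  # (C, H, W) in [0, 1]
        clean = read_image(s.clean_path).float() / 255.0
        # Sample one of many human-style phrasings for this degradation type.
        instruction = random.choice(self.prompts_by_task[s.task])
        return degraded, instruction, clean
```

During training, the degraded image and the encoded instruction form the model input, and the clean image is the regression target.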

Results

Empirical results indicate that InstructIR surpasses state-of-the-art methods on several image restoration tasks, with a reported improvement of +1 dB over previous all-in-one restoration approaches. This demonstrates the model's ability to handle complex, multi-degradation problems effectively. InstructIR's flexibility is also showcased: it responds to restoration needs that end-users express through free-form instructions.
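Since these gains are reported as PSNR in decibels, a quick reference for how the metric is computed (this is the standard definition, not code from the paper):

```python
import torch


def psnr(pred: torch.Tensor, target: torch.Tensor, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_val]."""
    mse = torch.mean((pred - target) ** 2)
    return float(10.0 * torch.log10(max_val ** 2 / mse))
```

A +1 dB gain corresponds to cutting the mean squared error by a factor of 10^0.1 ≈ 1.26, i.e. roughly a 21% reduction.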

Implications and Conclusion

The significance of InstructIR lies not only in its performance but also in the paradigm shift it introduces in user interaction with restoration models. The model interprets a vast range of instructions, offering an intuitive interface for non-experts to achieve desired restoration outcomes. By releasing the dataset and articulating a new benchmark for text-guided image restoration, this research paves the way for subsequent exploration and development in the area.

In conclusion, the paper describes a critical advance in leveraging human guidance via natural language prompts to facilitate the challenging task of image restoration. By demonstrating remarkable performance across several benchmark tasks, InstructIR exemplifies the promising synthesis of language understanding and visual data processing, heralding a future where AI-driven image restoration becomes more accessible and user-friendly.