InstructIR: High-Quality Image Restoration Following Human Instructions (2401.16468v5)
Abstract: Image restoration is a fundamental problem that involves recovering a high-quality clean image from its degraded observation. All-In-One image restoration models can effectively restore images from various types and levels of degradation using degradation-specific information as prompts to guide the restoration model. In this work, we present the first approach that uses human-written instructions to guide the image restoration model. Given natural language prompts, our model can recover high-quality images from their degraded counterparts, considering multiple degradation types. Our method, InstructIR, achieves state-of-the-art results on several restoration tasks including image denoising, deraining, deblurring, dehazing, and (low-light) image enhancement. InstructIR improves by +1dB over previous all-in-one restoration methods. Moreover, our dataset and results represent a novel benchmark for new research on text-guided image restoration and enhancement. Our code, datasets and models are available at: https://github.com/mv-lab/InstructIR