InstructIR: High-Quality Image Restoration Following Human Instructions (2401.16468v5)

Published 29 Jan 2024 in cs.CV, cs.LG, and eess.IV

Abstract: Image restoration is a fundamental problem that involves recovering a high-quality clean image from its degraded observation. All-In-One image restoration models can effectively restore images from various types and levels of degradation using degradation-specific information as prompts to guide the restoration model. In this work, we present the first approach that uses human-written instructions to guide the image restoration model. Given natural language prompts, our model can recover high-quality images from their degraded counterparts, considering multiple degradation types. Our method, InstructIR, achieves state-of-the-art results on several restoration tasks including image denoising, deraining, deblurring, dehazing, and (low-light) image enhancement. InstructIR improves +1dB over previous all-in-one restoration methods. Moreover, our dataset and results represent a novel benchmark for new research on text-guided image restoration and enhancement. Our code, datasets and models are available at: https://github.com/mv-lab/InstructIR


Summary

  • The paper introduces InstructIR, a novel system that uses human-written instructions to guide an all-in-one image restoration model.
  • It pairs a sentence-transformer text encoder with over 10,000 GPT-4-generated prompts to handle degradations such as noise, rain, blur, haze, and low light.
  • Empirical results show a +1 dB improvement over previous all-in-one restoration methods, along with flexible, user-guided restoration.

Overview

This paper introduces InstructIR, an approach to image restoration that uses human-written instructions as its guiding mechanism. Unlike traditional models that either target a single degradation type or handle multiple degradations through pre-defined guidance vectors, InstructIR interprets restoration tasks described in free-form natural language. Through an extensive set of experiments, the authors validate the efficacy of text guidance for image restoration, with InstructIR setting new benchmarks across multiple restoration tasks.

Methodology

The work sits at the intersection of image restoration and instruction-based guidance. The authors propose a language-informed method that interprets human-written instructions to perform restoration on degraded images. At the core of InstructIR is a text encoder—a sentence transformer—that captures the semantics of the user's prompt and maps it into an embedding space the image restoration model can condition on.
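The following is a minimal sketch of this conditioning step, assuming the sentence-transformers library with the all-MiniLM-L6-v2 model and a hypothetical InstructionProjector head; the released InstructIR code defines the actual encoder, embedding dimensions, and how the resulting vector modulates the restoration network.

```python
import torch
import torch.nn as nn
from sentence_transformers import SentenceTransformer

# Frozen sentence encoder: maps a free-form instruction to a fixed-size embedding.
text_encoder = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dim output


class InstructionProjector(nn.Module):
    """Hypothetical learned head that adapts the text embedding to the
    conditioning space consumed by the restoration backbone."""

    def __init__(self, in_dim: int = 384, cond_dim: int = 256):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(in_dim, cond_dim),
            nn.GELU(),
            nn.Linear(cond_dim, cond_dim),
        )

    def forward(self, text_emb: torch.Tensor) -> torch.Tensor:
        return self.proj(text_emb)


projector = InstructionProjector()

instruction = "Please remove the rain from this photo"
with torch.no_grad():
    emb = torch.from_numpy(text_encoder.encode(instruction)).float()  # shape (384,)
cond = projector(emb.unsqueeze(0))  # shape (1, 256): conditioning vector for the image model
```

In this sketch only the projector would be trained; the sentence encoder stays frozen, so new phrasings of an instruction do not require retraining the text model.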

A key contribution is demonstrating that a single InstructIR model, built on NAFNet's efficient architecture, can address multiple restoration tasks at once: denoising, deraining, deblurring, dehazing, and low-light enhancement. The method treats instruction-based image restoration as a supervised learning problem: over 10,000 diverse prompts are generated with GPT-4 and paired with corresponding degraded and clean images to form the training dataset.
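To make the supervised setup concrete, the sketch below pairs a degraded input and its clean target with an instruction sampled from a per-task prompt pool. The Sample structure, file layout, and placeholder prompt dictionary are illustrative assumptions, not the paper's released data pipeline.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Tuple

import torch
from torch.utils.data import Dataset
from torchvision.io import read_image


@dataclass
class Sample:
    degraded_path: str
    clean_path: str
    task: str  # e.g. "denoise", "derain", "deblur", "dehaze", "lowlight"


class InstructRestorationDataset(Dataset):
    """Yields (degraded image, instruction, clean target) triplets.

    In the paper the instruction pool is the set of GPT-4 generated prompts;
    here a dictionary of placeholder strings stands in for it."""

    def __init__(self, samples: List[Sample], prompts_by_task: Dict[str, List[str]]):
        self.samples = samples
        self.prompts_by_task = prompts_by_task

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, idx: int) -> Tuple[torch.Tensor, str, torch.Tensor]:
        s = self.samples[idx]
        degraded = read_image(s.degraded_path).float() / 255.0  # (C, H, W) in [0, 1]
        clean = read_image(s.clean_path).float() / 255.0
        # Sample one of many human-style phrasings for this degradation type.
        instruction = random.choice(self.prompts_by_task[s.task])
        return degraded, instruction, clean
```

During training, the degraded image and the encoded instruction form the model input, and the clean image is the regression target.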

Results

Empirical results indicate that InstructIR surpasses state-of-the-art methods on several image restoration tasks, with a reported improvement of +1 dB over previous all-in-one restoration approaches. This demonstrates the model's ability to handle complex, multi-degradation problems effectively. InstructIR's flexibility is also showcased: it responds to restoration needs that end-users express through free-form instructions.
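Since these gains are reported as PSNR in decibels, a quick reference for how the metric is computed (this is the standard definition, not code from the paper):

```python
import torch


def psnr(pred: torch.Tensor, target: torch.Tensor, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_val]."""
    mse = torch.mean((pred - target) ** 2)
    return float(10.0 * torch.log10(max_val ** 2 / mse))
```

A +1 dB gain corresponds to cutting the mean squared error by a factor of 10^0.1 ≈ 1.26, i.e. roughly a 21% reduction.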

Implications and Conclusion

The significance of InstructIR lies not only in its performance but also in the paradigm shift it introduces in user interaction with restoration models. The model interprets a vast range of instructions, offering an intuitive interface for non-experts to achieve desired restoration outcomes. By releasing the dataset and articulating a new benchmark for text-guided image restoration, this research paves the way for subsequent exploration and development in the area.

In conclusion, the paper describes a critical advance in leveraging human guidance via natural language prompts to facilitate the challenging task of image restoration. By demonstrating remarkable performance across several benchmark tasks, InstructIR exemplifies the promising synthesis of language understanding and visual data processing, heralding a future where AI-driven image restoration becomes more accessible and user-friendly.