LLMRA: Multi-modal Large Language Model based Restoration Assistant (2401.11401v1)
Abstract: Multi-modal LLMs (MLLMs) have had a significant impact on various tasks, owing to their extensive knowledge and powerful perception and generation capabilities. However, applying MLLMs to low-level vision tasks remains an open research problem. In this paper, we present a simple MLLM-based image restoration framework to address this gap, namely the Multi-modal LLM based Restoration Assistant (LLMRA). We exploit the capabilities of MLLMs to obtain degradation information for universal image restoration. Using a pretrained multi-modal LLM and a vision-language model, we generate text descriptions of the degraded image and encode them as context embeddings that carry degradation information. Through the proposed Context Enhance Module (CEM) and Degradation Context based Transformer Network (DC-former), we integrate these context embeddings into the restoration network, contributing to more accurate and adjustable image restoration. Through dialogue with users, our method leverages image degradation priors from MLLMs, providing descriptions of the low-level attributes of both the input low-quality images and the restored high-quality images. Extensive experiments demonstrate the superior performance of LLMRA on universal image restoration tasks.
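The abstract does not specify how the context embedding is fused into the restoration network, so the following is only a generic illustration of text-conditioned feature fusion, not the paper's CEM or DC-former. It assumes a plain scaled dot-product cross-attention with no learned projections, in which each image token attends to the context tokens and adds the attended context back as a residual. The function name `cross_attend` and the toy inputs are hypothetical.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    # Inner product of two equal-length vectors.
    return sum(x * y for x, y in zip(a, b))


def cross_attend(image_tokens, context_tokens):
    """Hypothetical conditioning step (an assumption, not the paper's module).

    Each image token acts as a query over the context tokens, which serve
    as both keys and values. The attended context is added to the image
    token as a residual. No learned projections are used.
    """
    dim = len(context_tokens[0])
    scale = 1.0 / math.sqrt(dim)
    fused = []
    for q in image_tokens:
        weights = softmax([dot(q, k) * scale for k in context_tokens])
        attended = [
            sum(w * k[i] for w, k in zip(weights, context_tokens))
            for i in range(dim)
        ]
        fused.append([qi + ai for qi, ai in zip(q, attended)])
    return fused


# Toy inputs: 3 image tokens and 2 context tokens, all of width 4.
image_tokens = [
    [0.1, 0.2, 0.3, 0.4],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
context_tokens = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
fused = cross_attend(image_tokens, context_tokens)
```

In this sketch the output keeps the shape of the image tokens, so a conditioning step of this kind could in principle be placed between the blocks of a restoration backbone. A practical module would add learned query, key, and value projections and normalization.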
Authors: Xiaoyu Jin, Yuan Shi, Bin Xia, Wenming Yang