Diff-Plugin: Revitalizing Details for Diffusion-based Low-level Tasks (2403.00644v4)
Abstract: Diffusion models trained on large-scale datasets have achieved remarkable progress in image synthesis. However, due to the randomness in the diffusion process, they often struggle with diverse low-level tasks that require detail preservation. To overcome this limitation, we present a new Diff-Plugin framework that enables a single pre-trained diffusion model to generate high-fidelity results across a variety of low-level tasks. Specifically, we first propose a lightweight Task-Plugin module with a dual-branch design to provide task-specific priors, guiding the diffusion process in preserving image content. We then propose a Plugin-Selector that can automatically select different Task-Plugins based on the text instruction, allowing users to edit images by indicating multiple low-level tasks in natural language. We conduct extensive experiments on 8 low-level vision tasks. The results demonstrate the superiority of Diff-Plugin over existing methods, particularly in real-world scenarios. Our ablations further validate that Diff-Plugin is stable, schedulable, and supports robust training across different dataset sizes.
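The select-then-apply flow described in the abstract can be illustrated with a small toy sketch. This is not the paper's Plugin-Selector: plain keyword matching stands in for the learned selection, the task names and keyword sets are hypothetical, and each "plugin" is a placeholder callable rather than a module attached to a diffusion model. It only shows how one text instruction could pick several task plugins that are then applied in sequence.

```python
import re

# Hypothetical task names and trigger words, for illustration only.
TASK_KEYWORDS = {
    "desnow": {"snow", "snowy", "snowflakes"},
    "dehaze": {"haze", "hazy", "fog", "foggy"},
    "deblur": {"blur", "blurry", "blurred"},
    "lowlight": {"dark", "dim", "brighten"},
}


def select_plugins(instruction):
    """Return the tasks whose keywords appear in the instruction.

    Keyword overlap is an assumed stand-in for a learned selector.
    Tasks are returned in the order they are listed in TASK_KEYWORDS.
    """
    words = set(re.findall(r"[a-z]+", instruction.lower()))
    return [task for task, keywords in TASK_KEYWORDS.items() if words & keywords]


def run_pipeline(image, instruction, plugins):
    """Apply every selected plugin to the image, one after another.

    `plugins` maps a task name to a callable taking and returning an image.
    """
    for task in select_plugins(instruction):
        image = plugins[task](image)
    return image


if __name__ == "__main__":
    # Placeholder plugins that only record which task was applied.
    plugins = {task: (lambda img, t=task: img + [t]) for task in TASK_KEYWORDS}
    print(select_plugins("Please remove the snow and then fix the blur"))
    print(run_pipeline([], "a snowy and dark photo", plugins))
```

An instruction that matches no keyword selects nothing, so the image passes through unchanged.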