Multimodal Prompt Perceiver: Empower Adaptiveness, Generalizability and Fidelity for All-in-One Image Restoration (2312.02918v2)

Published 5 Dec 2023 in cs.CV

Abstract: Despite substantial progress, all-in-one image restoration (IR) grapples with persistent challenges in handling intricate real-world degradations. This paper introduces MPerceiver: a novel multimodal prompt learning approach that harnesses Stable Diffusion (SD) priors to enhance adaptiveness, generalizability and fidelity for all-in-one image restoration. Specifically, we develop a dual-branch module to master two types of SD prompts: textual for holistic representation and visual for multiscale detail representation. Both prompts are dynamically adjusted by degradation predictions from the CLIP image encoder, enabling adaptive responses to diverse unknown degradations. Moreover, a plug-in detail refinement module improves restoration fidelity via direct encoder-to-decoder information transformation. To assess our method, MPerceiver is trained on 9 tasks for all-in-one IR and outperforms state-of-the-art task-specific methods across most tasks. After multitask pre-training, MPerceiver attains a generalized representation in low-level vision, exhibiting remarkable zero-shot and few-shot capabilities in unseen tasks. Extensive experiments on 16 IR tasks underscore the superiority of MPerceiver in terms of adaptiveness, generalizability and fidelity.
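The abstract does not say exactly how the CLIP degradation predictions adjust the prompts. One common way to realize this kind of conditioning, assumed here purely for illustration and not taken from MPerceiver, is to score an image embedding against one text embedding per degradation type, turn the scores into probabilities with a softmax, and mix a pool of learnable prompt vectors by those probabilities. The sketch uses toy vectors in place of real CLIP embeddings; the function names, the temperature of 0.01 and the prompt pool are all hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(logits: Sequence[float], temperature: float) -> Vector:
    """Numerically stable softmax; a small temperature sharpens the distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def degradation_probs(image_emb: Sequence[float],
                      text_embs: Sequence[Sequence[float]],
                      temperature: float = 0.01) -> Vector:
    """Zero-shot style prediction (assumed scheme): compare the image embedding
    with one text embedding per degradation type, then normalize."""
    sims = [cosine(image_emb, t) for t in text_embs]
    return softmax(sims, temperature)


def adaptive_prompt(probs: Sequence[float],
                    prompt_pool: Sequence[Sequence[float]]) -> Vector:
    """Mix one prompt vector per degradation type, weighted by its probability."""
    dim = len(prompt_pool[0])
    return [sum(p * prompt[d] for p, prompt in zip(probs, prompt_pool))
            for d in range(dim)]


# Toy 3-D stand-ins for CLIP text embeddings of "rain", "haze", "noise".
degradations = ["rain", "haze", "noise"]
text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Toy image embedding that mostly points in the "rain" direction.
image_emb = [0.9, 0.1, 0.0]

probs = degradation_probs(image_emb, text_embs)
# Hypothetical learnable prompt pool: one 2-D prompt vector per degradation.
prompt_pool = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
prompt = adaptive_prompt(probs, prompt_pool)

predicted = degradations[max(range(len(probs)), key=probs.__getitem__)]
print(predicted)  # rain
```

In this sketch the prompt is a soft mixture rather than a hard argmax choice, which keeps it differentiable and lets an image with mixed degradations (for example rain plus haze) draw on several prompts at once.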
