An Intelligent Agentic System for Complex Image Restoration Problems (2410.17809v2)

Published 23 Oct 2024 in cs.CV

Abstract: Real-world image restoration (IR) is inherently complex and often requires combining multiple specialized models to address diverse degradations. Inspired by human problem-solving, we propose AgenticIR, an agentic system that mimics the human approach to image processing by following five key stages: Perception, Scheduling, Execution, Reflection, and Rescheduling. AgenticIR leverages large language models (LLMs) and vision-LLMs (VLMs) that interact via text generation to dynamically operate a toolbox of IR models. We fine-tune VLMs for image quality analysis and employ LLMs for reasoning, guiding the system step by step. To compensate for LLMs' lack of specific IR knowledge and experience, we introduce a self-exploration method that allows the LLM to observe restoration results and summarize them into referenceable documents. Experiments demonstrate AgenticIR's potential in handling complex IR tasks, representing a promising path toward general intelligence in visual processing.
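The five-stage loop described in the abstract can be illustrated with a toy control flow. This is a minimal sketch under loud assumptions, not AgenticIR's implementation: an "image" is modeled as the set of degradation labels it still contains, the fine-tuned VLM (Perception) and the LLM (Scheduling) are replaced by fixed rules, and Rescheduling is simplified to falling back to the next candidate tool. All names (`TOOLBOX`, `perceive`, `schedule`, `reflect`, `restore`, `experience`, and the tool names) are hypothetical. The `experience` dictionary stands in for the summarized self-exploration documents by recording which tools have succeeded before.

```python
from typing import Callable, Dict, FrozenSet, List

# Hypothetical stand-in: an "image" is the set of degradations it still contains.
Image = FrozenSet[str]

# Hypothetical toolbox: candidate tools per degradation; "deblur_a" deliberately fails.
TOOLBOX: Dict[str, Dict[str, Callable[[Image], Image]]] = {
    "noise": {"denoise_a": lambda img: img - {"noise"}},
    "blur": {"deblur_a": lambda img: img, "deblur_b": lambda img: img - {"blur"}},
    "rain": {"derain_a": lambda img: img - {"rain"}},
}


def perceive(img: Image) -> List[str]:
    """Perception: a fine-tuned VLM would describe degradations; here we read them off."""
    return sorted(img)


def schedule(degradations: List[str]) -> List[str]:
    """Scheduling: an LLM would choose the order; here a fixed priority is assumed."""
    priority = ["rain", "noise", "blur"]
    return sorted((d for d in degradations if d in priority), key=priority.index)


def reflect(before: Image, after: Image, target: str) -> bool:
    """Reflection: accept a step only if it removed the targeted degradation."""
    return target in before and target not in after


def restore(img: Image, experience: Dict[str, int]) -> Image:
    for target in schedule(perceive(img)):
        # Try tools with the best recorded outcome first (toy "experience" memory).
        candidates = sorted(TOOLBOX[target], key=lambda t: -experience.get(t, 0))
        for tool in candidates:
            out = TOOLBOX[target][tool](img)  # Execution
            ok = reflect(img, out, target)
            # Record the outcome so later runs can prefer tools that worked.
            experience[tool] = experience.get(tool, 0) + (1 if ok else -1)
            if ok:
                img = out
                break
            # Simplified Rescheduling: on failure, fall back to the next candidate.
    return img


if __name__ == "__main__":
    experience: Dict[str, int] = {}
    result = restore(frozenset({"noise", "blur", "rain"}), experience)
    print(sorted(result), experience)
```

On the first run `deblur_a` is tried, rejected by the reflection check, and replaced by `deblur_b`; on a later run the recorded experience makes `deblur_b` the first choice. Keeping tool outcomes in a shared record rather than hard-coding a tool order is the point of the illustration, loosely mirroring how summarized exploration results are meant to inform later scheduling.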
