Referring Flexible Image Restoration (2404.10342v1)
Abstract: Real-world images often exhibit several degradations at once, such as rain and fog at night (a triple degradation). In many cases, however, a user may not want every degradation removed. For a blurry photograph of a beautiful snowy landscape (a double degradation), for instance, the user may only want to deblur. These requirements point to a new challenge in image restoration, in which a model must perceive the degradations present in an image and remove only the types specified by a human command. We term this task Referring Flexible Image Restoration (RFIR). To address it, we first construct a large-scale synthetic dataset, also called RFIR, comprising 153,423 samples, each consisting of a degraded image, a text prompt specifying the degradation to remove, and the restored image. RFIR covers five basic degradation types (blur, rain, haze, low light, and snow) and includes six main sub-categories for varying degrees of degradation removal. To tackle the task, we propose TransRFIR, a novel transformer-based multi-task model that simultaneously perceives the degradation types in a degraded image and removes the specific degradation named in the text prompt. TransRFIR is built on two attention modules that we devise, Multi-Head Agent Self-Attention (MHASA) and Multi-Head Agent Cross Attention (MHACA). Both introduce an agent token and reach linear complexity, which lowers the computation cost relative to vanilla self-attention and cross-attention while retaining competitive performance. TransRFIR achieves state-of-the-art performance compared with its counterparts and proves to be an effective architecture for image restoration. We release our project at https://github.com/GuanRunwei/FIR-CP.
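The abstract does not spell out how MHASA and MHACA are built, so the following is only a minimal single-head sketch of the general agent-attention idea (a small set of agent tokens first aggregates the keys and values, then broadcasts the result to the queries). The helper names `pool_agents` and `agent_attention`, the mean-pooling choice for forming agents, and all tensor sizes are assumptions made for illustration, not the authors' implementation. With N queries, M keys, and n agents, the two matrix products cost on the order of n·M·d + N·n·d operations instead of N·M·d, which is linear in the token count when n is fixed.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def pool_agents(tokens, n_agents):
    # Hypothetical way to form agent tokens: average contiguous groups of tokens.
    groups = np.array_split(tokens, n_agents, axis=0)
    return np.stack([g.mean(axis=0) for g in groups])  # (n_agents, d)

def agent_attention(q, k, v, agents):
    """q: (N, d), k and v: (M, d), agents: (n, d). Returns (N, d)."""
    scale = 1.0 / np.sqrt(q.shape[-1])
    # Step 1, aggregation: agents attend to keys and collect values -> (n, d).
    agent_values = softmax(agents @ k.T * scale) @ v
    # Step 2, broadcast: queries attend to the agents only -> (N, d).
    return softmax(q @ agents.T * scale) @ agent_values

rng = np.random.default_rng(0)
image_tokens = rng.normal(size=(64, 16))  # assumed: 64 image tokens, width 16
text_tokens = rng.normal(size=(8, 16))    # assumed: 8 prompt tokens, width 16
agents = pool_agents(image_tokens, n_agents=4)

# Self-attention flavour: queries, keys, and values all come from the image.
self_out = agent_attention(image_tokens, image_tokens, image_tokens, agents)
# Cross-attention flavour: image queries read from the text prompt tokens.
cross_out = agent_attention(image_tokens, text_tokens, text_tokens, agents)
print(self_out.shape, cross_out.shape)
```

In both calls the N×M attention map of vanilla attention is never formed; only an n×M map and an N×n map are computed, which is where the reduced cost comes from in this sketch.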