Transfer CLIP for Generalizable Image Denoising (2403.15132v1)
Abstract: Image denoising is a fundamental task in computer vision. While prevailing deep learning-based supervised and self-supervised methods excel at removing in-distribution noise, their susceptibility to out-of-distribution (OOD) noise remains a significant challenge. The recently introduced contrastive language-image pre-training (CLIP) model has shown exceptional capabilities in open-world image recognition and segmentation, yet its potential for enhancing the robustness of low-level tasks remains largely unexplored. This paper finds that certain dense features extracted from the frozen ResNet image encoder of CLIP are distortion-invariant and content-related, properties that are highly desirable for generalizable denoising. Leveraging these properties, we devise an asymmetrical encoder-decoder denoising network that feeds the noisy image, together with its multi-scale dense features from the frozen CLIP ResNet encoder, into a learnable image decoder to achieve generalizable denoising. A progressive feature augmentation strategy is further proposed to mitigate feature overfitting and improve the robustness of the learnable decoder. Extensive experiments and comparisons across diverse OOD noises, including synthetic noise, real-world sRGB noise, and low-dose CT image noise, demonstrate the superior generalization ability of our method.
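The abstract names the progressive feature augmentation strategy but does not give its formula. The following is a minimal, hypothetical sketch in plain Python, not the authors' implementation. It assumes three things: (1) the multi-scale features from the frozen CLIP ResNet encoder are perturbed only during training, before they reach the learnable decoder; (2) the perturbation is a random multiplicative scale plus an additive shift; and (3) "progressive" means the perturbation strength grows with feature depth. The function name `progressive_feature_augmentation`, the parameter `base_sigma`, and the linear schedule `sigma_k = base_sigma * (k + 1)` are illustrative choices.

```python
import random
from typing import List, Optional

# Hypothetical representation: each scale's feature map is flattened to a
# list of floats. A real network would operate on (C, H, W) tensors instead.
FeatureMap = List[float]


def progressive_feature_augmentation(
    features: List[FeatureMap],
    base_sigma: float = 0.1,
    training: bool = True,
    rng: Optional[random.Random] = None,
) -> List[FeatureMap]:
    """Randomly scale and shift multi-scale features, more strongly at depth.

    Assumed form (not taken from the paper): for scale index k (0 = shallowest),
        sigma_k = base_sigma * (k + 1)
        f'_k = alpha * f_k + beta,  alpha ~ N(1, sigma_k), beta ~ N(0, sigma_k)
    with alpha and beta drawn independently for every element.
    """
    if not training:
        # Inference: the decoder receives the frozen encoder's features unchanged.
        return [list(fmap) for fmap in features]
    if rng is None:
        rng = random.Random()
    augmented: List[FeatureMap] = []
    for k, fmap in enumerate(features):
        sigma = base_sigma * (k + 1)  # perturbation strength grows with depth
        augmented.append(
            [rng.gauss(1.0, sigma) * x + rng.gauss(0.0, sigma) for x in fmap]
        )
    return augmented


if __name__ == "__main__":
    # Three toy "scales", ordered from shallow to deep.
    feats = [[1.0, 2.0, 3.0], [0.5, -0.5], [4.0]]
    print(progressive_feature_augmentation(feats, rng=random.Random(0)))
```

Because the encoder is frozen, only the decoder ever sees these perturbed features. Under the assumptions above, this is one way to keep the decoder from overfitting to the exact feature statistics produced by the training noise. Giving deeper scales stronger noise is a design guess here; the paper's actual schedule may differ.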