SAM-DiffSR: Structure-Modulated Diffusion Model for Image Super-Resolution (2402.17133v1)
Abstract: Diffusion-based super-resolution (SR) models have recently attracted significant attention because of their strong restoration capabilities. However, conventional diffusion models sample noise from a single distribution, which limits their ability to handle real-world scenes and complex textures that vary across semantic regions. Given the success of the Segment Anything Model (SAM), sufficiently fine-grained region masks could enhance detail recovery in diffusion-based SR models. Directly integrating SAM into SR models, however, would substantially increase the computational cost. In this paper, we propose SAM-DiffSR, a model that uses the fine-grained structural information from SAM when sampling noise, improving image quality with no additional computational cost at inference. During training, we encode structural position information into the segmentation masks produced by SAM. The encoded mask is then integrated into the forward diffusion process by using it to modulate the sampled noise. This adjustment lets the noise mean be adapted independently within each segmentation area. The diffusion model is trained to estimate this modulated noise. Crucially, the proposed framework does NOT change the reverse diffusion process and does NOT require SAM at inference. Experimental results demonstrate the effectiveness of the proposed method: it suppresses artifacts better than existing diffusion-based methods and surpasses them by up to 0.74 dB PSNR on the DIV2K dataset. The code and dataset are available at https://github.com/lose4578/SAM-DiffSR.
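The training-time noise modulation described in the abstract can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the per-region encoding (`encode_mask`, here a sinusoid of the region index), the purely additive modulation `eps + mask_code`, the linear beta schedule, and every function name are hypothetical choices made for clarity. What the sketch does show is the property the abstract describes: the noise mean is shifted independently inside each segmentation area, and the regression target for the network is the modulated noise rather than plain Gaussian noise.

```python
import numpy as np


def encode_mask(segment_ids: np.ndarray, scale: float = 0.1) -> np.ndarray:
    """Hypothetical structural encoding of a SAM segmentation map.

    segment_ids: (H, W) integer array, one id per SAM region.
    Returns an (H, W) float array that is constant inside each region.
    The sinusoid of the region rank is an illustrative placeholder only.
    """
    unique_ids, inverse = np.unique(segment_ids.ravel(), return_inverse=True)
    # One scalar offset per region, derived from the region's rank.
    per_region = scale * np.sin(np.arange(len(unique_ids), dtype=np.float64) + 1.0)
    return per_region[inverse].reshape(segment_ids.shape)


def linear_alpha_bar(T: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02) -> np.ndarray:
    """Cumulative product of (1 - beta_t) for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)


def q_sample_modulated(x0, t, alpha_bar, mask_code, rng):
    """Forward diffusion step with region-wise shifted noise (assumed additive)."""
    eps = rng.standard_normal(x0.shape)
    eps_mod = eps + mask_code  # noise mean now differs per segmentation area
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps_mod
    return xt, eps_mod  # the denoiser would be trained to predict eps_mod


def training_loss(eps_pred: np.ndarray, eps_mod: np.ndarray) -> float:
    """Mean squared error between predicted and modulated noise."""
    return float(np.mean((eps_pred - eps_mod) ** 2))


# Usage: two hypothetical SAM regions (left and right halves of an 8x8 image).
rng = np.random.default_rng(0)
seg = np.zeros((8, 8), dtype=int)
seg[:, 4:] = 1
code = encode_mask(seg)
x0 = rng.uniform(-1.0, 1.0, size=(8, 8))
alpha_bar = linear_alpha_bar()
xt, target = q_sample_modulated(x0, t=500, alpha_bar=alpha_bar, mask_code=code, rng=rng)
loss_if_perfect = training_loss(target, target)
```

Because the mask only enters through the forward process and the training target, SAM would be needed only to precompute `segment_ids` for the training set, which matches the abstract's claim that inference needs no SAM call. How the authors keep the reverse process unchanged despite the shifted noise mean is not modeled by this sketch.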
Authors: Chengcheng Wang, Zhiwei Hao, Yehui Tang, Jianyuan Guo, Yujie Yang, Kai Han, Yunhe Wang