
DeSRA: Detect and Delete the Artifacts of GAN-based Real-World Super-Resolution Models (2307.02457v1)

Published 5 Jul 2023 in cs.CV, cs.AI, and cs.MM

Abstract: Image super-resolution (SR) with generative adversarial networks (GAN) has achieved great success in restoring realistic details. However, it is notorious that GAN-based SR models will inevitably produce unpleasant and undesirable artifacts, especially in practical scenarios. Previous works typically suppress artifacts with an extra loss penalty in the training phase. They only work for in-distribution artifact types generated during training. When applied in real-world scenarios, we observe that those improved methods still generate obviously annoying artifacts during inference. In this paper, we analyze the cause and characteristics of the GAN artifacts produced in unseen test data without ground-truths. We then develop a novel method, namely, DeSRA, to Detect and then Delete those SR Artifacts in practice. Specifically, we propose to measure a relative local variance distance from MSE-SR results and GAN-SR results, and locate the problematic areas based on the above distance and semantic-aware thresholds. After detecting the artifact regions, we develop a finetune procedure to improve GAN-based SR models with a few samples, so that they can deal with similar types of artifacts in more unseen real data. Equipped with our DeSRA, we can successfully eliminate artifacts from inference and improve the ability of SR models to be applied in real-world scenarios. The code will be available at https://github.com/TencentARC/DeSRA.


Summary

  • The paper introduces DeSRA, a novel method that detects and removes GAN-induced artifacts by comparing MSE-SR outputs with GAN-SR results.
  • It employs a two-step pipeline: local variance analysis first detects artifact regions, and the model is then fine-tuned on pseudo ground-truths constructed from those regions.
  • Empirical validation demonstrates over 75% artifact reduction, high IoU, and improved perceptual quality in real-world super-resolution tasks.
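The pseudo ground-truth construction mentioned above can be sketched in a few lines: detected artifact regions in the GAN-SR output are replaced with the smoother MSE-SR content. The function name and array shapes here are illustrative, not the paper's implementation.

```python
import numpy as np

def build_pseudo_gt(gan_sr, mse_sr, artifact_mask):
    """Replace detected artifact regions in the GAN-SR output with the
    corresponding MSE-SR content, yielding a pseudo ground-truth image
    that can supervise fine-tuning without real HR data.

    gan_sr, mse_sr: float arrays of shape (H, W, C)
    artifact_mask:  boolean array of shape (H, W), True where artifacts
                    were detected
    """
    pseudo_gt = gan_sr.copy()          # keep GAN details outside the mask
    pseudo_gt[artifact_mask] = mse_sr[artifact_mask]
    return pseudo_gt
```

Because the mask is applied per pixel, texture generated by the GAN survives everywhere outside the detected regions, which is the point of fine-tuning on these targets rather than on plain MSE-SR outputs.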

An Analysis of DeSRA: Addressing GAN-Induced Artifacts in Image Super-Resolution

The paper "DeSRA: Detect and Delete the Artifacts of GAN-based Real-World Super-Resolution Models" by Xie et al. provides an in-depth analysis of, and a methodological advance on, a critical issue in super-resolution (SR) with Generative Adversarial Networks (GANs). Despite their ability to generate visually appealing high-resolution images, GAN-based SR models often produce undesirable artifacts, especially when applied to real-world, unseen data. This paper presents a novel approach named DeSRA, which detects and removes such artifacts, enhancing the applicability of GAN-SR models in practical environments.

Overview of GAN-induced Artifacts in Image Super-Resolution

Single image super-resolution (SISR) seeks to generate high-resolution (HR) images from low-resolution (LR) counterparts. While non-GAN methods often fail to reproduce fine textures, GAN-based SR models excel at generating detailed images. However, they are prone to introducing perceptually unpleasant artifacts during both the training and inference stages. More problematically, these GAN-inference artifacts are typically out-of-distribution and emerge only during the processing of unseen real-world data.

Methodological Innovations in DeSRA

DeSRA addresses the GAN-inference artifacts through a two-step process. It begins with artifact detection by measuring the relative local variance distance between MSE-based SR results and GAN-SR outputs. Using a combination of local texture differences and semantic-aware thresholds, this method effectively identifies regions marred by artifacts. DeSRA then employs a fine-tuning strategy to iteratively improve the GAN-SR model using a limited dataset. By replacing artifact regions in GAN-SR outputs with MSE-SR results, a pseudo ground-truth set is constructed, enabling the model to generalize and reduce artifacts in unseen data.
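A minimal sketch of the detection step is below. It compares sliding-window variances of the MSE-SR and GAN-SR outputs and flags pixels where the relative distance is large. The window size, threshold, and exact distance formula are illustrative placeholders, not the paper's precise formulation (which additionally applies semantic-aware thresholds per region).

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_variance(img, size=11):
    """Sliding-window variance via Var[X] = E[X^2] - (E[X])^2."""
    mean = uniform_filter(img, size)
    mean_sq = uniform_filter(img * img, size)
    return np.clip(mean_sq - mean * mean, 0, None)  # clip numerical noise

def relative_variance_distance(mse_sr, gan_sr, size=11, eps=1e-6):
    """Relative local variance distance between the stable MSE-SR
    reference and the GAN-SR output; large where the GAN's local
    texture strength deviates sharply from the reference."""
    v_mse = local_variance(mse_sr, size)
    v_gan = local_variance(gan_sr, size)
    return np.abs(v_gan - v_mse) / (np.maximum(v_gan, v_mse) + eps)

def detect_artifacts(mse_sr, gan_sr, threshold=0.5, size=11):
    """Binary artifact mask from a (placeholder) global threshold."""
    return relative_variance_distance(mse_sr, gan_sr, size) > threshold
```

The intuition, following the paper, is that MSE-SR results are smooth but artifact-free, so regions where GAN-SR texture statistics diverge wildly from them are likely artifacts rather than restored detail.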

Empirical Validation and Results

Experiments conducted with Real-ESRGAN and LDL models demonstrate the efficacy of DeSRA in artifact detection and reduction. The method achieves high Intersection over Union (IoU), precision, and recall metrics in detecting artifacts across various datasets. Significantly, visual and quantitative results indicate that DeSRA effectively reduces artifact presence, confirmed by both technical metrics and human perceptual studies. Post-processing with DeSRA leads to an artifact removal rate of over 75% on evaluated datasets, while simultaneously preventing additional artifacts.
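The detection metrics reported above are standard binary-mask measures; for reference, they can be computed as follows. This is a generic sketch, not the paper's evaluation code.

```python
import numpy as np

def mask_metrics(pred, gt):
    """IoU, precision, and recall between a predicted artifact mask and
    a ground-truth annotation mask (both boolean arrays of equal shape)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    # max(., 1) guards the degenerate all-empty case against div-by-zero
    iou = tp / max(tp + fp + fn, 1)
    precision = tp / max(tp + fp, 1)
    recall = tp / max(tp + fn, 1)
    return iou, precision, recall
```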

Theoretical Implications and Future Research Directions

DeSRA's methodology of leveraging MSE-SR results as a reference brings a valuable perspective on addressing artifacts without actual ground-truth data, which is often unavailable in real-world scenarios. The work underlines the potential of adaptive fine-tuning methods in improving the robustness of GAN-based systems. Future research may extend DeSRA's concepts, potentially exploring more sophisticated unsupervised or self-supervised learning techniques to better handle diverse real-world degradations and unseen artifacts.

Conclusion

The research presented in this paper marks an important development in enhancing the operational efficiency of GAN-based super-resolution models in real-world applications. By systematically identifying and mitigating GAN-induced artifacts, DeSRA serves as a pivotal step forward, promoting the practical deployment of SR technology across diverse and complex environmental settings. As the field progresses, integrating continual learning paradigms with methodologies like DeSRA could further solidify the bridge between high-fidelity image generation and robust, artifact-free output in real-world applications.
