Rethinking Data Augmentation for Image Super-resolution: A Comprehensive Analysis and a New Strategy

Published 1 Apr 2020 in eess.IV and cs.CV | (2004.00448v2)

Abstract: Data augmentation is an effective way to improve the performance of deep networks. Unfortunately, current methods are mostly developed for high-level vision tasks (e.g., classification) and few are studied for low-level vision tasks (e.g., image restoration). In this paper, we provide a comprehensive analysis of the existing augmentation methods applied to the super-resolution task. We find that the methods discarding or manipulating the pixels or features too much hamper the image restoration, where the spatial relationship is very important. Based on our analyses, we propose CutBlur that cuts a low-resolution patch and pastes it to the corresponding high-resolution image region and vice versa. The key intuition of CutBlur is to enable a model to learn not only "how" but also "where" to super-resolve an image. By doing so, the model can understand "how much", instead of blindly learning to apply super-resolution to every given pixel. Our method consistently and significantly improves the performance across various scenarios, especially when the model size is big and the data is collected under real-world environments. We also show that our method improves other low-level vision tasks, such as denoising and compression artifact removal.

Authors (3)

Citations (145)

Summary

Comprehensive Analysis and a Novel Approach for Data Augmentation in Image Super-Resolution

This paper reassesses existing data augmentation (DA) methods for image super-resolution (SR) and introduces a new augmentation technique, CutBlur. SR models have traditionally been trained on synthetic datasets with simple geometric transformations; the authors aim to improve them with a DA strategy designed for the specific demands of low-level vision tasks.

Current Limitations and Novel Proposition

Current DA methods were largely developed for high-level tasks such as classification and do not transfer directly to low-level vision problems such as SR, where spatial coherence and the relationships between pixels are crucial. The paper finds that DA techniques that discard or heavily manipulate pixels or features, or that create sharp transitional boundaries, often degrade SR performance. Because such methods alter pixels in unrealistic ways that hurt generalization to real-world inputs, the authors look for an augmentation that preserves spatial structure while still delivering the benefits of DA.

The proposed method, CutBlur, cuts a low-resolution patch and pastes it into the corresponding region of the high-resolution image, and vice versa, so that a single training input mixes both resolutions. This lets an SR model learn not only "how" but also "where" and "how much" to super-resolve an image, which regularizes it against over-sharpening and improves performance.
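The patch swap can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the `prob` and `ratio` parameters, and the 50/50 choice of swap direction are hypothetical, and nearest-neighbour upsampling stands in for whatever interpolation is used to bring the low-resolution image to the high-resolution size.

```python
import numpy as np

def upsample_nearest(lr, scale):
    """Stand-in upsampler (assumption): repeat each LR pixel scale x scale times."""
    return np.repeat(np.repeat(lr, scale, axis=0), scale, axis=1)

def cutblur(hr, lr_up, rng, prob=1.0, ratio=0.5):
    """Build a CutBlur-style training input from an HR image and its upsampled LR version.

    hr, lr_up: arrays of identical shape (H, W, C).
    prob:  chance of applying the augmentation at all (hypothetical default).
    ratio: side of the swapped box as a fraction of H and W (hypothetical default).
    """
    if hr.shape != lr_up.shape:
        raise ValueError("hr and lr_up must have the same shape")
    if rng.random() >= prob:
        return lr_up.copy()  # no augmentation: plain upsampled LR input

    h, w = hr.shape[:2]
    ch, cw = max(1, int(h * ratio)), max(1, int(w * ratio))
    top = int(rng.integers(0, h - ch + 1))
    left = int(rng.integers(0, w - cw + 1))

    if rng.random() < 0.5:
        # Paste an HR patch into the LR image.
        out = lr_up.copy()
        out[top:top + ch, left:left + cw] = hr[top:top + ch, left:left + cw]
    else:
        # Paste an LR patch into the HR image.
        out = hr.copy()
        out[top:top + ch, left:left + cw] = lr_up[top:top + ch, left:left + cw]
    return out

# Usage: the augmented array is the network input; the target stays the HR image.
rng = np.random.default_rng(0)
hr = rng.random((8, 8, 3))
lr_up = upsample_nearest(hr[::2, ::2], 2)
mixed = cutblur(hr, lr_up, rng)
```

Because both patches come from the same image at the same location, the content of the input is unchanged; only the local resolution varies, which is what lets the model learn where super-resolution is needed.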

Methodology and Key Results

CutBlur is designed to keep SR networks from over-specializing, that is, from blindly applying super-resolution to every pixel. Because the swapped patches come from the same image at the same location, the augmented input preserves spatial structure and introduces no content from outside the original image. The reported gains are significant, particularly with limited training data or large model architectures, on both the synthetic DIV2K dataset and the real-world RealSR dataset.

Across the experiments, CutBlur improves PSNR and SSIM and also yields visibly better outputs, with fewer distortions and artifacts in real-world cases such as out-of-focus images and SR at unseen scale factors.
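For readers unfamiliar with the metric, PSNR is computed from the mean squared error between a reference image and a restored one as PSNR = 10 · log10(MAX² / MSE). The sketch below shows the standard definition only. The function name and the `max_value` default are assumptions, and the evaluation protocol used in the paper (for example, colour space or border cropping) is not described in this summary.

```python
import numpy as np

def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    if reference.shape != estimate.shape:
        raise ValueError("images must have the same shape")
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))
```

Higher values mean the restored image is closer to the reference; SSIM complements PSNR by comparing local structure rather than per-pixel error.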

Implications and Potential Extensions

The study positions CutBlur as a practical tool for SR, particularly where collecting large real-world datasets is infeasible. Its use also extends beyond SR to other low-level vision tasks such as denoising and JPEG compression artifact removal, which indicates its versatility.

More broadly, the work suggests that DA strategies that preserve contextual consistency can significantly improve training outcomes. The authors also propose combining CutBlur with other augmentation methods in a Mixture of Augmentations (MoA) approach, which points to further gains across a range of vision tasks.
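The summary does not spell out how the mixture is formed. One plausible reading, assumed here, is that each training step applies augmentation with some probability and then picks one method uniformly at random from a pool. The function name, the parameter `p`, and the pool contents below are hypothetical.

```python
import random

def mixture_of_augmentations(sample, pool, p=1.0, rng=random):
    """Apply one randomly chosen augmentation from `pool` with probability `p`.

    sample: whatever the augmentations operate on (e.g. an (HR, LR) pair).
    pool:   list of callables, each mapping a sample to an augmented sample.
    """
    if not pool or rng.random() >= p:
        return sample  # leave the sample untouched
    augment = rng.choice(pool)
    return augment(sample)

# Usage with toy stand-ins for real augmentations (assumptions for illustration).
toy_pool = [lambda s: s + 1, lambda s: s * 2]
augmented = mixture_of_augmentations(10, toy_pool, p=1.0, rng=random.Random(0))
```

In practice the pool would hold image-level operations such as a CutBlur function alongside other augmentations, all sharing the same call signature.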

Conclusion

This paper underscores the need for DA methods tailored to low-level vision tasks, particularly SR, and presents CutBlur as a compelling solution. Its systematic evaluation of existing DA methods for SR, together with a simple and effective new augmentation strategy, contributes toward better-performing and more generalizable image restoration models, and it opens the way for further work on domain-specific DA strategies.
