Frequency Separation for Real-World Super-Resolution (1911.07850v1)

Published 18 Nov 2019 in eess.IV and cs.CV

Abstract: Most of the recent literature on image super-resolution (SR) assumes the availability of training data in the form of paired low resolution (LR) and high resolution (HR) images or the knowledge of the downgrading operator (usually bicubic downscaling). While the proposed methods perform well on standard benchmarks, they often fail to produce convincing results in real-world settings. This is because real-world images can be subject to corruptions such as sensor noise, which are severely altered by bicubic downscaling. Therefore, the models never see a real-world image during training, which limits their generalization capabilities. Moreover, it is cumbersome to collect paired LR and HR images in the same source domain. To address this problem, we propose DSGAN to introduce natural image characteristics in bicubically downscaled images. It can be trained in an unsupervised fashion on HR images, thereby generating LR images with the same characteristics as the original images. We then use the generated data to train a SR model, which greatly improves its performance on real-world images. Furthermore, we propose to separate the low and high image frequencies and treat them differently during training. Since the low frequencies are preserved by downsampling operations, we only require adversarial training to modify the high frequencies. This idea is applied to our DSGAN model as well as the SR model. We demonstrate the effectiveness of our method in several experiments through quantitative and qualitative analysis. Our solution is the winner of the AIM Challenge on Real World SR at ICCV 2019.

Authors (3)
  1. Manuel Fritsche (4 papers)
  2. Shuhang Gu (56 papers)
  3. Radu Timofte (299 papers)
Citations (203)

Summary

  • The paper introduces DSGAN, which generates realistic LR training images from HR images and separates image frequencies so that adversarial training targets only the high-frequency details while the low frequencies are preserved.
  • It adapts ESRGAN with a frequency separation framework to improve image quality by maintaining color fidelity and reducing noise artifacts.
  • Quantitative and qualitative evaluations using LPIPS and real-world tests demonstrate significant improvements over conventional super-resolution methods.

Frequency Separation for Improved Real-World Image Super-Resolution

Introduction

The work by Fritsche, Gu, and Timofte addresses a critical limitation in contemporary image super-resolution (SR) techniques, where assumptions about available training data constrain the models. Traditional SR methods rely on paired low-resolution (LR) and high-resolution (HR) images or assume knowledge of the downgrading operator, commonly using bicubic downscaling. While these methods perform well on benchmark datasets, they falter with real-world images, which can present various corruptions, such as sensor noise, not accounted for during training.
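To illustrate why downscaling a noisy image does not yield a realistically noisy LR image, here is a minimal sketch. It assumes 2x2 average pooling as a simple stand-in for bicubic downscaling, and the image size, noise level, and function name are hypothetical choices made for this example, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a flat gray image with additive Gaussian "sensor" noise.
sigma = 0.1
clean = np.full((256, 256), 0.5)
noisy = clean + rng.normal(0.0, sigma, size=clean.shape)

def downscale2x(img: np.ndarray) -> np.ndarray:
    """Downscale by 2 using 2x2 average pooling (stand-in for bicubic)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

small = downscale2x(noisy)

# Averaging four independent noise samples halves the standard deviation,
# so the downscaled image is noticeably cleaner than the original.
print("noise std before:", noisy.std())
print("noise std after: ", small.std())
```

An SR model trained only on such downscaled inputs sees much weaker noise than a real LR photograph would contain, which is the gap the paper sets out to close.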

Proposed Solution

The authors introduce DSGAN (DownSampleGAN), an unsupervised technique that introduces natural image characteristics into bicubically downscaled images. DSGAN does not require paired data: it is trained on HR images alone and generates LR images with the same characteristics as the original images. An SR model trained on the generated data then generalizes better to real-world conditions. The key innovation is to separate the image frequencies and apply adversarial training only to the high frequencies, since the low frequencies are largely preserved by downscaling operations. This selective approach simplifies the adversarial task, speeds up convergence, and maintains color fidelity in the output SR images.
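The frequency split described above can be sketched with a simple linear filter. This is a minimal illustration, assuming a box (moving-average) low-pass filter with a 5x5 window on a single-channel image; the filter type, window size, and function names are assumptions made here, not details confirmed by this summary.

```python
import numpy as np

def low_pass(img: np.ndarray, k: int = 5) -> np.ndarray:
    """Box blur of a 2-D image with a k x k window (k odd), reflect padding."""
    pad = k // 2
    padded = np.pad(img, pad, mode="reflect")
    h, w = img.shape
    out = np.zeros((h, w), dtype=np.float64)
    # Sum the k*k shifted copies of the image, then divide by the window size.
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)

def split_frequencies(img: np.ndarray, k: int = 5):
    """Return (low, high) such that low + high reconstructs img."""
    low = low_pass(img, k)
    high = img - low  # the high-pass part is the residual of the low-pass
    return low, high

rng = np.random.default_rng(0)
img = rng.random((32, 32))
low, high = split_frequencies(img)
```

Because the high-frequency part is defined as the residual, the two bands always sum back to the input, so a discriminator can be restricted to `high` while a simpler pixel-wise term constrains `low`.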

Methodology

DSGAN is a GAN in which a generator network produces LR images that mimic the characteristics of the original images, in particular their high-frequency content. By splitting the learning task into low and high frequencies, the method reduces complexity and focuses the GAN's discriminator on the relevant features. The framework combines multiple loss functions (color, perceptual, and texture/adversarial) to maintain fidelity in the low frequencies and suppress artifacts in the high frequencies.
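One way such a combined generator objective could be assembled is sketched below. It assumes an L1 color term on the low-pass filtered images, an adversarial texture term on the high-pass residual, and a perceptual term on the full image. The loss weights, the box filter, and the toy perceptual and discriminator functions are hypothetical placeholders for illustration, not the authors' implementation.

```python
import numpy as np

def box_blur(img, k=5):
    """Low-pass filter: k x k moving average with reflect padding (k odd)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="reflect")
    h, w = img.shape
    acc = np.zeros((h, w), dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            acc += padded[dy:dy + h, dx:dx + w]
    return acc / (k * k)

def color_loss(generated, reference):
    """L1 distance between the low-frequency parts of the two images."""
    return float(np.mean(np.abs(box_blur(generated) - box_blur(reference))))

def texture_loss(generated, discriminator):
    """Generator-side GAN loss; the discriminator sees only high frequencies."""
    high = generated - box_blur(generated)
    return float(-np.log(discriminator(high) + 1e-12))

def generator_loss(generated, reference, discriminator, perceptual,
                   w_color=1.0, w_perceptual=0.1, w_texture=0.1):
    """Weighted sum of the three terms; the weights are arbitrary examples."""
    return (w_color * color_loss(generated, reference)
            + w_perceptual * perceptual(generated, reference)
            + w_texture * texture_loss(generated, discriminator))

# Toy stand-ins so the sketch runs without any trained network (hypothetical).
def toy_perceptual(a, b):
    return float(np.mean((a - b) ** 2))

def toy_discriminator(high):
    # Returns a pseudo-probability in [0.5, 1) from the mean absolute residual.
    return float(1.0 / (1.0 + np.exp(-np.mean(np.abs(high)))))

rng = np.random.default_rng(0)
reference = rng.random((16, 16))
generated = reference + 0.05 * rng.standard_normal((16, 16))
total = generator_loss(generated, reference, toy_discriminator, toy_perceptual)
```

In a real training setup the toy functions would be replaced by a feature-based perceptual metric and a trained discriminator network, and the weights would need tuning.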

Further, the researchers adapt the proposed frequency separation method to the super-resolution model itself, specifically modifying ESRGAN, to operate on these distinct frequency bands. This modification addresses the instability often found in training with adversarial losses, as networks attempt to map corrupted input LR images to perceptually clear SR outputs.

Evaluation

The approach was validated both on datasets with synthetic degradations, such as sensor noise and JPEG artifacts, and on real-world datasets with actual photographic corruptions. Quantitative results showed improvements in metrics such as LPIPS, which correlates well with human perception, indicating higher-quality output images than existing state-of-the-art methods. Qualitative assessments showed the models' ability to suppress noise and maintain detail, overcoming issues that affect baseline SR models, such as patchy artifacts and undesired noise amplification.

Implications and Future Work

The proposed DSGAN and its integration with a frequency-separated ESRGAN form a useful toolset for practical image enhancement, notably in fields that rely on real-world imaging, such as mobile photography. By treating the low and high frequency bands separately, the work simplifies the mapping from LR to HR images.

Further exploration into dynamic filtering and real-time implementation might capitalize on computational refinements highlighted in this research. Extensions to other domains could involve exploring frequency separation concepts in video applications for temporal consistency and smoother transitions between frames.

Conclusion

In summary, the method aligns SR training with natural image properties, a notable step toward practical SR models. This research not only addresses the limitations of conventional approaches in handling real-world corruptions but also opens avenues for future exploration in frequency-aware modeling.