
Human Guided Ground-truth Generation for Realistic Image Super-resolution (2303.13069v1)

Published 23 Mar 2023 in cs.CV

Abstract: How to generate the ground-truth (GT) image is a critical issue for training realistic image super-resolution (Real-ISR) models. Existing methods mostly take a set of high-resolution (HR) images as GTs and apply various degradations to simulate their low-resolution (LR) counterparts. Though great progress has been achieved, such an LR-HR pair generation scheme has several limitations. First, the perceptual quality of HR images may not be high enough, limiting the quality of Real-ISR outputs. Second, existing schemes do not consider much human perception in GT generation, and the trained models tend to produce over-smoothed results or unpleasant artifacts. With the above considerations, we propose a human guided GT generation scheme. We first elaborately train multiple image enhancement models to improve the perceptual quality of HR images, and enable one LR image having multiple HR counterparts. Human subjects are then involved to annotate the high quality regions among the enhanced HR images as GTs, and label the regions with unpleasant artifacts as negative samples. A human guided GT image dataset with both positive and negative samples is then constructed, and a loss function is proposed to train the Real-ISR models. Experiments show that the Real-ISR models trained on our dataset can produce perceptually more realistic results with less artifacts. Dataset and codes can be found at https://github.com/ChrisDud0257/HGGT

Citations (14)

Summary

  • The paper introduces a human-guided ground-truth generation method that enhances perceptual quality in image super-resolution training.
  • It leverages diverse CNN and transformer architectures along with human annotations to build a dataset of 20,193 groups and 80,772 enhanced patches.
  • Models trained on the dataset show improved perceptual quality, with 78.72% of enhanced patches annotated as positive, demonstrating the method's practical impact on Real-ISR.

Human Guided Ground-truth Generation for Realistic Image Super-resolution

The paper "Human Guided Ground-truth Generation for Realistic Image Super-resolution" proposes a novel methodology for generating the ground-truth (GT) data used to train realistic image super-resolution (Real-ISR) models. The research addresses significant limitations in current LR-HR pair generation techniques, which traditionally use high-resolution images as GTs without adequately considering human perception, thereby constraining the perceptual quality of model outputs. As a result of these dataset limitations, trained ISR models often produce over-smoothed results or visible artifacts.

The authors introduce a human-guided GT generation approach to improve the perceptual quality of the GT images used in training. They begin by training several image enhancement models on multiple high-quality photorealistic datasets to raise the perceptual quality of existing HR images, so that each LR image gains multiple enhanced HR counterparts. The models use different backbone architectures (e.g., CNN and transformer models) and handle different levels of degradation, ensuring a diverse enhancement pool for the HR images. Volunteers are then enlisted to annotate high-quality regions in the enhanced HR images as positive GTs and regions with artifacts as negative samples.

Through this process, a comprehensive human-guided GT dataset is constructed with clearly labeled positive and negative samples, supporting a loss function specifically tailored for Real-ISR model training. The dataset comprises 20,193 groups with 80,772 enhanced patches. When these patches were evaluated by human annotators, a high percentage (78.72%) were marked as "Positive," indicating the effectiveness of the authors' enhancement process.
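The paper's exact loss formulation is given in the original work; as a rough sketch of the underlying idea only (the function below is my own illustration, not the authors' formula), a training objective can attract the super-resolved output toward human-approved positive regions while repelling it from regions annotated as artifacts:

```python
import numpy as np

def guided_loss(output, positive_gt, negative_sample, pos_mask, neg_mask, margin=0.1):
    """Illustrative positive/negative loss (not the paper's formulation).

    output, positive_gt, negative_sample: float arrays of the same shape.
    pos_mask, neg_mask: 0/1 arrays marking human-annotated regions.
    """
    # Fidelity term: L1 distance to the GT inside positive (high-quality) regions.
    pos_term = np.abs((output - positive_gt) * pos_mask).sum() / max(pos_mask.sum(), 1)
    # Repulsion term: hinge penalty if the output stays within `margin`
    # of a negatively annotated (artifact-bearing) sample.
    neg_dist = np.abs((output - negative_sample) * neg_mask).sum() / max(neg_mask.sum(), 1)
    neg_term = max(margin - neg_dist, 0.0)
    return pos_term + neg_term
```

An output identical to the positive GT and far from the negative sample incurs zero loss, while an output matching the artifact regions is penalized by both terms.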

The implications for practice are substantial. Models trained on the proposed dataset showed a marked improvement in qualitative assessments over those trained with traditional datasets such as DF2K-OST. Perceptual metrics such as LPIPS and DISTS improved consistently when several state-of-the-art Real-ISR models were retrained on the new dataset. This supports the assertion that incorporating human perception into GT selection can significantly refine model outputs, potentially advancing the application of ISR in real-world scenarios where visual fidelity is paramount.

Moving forward, the methodological innovation in GT dataset creation proposed in this paper could encourage broader adoption of human-centric approaches to other problems where perceptual quality is crucial. Future work could also develop automatic annotation systems for large-scale datasets, where manual labeling is cumbersome, by harnessing machine learning methods.

In conclusion, the paper offers strong empirical support for the role of human perception in guiding the data generation processes of AI models, presenting a method that allows for the creation of perceptually realistic data that dramatically improves the fidelity of Real-ISR model outputs. This could signify a step forward in closing the gap between synthetic simulation and real-world applicability in image processing models.