Overview of the PIPAL Dataset for Perceptual Image Quality Assessment
The paper introduces PIPAL, a large-scale dataset for image quality assessment (IQA) with a focus on perceptual image restoration. It addresses the challenges posed by emerging image restoration (IR) techniques, especially those based on Generative Adversarial Networks (GANs), whose outputs expose a growing inconsistency between quantitative metrics and human perceptual judgment.
Content and Motivation
Image restoration aims to recover high-quality images from degraded inputs. While deep learning has substantially advanced the field, evaluation has lagged behind, often relying on metrics such as PSNR and SSIM that correlate poorly with human perception, especially on GAN-generated outputs. PIPAL is proposed as a remedy: a comprehensive resource that includes GAN-induced distortions alongside traditional distortions and the outputs of deep-learning-based methods.
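To make concrete why PSNR can mislead, a minimal implementation helps: it scores pixel-wise fidelity only, so a GAN output with realistic but shifted textures can score worse than a blurry one. This is an illustrative sketch, not code from the paper:

```python
import numpy as np

def psnr(reference, distorted, max_val=255.0):
    """Peak signal-to-noise ratio (dB): a pure pixel-fidelity measure."""
    mse = np.mean((reference.astype(np.float64) - distorted.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```

Because the score depends only on the per-pixel mean squared error, perceptually plausible texture that is spatially misaligned with the reference is penalized as heavily as noise.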
Dataset Composition and Methodology
PIPAL comprises 29,000 distorted images derived from 250 high-quality references under 40 distortion types, crucially including GAN-based outputs. It is the first dataset of this scale to cover such outputs, addressing an oversight in earlier datasets such as LIVE and TID. For subjective scoring, PIPAL uses the Elo rating system, aggregating more than 1.13 million human pairwise judgments into Mean Opinion Scores (MOS). Because Elo ratings update incrementally, the dataset can be extended easily to include future IR developments.
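The Elo system turns pairwise "which image looks better?" judgments into per-image ratings. A minimal sketch of one update step, with an assumed K-factor of 32 and the standard logistic expectation (the paper's exact parameters may differ):

```python
def elo_update(rating_a, rating_b, outcome_a, k=32.0):
    """One Elo update after a pairwise preference judgment.

    outcome_a: 1.0 if image A was preferred, 0.0 if B, 0.5 for a tie.
    k is the update step size (an assumed value, not from the paper).
    """
    # Expected win probability of A under the logistic Elo model.
    expected_a = 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))
    new_a = rating_a + k * (outcome_a - expected_a)
    new_b = rating_b + k * ((1.0 - outcome_a) - (1.0 - expected_a))
    return new_a, new_b
```

A new distorted image can enter the pool at a default rating and converge after relatively few comparisons, which is what makes the dataset extensible without re-collecting all judgments.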
Benchmarks and Findings
The paper constructs a benchmark for IQA methods using PIPAL. Traditional metrics correlate poorly with human judgments on GAN-based outputs, confirming the hypothesis that they are ill-suited to modern IR algorithms. Deep-learning-based metrics such as LPIPS, PieAPP, and DISTS correlate more strongly with human perception, highlighting their potential for evaluating GAN-based IR. Even so, none fully captures perceptual quality, and the need for improved IQA metrics remains.
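Such benchmarks typically score each IQA method by the rank correlation between its predictions and the MOS. A small self-contained Spearman correlation, assuming no tied scores (real evaluations would use average ranks for ties):

```python
import numpy as np

def srcc(metric_scores, mos):
    """Spearman rank-order correlation between metric scores and MOS.

    Assumes no ties; a full implementation would assign average ranks.
    """
    def ranks(x):
        order = np.argsort(x)
        r = np.empty(len(x))
        r[order] = np.arange(1, len(x) + 1)
        return r

    rx = ranks(np.asarray(metric_scores, dtype=np.float64))
    ry = ranks(np.asarray(mos, dtype=np.float64))
    rx -= rx.mean()
    ry -= ry.mean()
    # Pearson correlation of the ranks.
    return float((rx @ ry) / np.sqrt((rx @ rx) * (ry @ ry)))
```

A metric that preserves the human ranking of images scores near 1.0; the paper's finding is that traditional metrics fall well short of this on the GAN-distortion subset.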
Simultaneously, the paper scrutinizes IR methods by building a benchmark of super-resolution algorithms (e.g., SRCNN, ESRGAN, RankSRGAN). It finds that while perception-oriented methods (such as GAN-based ones) deliver visibly better results than distortion-oriented methods, existing IQA metrics assess them inadequately: the perceptual gains are reflected in steadily improving MOS but not in conventional metrics.
Implications for Future Research
This research signifies an important step toward evaluation mechanisms that align with human perception, reaffirming the need to adapt or reinvent IQA metrics in step with IR advancements. It also suggests a concrete direction: anti-aliasing techniques that make convolutional features more robust to spatial misalignment, improving IQA performance on GAN-induced distortions.
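The intuition behind anti-aliased features is to low-pass filter before subsampling, so small spatial shifts in GAN-generated texture do not flip feature responses. A single-channel sketch, assuming a separable [1, 2, 1]/4 binomial kernel (one common choice; the paper does not prescribe a specific filter here):

```python
import numpy as np

def blur_pool(feature_map, stride=2):
    """Anti-aliased downsampling: blur with a binomial kernel, then subsample.

    feature_map: 2-D array (a single feature channel). The [1, 2, 1]/4
    kernel is an assumed example, applied separably along rows and columns.
    """
    k = np.array([1.0, 2.0, 1.0]) / 4.0
    padded = np.pad(feature_map, 1, mode="edge")
    # Separable low-pass filtering: rows first, then columns.
    blurred = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    blurred = np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, blurred)
    # Subsample only after filtering, reducing shift sensitivity.
    return blurred[::stride, ::stride]
```

Plain strided pooling skips the blur step, which is exactly what makes downsampled features sensitive to the one-pixel misalignments common in GAN outputs.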
Conclusion
The introduction of PIPAL paves the way for more reliable assessments of perceptual image restoration, particularly under the GAN paradigm. Future research should direct efforts towards refining and possibly reengineering IQA methods to integrate the complexities introduced by advanced IR techniques, fostering a pipeline of innovation from algorithm development to perceptual evaluation.