Overview of AGIQA-3K: An Open Database for AI-Generated Image Quality Assessment
The paper "AGIQA-3K: An Open Database for AI-Generated Image Quality Assessment" addresses the need for refined and reliable quality models in the rapidly advancing domain of AI-generated images (AGIs). The authors construct the AGIQA-3K database to improve the understanding and assessment of the subjective quality of AGIs. Given the large variability in image quality across generative models and configurations, the paper provides a robust framework for evaluating both the perceptual quality and the text-to-image alignment of AGIs.
AGIQA-3K comprises 2,982 AGI samples generated by six different models, spanning the historically important GAN-based and auto-regressive paradigms as well as the more recent diffusion-based approaches. Among these, Stable Diffusion and Midjourney represent the latest advances in diffusion models, while AttnGAN and DALLE2 serve as benchmarks for the earlier GAN and auto-regressive paradigms, respectively. By covering such a broad spectrum of generation techniques, AGIQA-3K captures the variation in image quality that arises from different architectures and parameter settings.
Subjective Quality Assessment Methodology
The paper implements a systematically structured subjective quality assessment, collecting human ratings along two dimensions: perceptual quality and text-to-image alignment. The evaluation was conducted under controlled laboratory conditions, adhering to the established subjective testing standard ITU-R BT.500-13. Participants' ratings were aggregated into Mean Opinion Scores (MOS) for each dimension, with careful screening of the collected data to ensure high reliability and consistency of the results.
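The aggregation step above can be sketched in a few lines. This is a simplified illustration, not the paper's exact procedure: `reject_outlier_subjects` is a hypothetical stand-in for the full ITU-R BT.500 subject-screening algorithm, using a plain z-score criterion instead.

```python
import numpy as np

def compute_mos(ratings: np.ndarray) -> np.ndarray:
    """Mean Opinion Score per image; `ratings` has shape (subjects, images)."""
    return ratings.mean(axis=0)

def reject_outlier_subjects(ratings: np.ndarray, z_thresh: float = 2.0) -> np.ndarray:
    """Drop subjects whose scores deviate strongly from the panel consensus.

    Simplified stand-in for BT.500 subject screening: a subject is rejected
    when their mean absolute z-score across all images exceeds `z_thresh`.
    """
    mos = ratings.mean(axis=0)
    std = ratings.std(axis=0) + 1e-8  # avoid division by zero
    subject_dev = np.abs((ratings - mos) / std).mean(axis=1)
    return ratings[subject_dev <= z_thresh]
```

In practice the screened matrix is fed back into `compute_mos` to obtain the final per-image scores for each dimension.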
Key Contributions and Findings
- Comprehensive Database: The AGIQA-3K database is presented as the most extensive of its kind, capturing a wide range of AGI quality by varying the generation model and key parameters such as the Classifier-Free Guidance (CFG) scale and the number of training iterations.
- Quality Assessment Metrics: The authors critique existing Image Quality Assessment (IQA) metrics and propose an improved alignment metric named StairReward. It decomposes the input prompt into individual components, or "morphemes", and scores the text-to-image alignment of each component separately, improving consistency with human subjective scores.
- Impact of Model and Parameter Variability: Systematic analysis shows how different models and configurations influence AGI quality. For instance, longer and more complex prompts tend to lower both perceptual and alignment scores. Likewise, different styles (e.g., Baroque or Realistic vs. Abstract or Sci-fi) yield varying image quality, pointing to the influence of training-data distributions on generative performance.
- Benchmark Experiments: Evaluations against established and newly proposed quality metrics show that StairReward offers superior alignment-prediction performance over competing models. Existing perceptual-quality models, by contrast, still show clear limitations and require further optimization for AGI-specific tasks.
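The morpheme-wise scoring idea behind StairReward can be illustrated with a toy sketch. This is not the paper's implementation: `split_into_morphemes` is a naive comma-based stand-in for the proper linguistic decomposition, and `similarity` is a hypothetical placeholder for a learned text-image alignment model such as CLIP.

```python
from typing import Callable, List

def split_into_morphemes(prompt: str) -> List[str]:
    """Naive stand-in: treat comma-separated clauses as 'morphemes'."""
    return [part.strip() for part in prompt.split(",") if part.strip()]

def morpheme_alignment(prompt: str, image: object,
                       similarity: Callable[[str, object], float]) -> float:
    """Average the alignment of each prompt component with the image,
    rather than scoring the full prompt in one shot -- the decomposition
    the paper credits for better agreement with human alignment MOS."""
    parts = split_into_morphemes(prompt)
    if not parts:
        return 0.0
    return sum(similarity(p, image) for p in parts) / len(parts)
```

The real StairReward additionally weights the component scores, but the core intuition is this per-component decomposition.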
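Benchmark comparisons like those above are conventionally reported as rank (SRCC) and linear (PLCC) correlation between a metric's predictions and the subjective MOS. A minimal, dependency-free sketch of both statistics (ignoring tied ranks):

```python
from typing import List, Tuple

def pearson(x: List[float], y: List[float]) -> float:
    """PLCC: linear correlation between predictions and MOS."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x: List[float], y: List[float]) -> float:
    """SRCC: Pearson correlation of the rank-transformed values."""
    def ranks(v: List[float]) -> List[float]:
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r
    return pearson(ranks(x), ranks(y))

def evaluate_metric(predicted: List[float], mos: List[float]) -> Tuple[float, float]:
    """The two headline numbers of an IQA benchmark table."""
    return spearman(predicted, mos), pearson(predicted, mos)
```

A higher SRCC/PLCC against the AGIQA-3K MOS is exactly what the paper reports StairReward improving on the alignment dimension.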
Implications and Future Directions
The insights drawn from AGIQA-3K can inform the design of future generative models and the strategies used to evaluate them. The paper highlights the gap in existing models' ability to accurately predict the alignment and perceptual quality of AGI outputs. Future work should focus on enhancing the discriminative power of perception models for subtle quality differences, particularly as generative models evolve and the complexity of generated content increases.
The open nature of the AGIQA-3K database offers the research community a valuable resource to refine existing metrics and develop new methodologies, fostering development towards AGI models that are finely attuned to human standards of quality and alignment.