
AGIQA-3K: An Open Database for AI-Generated Image Quality Assessment (2306.04717v2)

Published 7 Jun 2023 in cs.CV, cs.AI, and eess.IV

Abstract: With the rapid advancements of the text-to-image generative model, AI-generated images (AGIs) have been widely applied to entertainment, education, social media, etc. However, considering the large quality variance among different AGIs, there is an urgent need for quality models that are consistent with human subjective ratings. To address this issue, we extensively consider various popular AGI models, generated AGI through different prompts and model parameters, and collected subjective scores at the perceptual quality and text-to-image alignment, thus building the most comprehensive AGI subjective quality database AGIQA-3K so far. Furthermore, we conduct a benchmark experiment on this database to evaluate the consistency between the current Image Quality Assessment (IQA) model and human perception, while proposing StairReward that significantly improves the assessment performance of subjective text-to-image alignment. We believe that the fine-grained subjective scores in AGIQA-3K will inspire subsequent AGI quality models to fit human subjective perception mechanisms at both perception and alignment levels and to optimize the generation result of future AGI models. The database is released on https://github.com/lcysyzxdxc/AGIQA-3k-Database.

Overview of AGIQA-3K: An Open Database for AI-Generated Image Quality Assessment

The paper "AGIQA-3K: An Open Database for AI-Generated Image Quality Assessment" addresses the critical need for refined and reliable quality models in the rapidly advancing domain of AI-generated images (AGIs). The authors construct the AGIQA-3K database, which aims to improve the understanding and assessment of the subjective quality of AGIs. Given the large variability in image quality among different generative models and configurations, this paper provides a robust framework for evaluating the perceptual quality and text-to-image alignment of AGI.

AGIQA-3K comprises 2,982 AGI samples generated by a diverse set of six models, spanning GAN-based, auto-regressive, and the more recent diffusion-based paradigms. Among these, Stable Diffusion and Midjourney represent the latest advances in diffusion models, while AttnGAN and DALLE2 serve as benchmarks for the earlier GAN and auto-regressive paradigms, respectively. By incorporating such a broad spectrum of AGI techniques, AGIQA-3K aims to comprehensively cover the variations in image quality that arise from different architectural frameworks and parameter settings.

Subjective Quality Assessment Methodology

The paper implements a systematically structured subjective quality assessment, obtaining human ratings along both the perceptual quality and text-to-image alignment dimensions. The authors conducted this evaluation under controlled laboratory conditions, adhering to established subjective testing standards (ITU-R BT.500-13). Participants' ratings were aggregated into Mean Opinion Scores (MOS) for perceptual quality and alignment, with detailed checks to ensure high reliability and consistency of the results.
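To make the aggregation step concrete, the following is a minimal sketch of how raw subjective ratings are commonly turned into MOS values: per-subject z-score normalization, simple outlier screening, and averaging per image. The array layout, rejection threshold, and 0-100 rescaling are illustrative assumptions rather than the paper's exact procedure.

```python
import numpy as np

def compute_mos(ratings: np.ndarray, reject_thresh: float = 2.0) -> np.ndarray:
    """Aggregate raw subjective ratings into MOS values.

    ratings: array of shape (num_subjects, num_images); NaN marks missing votes.
    reject_thresh: z-score magnitude beyond which a single vote is discarded
                   (a simple stand-in for BT.500-style subject screening).
    """
    # Normalize each subject's votes to zero mean / unit variance to remove
    # individual rating-scale biases.
    mean = np.nanmean(ratings, axis=1, keepdims=True)
    std = np.nanstd(ratings, axis=1, keepdims=True) + 1e-8
    z = (ratings - mean) / std

    # Discard clearly inconsistent votes before averaging.
    z = np.where(np.abs(z) > reject_thresh, np.nan, z)

    # Average the remaining z-scores per image and rescale to a 0-100 range.
    mos_z = np.nanmean(z, axis=0)
    lo, hi = np.nanmin(mos_z), np.nanmax(mos_z)
    return 100.0 * (mos_z - lo) / (hi - lo + 1e-8)
```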

Key Contributions and Findings

  1. Comprehensive Database: The AGIQA-3K database is touted as the most extensive of its kind, capturing a wide range of AGI qualities by leveraging different generation models and adjusting key parameters such as Classifier Free Guidance (CFG) and training iterations.
  2. Quality Assessment Metrics: The authors critique existing Image Quality Assessment (IQA) metrics and propose an enhanced evaluation algorithm named StairReward. The model decomposes the text-to-image alignment evaluation, improving consistency with human subjective scores by scoring alignment for individual components, or "morphemes", of the input prompt (see the sketch after this list).
  3. Impact of Model and Parameter Variability: Through systematic analysis, the paper outlines how different models and configurations influence AGI quality. For instance, lengthier and more complex prompts tend to lower both perceptual and alignment scores. Likewise, distinct styles (e.g., Baroque or Realistic vs. Abstract or Sci-fi) yield varying image qualities, indicating the influence of training data distributions on generative performance.
  4. Benchmark Experiments: Evaluations against established and new quality metrics show the proposed StairReward offering superior alignment prediction performance over other models. Meanwhile, available perception assessment models still show limitations, necessitating further optimization for tasks specific to AGI.
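StairReward's exact formulation is not reproduced in this overview, but the core idea in item 2, scoring alignment separately for each piece of the prompt before combining the pieces, can be illustrated with a short CLIP-based sketch. The comma-based prompt splitting, the plain averaging, and the use of the openai/clip-vit-base-patch32 checkpoint via Hugging Face transformers are illustrative assumptions, not the authors' implementation.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def morpheme_alignment_score(image: Image.Image, prompt: str) -> float:
    """Score text-to-image alignment piecewise over prompt fragments.

    Assumption: fragments are comma-separated chunks of the prompt, and the
    final score is their mean CLIP image-text similarity. StairReward's actual
    morpheme decomposition and stair-wise weighting differ in detail.
    """
    fragments = [p.strip() for p in prompt.split(",") if p.strip()] or [prompt]
    inputs = processor(text=fragments, images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
        img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
        txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    # Cosine similarity between the image and each prompt fragment.
    sims = (txt @ img.T).squeeze(-1)
    return sims.mean().item()
```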

Implications and Future Directions

The insights drawn from AGIQA-3K are instrumental in informing the design of future AGI models and the strategies used to evaluate them. The paper highlights that existing models still fall short of human judgments when predicting both the alignment and the perceptual quality of AGI outputs. Future work should focus on enhancing the discriminative power of perception models for subtle qualitative differences, particularly as AGI models evolve and the complexity of generated content increases.

The open nature of the AGIQA-3K database offers the research community a valuable resource to refine existing metrics and develop new methodologies, fostering development towards AGI models that are finely attuned to human standards of quality and alignment.
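As a starting point for such refinement, a common first experiment with an open database like AGIQA-3K is to check how well an existing quality metric's predictions correlate with the released MOS labels, typically via SRCC and PLCC. The sketch below assumes predictions and MOS values are already available as arrays; real scores would come from running an IQA model on the AGIQA-3K images and reading the released annotations.

```python
import numpy as np
from scipy.stats import spearmanr, pearsonr

def benchmark_against_mos(pred_scores: np.ndarray, mos: np.ndarray) -> dict:
    """Compare a quality metric's predictions with human MOS labels."""
    srcc, _ = spearmanr(pred_scores, mos)   # rank-order consistency
    plcc, _ = pearsonr(pred_scores, mos)    # linear correlation
    return {"SRCC": srcc, "PLCC": plcc}

# Toy example; in practice, pred_scores would come from an IQA model run on
# the AGIQA-3K images and mos from the database's released annotations.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mos = rng.uniform(0, 100, size=256)
    pred = mos + rng.normal(0, 10, size=256)  # noisy stand-in predictions
    print(benchmark_against_mos(pred, mos))
```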

Authors (8)
  1. Chunyi Li (66 papers)
  2. Zicheng Zhang (124 papers)
  3. Haoning Wu (68 papers)
  4. Wei Sun (373 papers)
  5. Xiongkuo Min (138 papers)
  6. Xiaohong Liu (117 papers)
  7. Guangtao Zhai (230 papers)
  8. Weisi Lin (118 papers)
Citations (82)