
Bringing Textual Prompt to AI-Generated Image Quality Assessment (2403.18714v2)

Published 27 Mar 2024 in cs.CV and cs.MM

Abstract: AI-Generated Images (AGIs) are inherently multimodal. Unlike traditional image quality assessment (IQA) of natural images, AGI quality assessment (AGIQA) must account for the correspondence between an image and its textual prompt. Because this correspondence is coupled into the ground-truth score, unimodal IQA methods are confounded. To address this, we introduce IP-IQA (AGIs Quality Assessment via Image and Prompt), a multimodal AGIQA framework that incorporates both the image and its corresponding prompt. Specifically, we propose Image2Prompt, a novel incremental pretraining task for better understanding AGIs and their textual prompts, together with an effective and efficient image-prompt fusion module and a novel special [QA] token. Both are plug-and-play and facilitate cooperation between an image and its prompt. Experiments demonstrate that IP-IQA achieves state-of-the-art results on the AGIQA-1k and AGIQA-3k datasets. Code will be available at https://github.com/Coobiw/IP-IQA.
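The abstract describes an image-prompt fusion module that reads a quality score off a special [QA] token. The paper's actual architecture is not reproduced here; the following is a minimal hedged sketch of one plausible realization, where a learnable [QA] token is prepended to concatenated image and prompt features before a small transformer, and the score is regressed from the [QA] position. All module names, layer counts, and dimensions are illustrative assumptions.

```python
# Hedged sketch of an image-prompt fusion head with a learnable [QA] token.
# This is NOT the authors' implementation; sizes and structure are assumptions.
import torch
import torch.nn as nn

class ImagePromptFusion(nn.Module):
    def __init__(self, dim=512, heads=8, layers=2):
        super().__init__()
        # Special [QA] token, learned jointly with the fusion layers.
        self.qa_token = nn.Parameter(torch.randn(1, 1, dim))
        enc = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.fusion = nn.TransformerEncoder(enc, num_layers=layers)
        self.head = nn.Linear(dim, 1)  # quality-score regressor

    def forward(self, img_feats, prompt_feats):
        # img_feats: (B, Ni, D) image tokens; prompt_feats: (B, Np, D) prompt tokens,
        # e.g. produced by a CLIP-style image/text encoder pair.
        b = img_feats.size(0)
        qa = self.qa_token.expand(b, -1, -1)
        x = torch.cat([qa, img_feats, prompt_feats], dim=1)
        x = self.fusion(x)
        # Read the predicted quality score from the [QA] token position.
        return self.head(x[:, 0]).squeeze(-1)

model = ImagePromptFusion()
score = model(torch.randn(2, 49, 512), torch.randn(2, 16, 512))
print(score.shape)  # torch.Size([2])
```

The design mirrors how BERT-style models use a [CLS] token: the [QA] token attends to both modalities during fusion, so the regressor sees a jointly conditioned summary rather than separately pooled image and text features.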

Authors (3)
  1. Bowen Qu
  2. Haohui Li
  3. Wei Gao
Citations (4)