Bringing Textual Prompt to AI-Generated Image Quality Assessment (2403.18714v2)
Abstract: AI-Generated Images (AGIs) are inherently multimodal. Unlike traditional image quality assessment (IQA) on natural images, AGI quality assessment (AGIQA) must take the correspondence between an image and its textual prompt into consideration. This correspondence is coupled into the ground-truth quality score, which confounds unimodal IQA methods. To address this problem, we introduce IP-IQA (AGIs Quality Assessment via Image and Prompt), a multimodal AGIQA framework that incorporates both the image and its corresponding prompt. Specifically, we propose a novel incremental pretraining task, named Image2Prompt, for a better understanding of AGIs and their corresponding textual prompts. We also apply an effective and efficient image-prompt fusion module together with a novel special [QA] token; both are plug-and-play and facilitate cooperation between the image and its prompt. Experiments demonstrate that IP-IQA achieves state-of-the-art results on the AGIQA-1k and AGIQA-3k datasets. Code will be available at https://github.com/Coobiw/IP-IQA.
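The abstract describes fusing image and prompt features through a special [QA] token, but does not detail the architecture. The following is a minimal illustrative sketch of one plausible query-based fusion step, in which a learnable [QA] token cross-attends over concatenated image and prompt token embeddings and a linear head regresses a quality score. All shapes, the random features, and the single-head attention are assumptions for illustration, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # embedding dimension (illustrative, not the paper's)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical encoder outputs. In IP-IQA these would come from
# CLIP-style image and text encoders; here they are random stand-ins.
image_tokens = rng.standard_normal((49, d))   # e.g. 7x7 patch features
prompt_tokens = rng.standard_normal((12, d))  # prompt token features

# Learnable [QA] query token (assumed: fusion is query-based attention).
qa_token = rng.standard_normal((1, d))

def fuse(qa, img, txt):
    """One cross-attention step: [QA] attends over image + prompt tokens."""
    kv = np.concatenate([img, txt], axis=0)          # (61, d)
    attn = softmax(qa @ kv.T / np.sqrt(d), axis=-1)  # (1, 61) weights
    return attn @ kv                                 # (1, d) fused feature

fused = fuse(qa_token, image_tokens, prompt_tokens)

# Hypothetical regression head mapping the fused [QA] state to a score.
w_head = rng.standard_normal((d, 1))
score = (fused @ w_head).item()  # scalar quality prediction
```

Because both image and prompt tokens sit in the same key/value pool, the [QA] token can weight prompt-image correspondence directly, which is the property a unimodal IQA head cannot capture.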