
Large Multi-modality Model Assisted AI-Generated Image Quality Assessment (2404.17762v2)

Published 27 Apr 2024 in cs.CV

Abstract: Traditional deep neural network (DNN)-based image quality assessment (IQA) models use convolutional neural networks (CNNs) or Transformers to learn quality-aware feature representations, and they achieve commendable performance on natural scene images. However, when applied to AI-generated images (AGIs), these DNN-based IQA models exhibit subpar performance. This is largely due to the semantic inaccuracies inherent in certain AGIs, which are caused by the uncontrollable nature of the generation process. Thus, the capability to discern semantic content becomes crucial for assessing the quality of AGIs. Traditional DNN-based IQA models, constrained by limited parameter complexity and training data, struggle to capture complex fine-grained semantic features, which makes it difficult for them to grasp the existence and coherence of semantic content across the entire image. To address this shortfall in the semantic content perception of current IQA models, we introduce a large Multi-modality model Assisted AI-Generated Image Quality Assessment (MA-AGIQA) model, which uses semantically informed guidance to sense semantic information and extract semantic vectors through carefully designed text prompts. Moreover, it employs a mixture of experts (MoE) structure to dynamically integrate the semantic information with the quality-aware features extracted by traditional DNN-based IQA models. Comprehensive experiments conducted on two AI-generated content datasets, AIGCQA-20k and AGIQA-3k, show that MA-AGIQA achieves state-of-the-art performance and demonstrates superior generalization capabilities in assessing the quality of AGIs. Code is available at https://github.com/wangpuyi/MA-AGIQA.
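
To make the "dynamically integrate" step in the abstract more concrete, the following is a minimal sketch of a softmax-gated mixture over feature vectors. It is an illustration under stated assumptions, not the MA-AGIQA implementation: the function names (`softmax`, `gate_weights`, `moe_fuse`), the linear gate over concatenated features, and the choice of three input vectors are all hypothetical, and the authors' actual architecture is in the linked repository.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gate_weights(features, gate_matrix):
    """Compute one mixing weight per feature vector.

    features:    list of K vectors, each of length D (assumed layout).
    gate_matrix: K rows of length K * D; a hypothetical linear gate that
                 scores each source from the concatenation of all sources.
    """
    concat = [x for vec in features for x in vec]
    logits = [sum(w * x for w, x in zip(row, concat)) for row in gate_matrix]
    return softmax(logits)


def moe_fuse(features, gate_matrix):
    """Return the gate-weighted sum of the feature vectors and the weights."""
    weights = gate_weights(features, gate_matrix)
    dim = len(features[0])
    fused = [
        sum(w * vec[i] for w, vec in zip(weights, features))
        for i in range(dim)
    ]
    return fused, weights


# Hypothetical inputs: one quality-aware vector from a conventional IQA
# backbone and two semantic vectors obtained via text prompts.
quality_vec = [1.0, 2.0]
semantic_vec_a = [3.0, 4.0]
semantic_vec_b = [5.0, 6.0]

# An all-zero gate scores every source equally, so fusion reduces to a mean.
uniform_gate = [[0.0] * 6 for _ in range(3)]
fused, weights = moe_fuse(
    [quality_vec, semantic_vec_a, semantic_vec_b], uniform_gate
)
```

Because the weights depend on the input features, each image can lean more on the semantic vectors or on the quality-aware vector. In a trained model the gate parameters would be learned jointly with the rest of the network rather than fixed by hand as in this sketch.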

Authors (8)
  1. Puyi Wang (3 papers)
  2. Wei Sun (373 papers)
  3. Zicheng Zhang (124 papers)
  4. Jun Jia (35 papers)
  5. Yanwei Jiang (8 papers)
  6. Zhichao Zhang (32 papers)
  7. Xiongkuo Min (139 papers)
  8. Guangtao Zhai (231 papers)
Citations (5)
