Global-Local Image Perceptual Score (GLIPS): Evaluating Photorealistic Quality of AI-Generated Images (2405.09426v2)

Published 15 May 2024 in cs.CV

Abstract: This paper introduces the Global-Local Image Perceptual Score (GLIPS), an image metric designed to assess the photorealistic quality of AI-generated images with a high degree of alignment to human visual perception. Traditional metrics such as FID and KID do not align closely with human evaluations. The proposed metric incorporates transformer-based attention mechanisms to assess local similarity and Maximum Mean Discrepancy (MMD) to evaluate global distributional similarity. To evaluate the performance of GLIPS, we conducted a human study on photorealistic image quality. Comprehensive tests across various generative models demonstrate that GLIPS consistently outperforms existing metrics such as FID, SSIM, and MS-SSIM in terms of correlation with human scores. Additionally, we introduce the Interpolative Binning Scale (IBS), a refined scaling method that enhances the interpretability of metric scores by aligning them more closely with human evaluative standards. The proposed metric and scaling approach not only provide more reliable assessments of AI-generated images but also suggest pathways for future enhancements in image generation technologies.
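
The abstract describes GLIPS as combining a transformer-based local similarity term with an MMD-based measure of global distributional similarity. As a rough illustration of those two ingredients only (not the paper's actual implementation; the ViT patch features, RBF kernel, and weighting below are assumptions), the pieces might look like this:

```python
import torch
import torch.nn.functional as F

def rbf_mmd2(x, y, sigma=1.0):
    """Squared Maximum Mean Discrepancy between two feature sets (RBF kernel).

    x: (n, d) features from reference images; y: (m, d) features from generated
    images. A smaller value indicates the two feature distributions are closer.
    """
    def sq_dists(a, b):
        # Pairwise squared Euclidean distances between rows of a and b.
        return (a.unsqueeze(1) - b.unsqueeze(0)).pow(2).sum(-1)

    k_xx = torch.exp(-sq_dists(x, x) / (2 * sigma ** 2)).mean()
    k_yy = torch.exp(-sq_dists(y, y) / (2 * sigma ** 2)).mean()
    k_xy = torch.exp(-sq_dists(x, y) / (2 * sigma ** 2)).mean()
    return k_xx + k_yy - 2 * k_xy

def local_patch_similarity(ref_patches, gen_patches):
    """Mean cosine similarity between corresponding ViT patch embeddings.

    ref_patches, gen_patches: (num_patches, d) token embeddings for an aligned
    reference/generated pair. Higher means the images are locally more similar.
    """
    return F.cosine_similarity(ref_patches, gen_patches, dim=-1).mean()

# Hypothetical combination into a single score; the actual GLIPS weighting,
# normalization, and feature extraction pipeline are not given in the abstract.
def glips_like_score(ref_patches, gen_patches, alpha=0.5):
    local = local_patch_similarity(ref_patches, gen_patches)
    global_dist = rbf_mmd2(ref_patches, gen_patches)
    return alpha * local - (1 - alpha) * global_dist
```

The weighting parameter `alpha` and the subtraction of the MMD term are placeholders; the paper itself should be consulted for how the local and global components are actually normalized and fused, and for how the Interpolative Binning Scale maps raw scores onto a human-aligned scale.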
