
Seeing is not always believing: Benchmarking Human and Model Perception of AI-Generated Images (2304.13023v3)

Published 25 Apr 2023 in cs.AI and cs.CV

Abstract: Photos serve as a way for humans to record what they experience in their daily lives, and they are often regarded as trustworthy sources of information. However, there is a growing concern that the advancement of AI technology may produce fake photos, which can create confusion and diminish trust in photographs. This study aims to comprehensively evaluate agents for distinguishing state-of-the-art AI-generated visual content. Our study benchmarks both human capability and cutting-edge fake image detection AI algorithms, using a newly collected large-scale fake image dataset, Fake2M. In our human perception evaluation, titled HPBench, we discovered that humans struggle significantly to distinguish real photos from AI-generated ones, with a misclassification rate of 38.7%. Alongside this, we evaluate model capability for detecting AI-generated images in MPBench, and the top-performing model from MPBench achieves a 13% failure rate under the same setting used in the human evaluation. We hope that our study can raise awareness of the potential risks of AI-generated images and facilitate further research to prevent the spread of false information. More information is available at https://github.com/Inf-imagine/Sentry.

Benchmarking Human and AI Perception of Synthetic Images

The paper "Seeing is not always believing: Benchmarking Human and Model Perception of AI-Generated Images" presents a comprehensive evaluation of human and machine abilities to discern AI-generated images from real photographs. This paper responds to escalating concerns regarding the fidelity of AI-generated imagery and its potential implications for society.

The authors introduce two benchmarks: HPBench and MPBench. HPBench evaluates human perception, revealing that humans frequently struggle to differentiate AI-generated images from authentic ones, achieving an average accuracy of 61.3%, which equates to a misclassification rate of 38.7%. This difficulty underscores the increasing sophistication of AI image synthesis methods, which have begun to erode the reliability of images as truth-bearing records.

In parallel, MPBench assesses the performance of current AI algorithms designed to detect synthetic images. The top-performing model outperforms human participants, achieving a misclassification rate of 13% under the same setting used in the human evaluation, compared with 38.7% for humans. These findings suggest that AI-driven detectors can surpass human ability in the detection of synthetic media.
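The relationship between the reported figures can be checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' evaluation code: the function names and the toy labels are hypothetical, and only the 61.3% human accuracy and the 13% best-model error rate come from the paper.

```python
from typing import Sequence


def misclassification_rate(labels: Sequence[int], predictions: Sequence[int]) -> float:
    """Fraction of items whose predicted label differs from the true label."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    if len(labels) == 0:
        raise ValueError("need at least one item")
    wrong = sum(1 for y, p in zip(labels, predictions) if y != p)
    return wrong / len(labels)


def error_from_accuracy(accuracy: float) -> float:
    """For a two-class task, the misclassification rate is 1 - accuracy."""
    return 1.0 - accuracy


# Figures reported in the paper.
HUMAN_ACCURACY = 0.613      # HPBench average human accuracy
BEST_MODEL_ERROR = 0.13     # best MPBench model, same setting as the human evaluation

human_error = error_from_accuracy(HUMAN_ACCURACY)
gap = human_error - BEST_MODEL_ERROR

# Hypothetical toy data (1 = AI-generated, 0 = real photo), for illustration only.
toy_labels = [1, 1, 0, 0, 1, 0, 1, 0]
toy_predictions = [1, 0, 0, 1, 1, 0, 0, 0]
toy_error = misclassification_rate(toy_labels, toy_predictions)

print(f"human misclassification rate: {human_error:.1%}")  # 38.7%
print(f"gap to best model: {gap * 100:.1f} percentage points")
print(f"toy misclassification rate: {toy_error:.3f}")
```

On these reported numbers, the best detector makes roughly a third as many errors as the average human participant, although the comparison holds only for the shared evaluation setting described in the paper.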

A significant contribution of this paper is the introduction of the Fake2M dataset, a large-scale collection of over two million AI-generated images, which serves to train and evaluate these detection algorithms. The dataset encompasses outputs from state-of-the-art models including Stable Diffusion and StyleGAN, presenting a diverse challenge reflective of contemporary synthesis capabilities.

The paper's implications are twofold. Practically, the findings advise caution in relying on images as sources of factual information, as AI-generated content can convincingly mimic reality. Theoretically, they push forward the conversation on the limits of AI and human perception, emphasizing the necessity for more robust detection systems capable of coping with the rapid advancements in AI.

Looking ahead, the paper opens several avenues for future research. There is a need to develop AI detection models that maintain high performance across varied datasets, adjusting for differences in style or synthesis method. Moreover, this research touches on the societal impacts of synthetic imagery, such as misinformation and erosion of trust, suggesting a broader exploration of ethical guidelines and detection methodologies is warranted to safeguard against malicious use.

In summary, while AI capabilities in image generation continue to advance, they present new challenges that require equally sophisticated detection methods. The paper sets a foundation for future exploration and response to these increasingly blurred lines between reality and synthesis in digital imagery.

Authors (7)
  1. Zeyu Lu
  2. Di Huang
  3. Lei Bai
  4. Jingjing Qu
  5. Chengyue Wu
  6. Xihui Liu
  7. Wanli Ouyang