Unveiling the Truth: Exploring Human Gaze Patterns in Fake Images (2403.08933v1)
Abstract: Creating high-quality, realistic images is now possible thanks to impressive advancements in image generation: a natural-language description of the desired output is all that is needed to obtain striking results. However, as the use of generative models grows, so do concerns about the spread of malicious content and misinformation. Consequently, the research community is actively developing novel fake detection techniques, primarily focusing on low-level features and possible fingerprints left by generative models during the image generation process. In a different vein, our work leverages human semantic knowledge and investigates whether it can be incorporated into fake image detection frameworks. To this end, we collect a novel dataset of partially manipulated images produced with diffusion models and conduct an eye-tracking experiment to record the eye movements of different observers while viewing real and fake stimuli. A preliminary statistical analysis explores the distinctive patterns in how humans perceive genuine and altered images. The findings reveal that, when viewing counterfeit samples, humans tend to focus on more confined regions of the image, in contrast to the more dispersed observational pattern they show on genuine images. Our dataset is publicly available at: https://github.com/aimagelab/unveiling-the-truth.
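The abstract's central finding concerns how "confined" versus "dispersed" fixations are on fake versus real stimuli. The sketch below is not the authors' released analysis code; it illustrates one plausible way to quantify fixation dispersion per trial (mean distance from the fixation centroid, plus convex-hull area) and to compare the two conditions with a non-parametric test. The specific metrics, the Mann-Whitney U test, and all function names are assumptions made for illustration.

```python
# Minimal sketch (assumed analysis, not the paper's method): quantify how spread out
# a viewer's fixations are on one image, then compare real vs. fake conditions.
import numpy as np
from scipy.spatial import ConvexHull
from scipy.stats import mannwhitneyu


def fixation_dispersion(fixations: np.ndarray) -> dict:
    """Spread statistics for an (N, 2) array of fixation coordinates in pixels."""
    centroid = fixations.mean(axis=0)
    # Average Euclidean distance of each fixation from the centroid.
    mean_dist = np.linalg.norm(fixations - centroid, axis=1).mean()
    # For 2-D points, ConvexHull.volume is the enclosed area (needs >= 3 points).
    hull_area = ConvexHull(fixations).volume if len(fixations) >= 3 else 0.0
    return {"mean_centroid_distance": mean_dist, "hull_area": hull_area}


def compare_conditions(real_trials: list, fake_trials: list):
    """Test whether dispersion differs between genuine and manipulated stimuli.

    Each element of the input lists is an (N, 2) array of fixations for one trial.
    """
    real = [fixation_dispersion(f)["mean_centroid_distance"] for f in real_trials]
    fake = [fixation_dispersion(f)["mean_centroid_distance"] for f in fake_trials]
    stat, p_value = mannwhitneyu(real, fake, alternative="two-sided")
    return stat, p_value
```

Under this framing, a significantly lower dispersion for fake trials would be consistent with the reported observation that gaze concentrates on smaller regions of manipulated images.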
Authors: Giuseppe Cartella, Vittorio Cuculo, Marcella Cornia, Rita Cucchiara