
Leveraging Representations from Intermediate Encoder-blocks for Synthetic Image Detection (2402.19091v2)

Published 29 Feb 2024 in cs.CV

Abstract: The recently developed and publicly available synthetic image generation methods and services make it possible to create extremely realistic imagery on demand, raising great risks for the integrity and safety of online information. State-of-the-art Synthetic Image Detection (SID) research has led to strong evidence on the advantages of feature extraction from foundation models. However, such extracted features mostly encapsulate high-level visual semantics instead of fine-grained details, which are more important for the SID task. In contrast, shallow layers encode low-level visual information. In this work, we leverage the image representations extracted by intermediate Transformer blocks of CLIP's image encoder via a lightweight network that maps them to a learnable forgery-aware vector space capable of generalizing exceptionally well. We also employ a trainable module to incorporate the importance of each Transformer block to the final prediction. Our method is compared against the state-of-the-art by evaluating it on 20 test datasets and exhibits an average +10.6% absolute performance improvement. Notably, the best performing models require just a single epoch for training (~8 minutes). Code available at https://github.com/mever-team/rine.
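
As a rough illustration of the idea of weighting per-block representations, the following is a minimal sketch, not the authors' implementation. It assumes that each Transformer block yields one feature vector per image and that block importance is modeled as a softmax over one learnable logit per block; the abstract does not specify the mechanism, so both are assumptions. The names `softmax`, `aggregate_blocks`, and `importance_logits` are hypothetical.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_blocks(
    block_features: List[List[float]],
    importance_logits: List[float],
) -> List[float]:
    """Fuse per-block feature vectors into one vector.

    block_features: one feature vector per Transformer block (all same length).
    importance_logits: one logit per block (assumed learnable in a real model).
    Returns the importance-weighted sum of the block vectors.
    """
    if len(block_features) != len(importance_logits):
        raise ValueError("need exactly one importance logit per block")
    weights = softmax(importance_logits)
    dim = len(block_features[0])
    return [
        sum(w * feats[d] for w, feats in zip(weights, block_features))
        for d in range(dim)
    ]


# Toy example: three blocks, two-dimensional features.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Equal logits give every block the same weight (a plain average).
uniform = aggregate_blocks(features, [0.0, 0.0, 0.0])

# A large logit on the last block makes it dominate the fused vector.
skewed = aggregate_blocks(features, [0.0, 0.0, 50.0])
```

In a trained detector the logits would be learned jointly with the rest of the network, and the fused vector would feed a classification head; the official code at the repository linked above is the reference for the actual design.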
