
Mixture of Low-rank Experts for Transferable AI-Generated Image Detection (2404.04883v1)

Published 7 Apr 2024 in cs.CV

Abstract: Generative models have shown a giant leap in synthesizing photo-realistic images with minimal expertise, sparking concerns about the authenticity of online information. This study aims to develop a universal AI-generated image detector capable of identifying images from diverse sources. Existing methods struggle to generalize across unseen generative models when provided with limited sample sources. Inspired by the zero-shot transferability of pre-trained vision-language models, we seek to harness the nontrivial visual-world knowledge and descriptive proficiency of CLIP-ViT to generalize over unknown domains. This paper presents a novel parameter-efficient fine-tuning approach, mixture of low-rank experts, to fully exploit CLIP-ViT's potential while preserving knowledge and expanding capacity for transferable detection. We adapt only the MLP layers of deeper ViT blocks via an integration of shared and separate LoRAs within an MoE-based structure. Extensive experiments on public benchmarks show that our method achieves superiority over state-of-the-art approaches in cross-generator generalization and robustness to perturbations. Remarkably, our best-performing ViT-L/14 variant requires training only 0.08% of its parameters to surpass the leading baseline by +3.64% mAP and +12.72% avg.Acc across unseen diffusion and autoregressive models. This even outperforms the baseline with just 0.28% of the training data. Our code and pre-trained models will be available at https://github.com/zhliuworks/CLIPMoLE.
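
The abstract describes adapting MLP layers with shared and separate LoRAs inside an MoE-based structure. The NumPy sketch below illustrates that idea for a single linear layer. It is not the authors' implementation: the class name `MoLELinear`, the top-k softmax router, the `alpha / rank` scaling, the zero initialisation of the up-projections, and all sizes are assumptions chosen for illustration, and the frozen weight is a random stand-in for a pre-trained CLIP-ViT weight.

```python
import numpy as np


class MoLELinear:
    """Hypothetical sketch: frozen linear layer + one shared LoRA + routed LoRA experts."""

    def __init__(self, d_in, d_out, rank=4, num_experts=3, top_k=1, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        # Frozen "pre-trained" weight and bias (random stand-ins, never updated).
        self.W = rng.normal(0.0, 0.02, size=(d_in, d_out))
        self.b = np.zeros(d_out)
        self.scale = alpha / rank  # assumed LoRA-style scaling
        self.top_k = top_k
        # Shared LoRA: random down-projection A, zero up-projection B (assumed init).
        self.A_shared = rng.normal(0.0, 0.02, size=(d_in, rank))
        self.B_shared = np.zeros((rank, d_out))
        # Separate LoRA experts with the same initialisation scheme.
        self.A_experts = rng.normal(0.0, 0.02, size=(num_experts, d_in, rank))
        self.B_experts = np.zeros((num_experts, rank, d_out))
        # Linear router: one logit per expert for every token (assumed design).
        self.W_gate = rng.normal(0.0, 0.02, size=(d_in, num_experts))

    def gate(self, x):
        """Per-token expert weights: softmax over the top-k logits, zero elsewhere."""
        logits = x @ self.W_gate  # (tokens, experts)
        top_idx = np.argsort(logits, axis=-1)[:, -self.top_k:]
        mask = np.zeros(logits.shape, dtype=bool)
        np.put_along_axis(mask, top_idx, True, axis=-1)
        masked = np.where(mask, logits, -np.inf)
        masked = masked - masked.max(axis=-1, keepdims=True)
        weights = np.exp(masked)  # exp(-inf) = 0 for unselected experts
        return weights / weights.sum(axis=-1, keepdims=True)

    def forward(self, x):
        base = x @ self.W + self.b                      # frozen path
        shared = (x @ self.A_shared) @ self.B_shared    # shared low-rank update
        g = self.gate(x)                                # (tokens, experts)
        # Low-rank update of every expert: (experts, tokens, d_out).
        expert_out = np.einsum("td,edr,ero->eto", x, self.A_experts, self.B_experts)
        routed = np.einsum("te,eto->to", g, expert_out)  # gate-weighted mixture
        return base + self.scale * (shared + routed)

    def trainable_fraction(self):
        """Share of parameters in the LoRAs and router relative to the whole layer."""
        trainable = (self.A_shared.size + self.B_shared.size
                     + self.A_experts.size + self.B_experts.size + self.W_gate.size)
        frozen = self.W.size + self.b.size
        return trainable / (trainable + frozen)


if __name__ == "__main__":
    layer = MoLELinear(d_in=16, d_out=32)
    tokens = np.random.default_rng(1).normal(size=(5, 16))
    print(layer.forward(tokens).shape)
    print(layer.gate(tokens))
```

In this sketch only the LoRA matrices and the router would receive gradients during fine-tuning, while `W` and `b` stay fixed. Because the up-projections start at zero, the layer initially reproduces the frozen output exactly. The toy layer is far too small to reflect the 0.08% trainable-parameter figure reported for ViT-L/14.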

Authors (4)
  1. Zihan Liu (102 papers)
  2. Hanyi Wang (14 papers)
  3. Yaoyu Kang (1 paper)
  4. Shilin Wang (14 papers)
Citations (6)