Linguistic Profiling of Deepfakes: An Open Database for Next-Generation Deepfake Detection (2401.02335v1)

Published 4 Jan 2024 in cs.CV

Abstract: The emergence of text-to-image generative models has revolutionized the field of deepfakes, enabling the creation of realistic and convincing visual content directly from textual descriptions. However, this advancement makes detecting the authenticity of such content considerably more challenging. Existing deepfake detection datasets and methods often fall short in capturing the extensive range of emerging deepfakes and in offering satisfactory explanatory information for detection. To address this issue, this paper introduces a deepfake database (DFLIP-3K) for the development of convincing and explainable deepfake detection. It encompasses about 300K diverse deepfake samples from approximately 3K generative models, the largest number of deepfake models in the literature. Moreover, it collects around 190K linguistic footprints of these deepfakes. These two distinguishing features enable DFLIP-3K to support a benchmark that promotes progress in linguistic profiling of deepfakes, which includes three sub-tasks: deepfake detection, model identification, and prompt prediction. The deepfake model and prompt are two essential components of each deepfake, and thus dissecting them linguistically allows for an invaluable exploration of trustworthy and interpretable evidence in deepfake detection, which we believe is the key to next-generation deepfake detection. Furthermore, DFLIP-3K is envisioned as an open database that fosters transparency and encourages collaborative efforts to further enhance its growth. Our extensive experiments on the developed benchmark verify that our DFLIP-3K database is capable of serving as a standardized resource for evaluating and comparing linguistic-based deepfake detection, identification, and prompt prediction techniques.

Authors (4)
  1. Yabin Wang
  2. Zhiwu Huang
  3. Zhiheng Ma
  4. Xiaopeng Hong
