Identifying and Mitigating the Security Risks of Generative AI (2308.14840v4)

Published 28 Aug 2023 in cs.AI

Abstract: Every major technical invention resurfaces the dual-use dilemma -- the new technology has the potential to be used for good as well as for harm. Generative AI (GenAI) techniques, such as LLMs and diffusion models, have shown remarkable capabilities (e.g., in-context learning, code-completion, and text-to-image generation and editing). However, GenAI can be used just as well by attackers to generate new attacks and increase the velocity and efficacy of existing attacks. This paper reports the findings of a workshop held at Google (co-organized by Stanford University and the University of Wisconsin-Madison) on the dual-use dilemma posed by GenAI. This paper is not meant to be comprehensive, but is rather an attempt to synthesize some of the interesting findings from the workshop. We discuss short-term and long-term goals for the community on this topic. We hope this paper provides both a launching point for a discussion on this important topic as well as interesting problems that the research community can work to address.

Summary

  • The paper outlines the dual-use dilemma of GenAI by detailing vulnerabilities such as personalized phishing, deepfake creation, and malware generation.
  • The paper evaluates defense strategies, including neural detectors for AI-generated content, watermarking, and GenAI-augmented penetration testing, to strengthen cybersecurity.
  • The paper underscores the necessity for socio-technical solutions and regulatory measures to securely and ethically deploy Generative AI.

Identifying and Mitigating the Security Risks of Generative AI

This paper critically examines the dual-use dilemma inherent in Generative AI (GenAI) technologies, highlighting their applicability to both benevolent and malicious ends. The discussion stems from a workshop hosted at Google, co-organized with Stanford University and the University of Wisconsin-Madison, that aimed to delineate the security challenges associated with GenAI and to propose potential mitigation strategies.

Dual-Use Dilemma and GenAI Capabilities

The emergence of GenAI technologies, including LLMs and diffusion models, revives the dual-use dilemma: the same innovations can be harnessed for constructive purposes or malicious exploits. GenAI models exhibit remarkable capabilities such as in-context learning, code generation, and realistic media production. These capabilities, while transformative, also open new avenues for malicious actors. In particular, GenAI can increase the sophistication of cyberattacks, significantly amplifying their effectiveness and scale.

Threat Landscape and Exploits

The paper identifies several vulnerabilities and exploits facilitated by GenAI models:

  1. Spear-Phishing: Enhanced linguistic capabilities of GenAI lead to articulate, personalized phishing emails, making detection increasingly difficult.
  2. Deepfake Dissemination: GenAI's proficiency in generating realistic images and videos can be misused to create fake content, potentially undermining trust and proliferating misinformation.
  3. Cyberattacks: The ability of GenAI to produce high-quality code extends to the creation of sophisticated malware, enriching the toolkit available to adversaries.
  4. Lowered Barrier to Entry: GenAI reduces the skill and resources needed to mount attacks, democratizing access to tools previously reserved for skilled attackers and thereby widening the adversary pool.

Inherent limitations of GenAI, such as hallucinated outputs, introduce additional vulnerabilities that can be exploited. Unpredictable behavior and data feedback loops, in which model outputs re-enter future training data, further amplify the risks of GenAI deployment.

Defense Mechanisms and Strategic Responses

Defense strategies are categorized into direct interventions and ecosystem enhancements:

  • Detection Systems: Various approaches, including neural network-based detectors and watermarks, aim to distinguish AI-generated content from human-generated content. However, these detectors remain vulnerable to evasion, for example via paraphrasing attacks.
  • Watermarking Techniques: These embed detectable signals in GenAI outputs, offering a way to establish provenance (see the detection sketch after this list). Their effectiveness is limited, however, by how easily the signal can be removed through simple modifications.
  • Penetration Testing Augmented by GenAI: Incorporating AI into traditional pen-testing can enhance vulnerability analysis by automating broader coverage.
  • Multi-Modal Analysis: Leveraging GenAI’s capability across multiple data modalities can improve the robustness of false content detection.
  • Human–AI Collaboration: Encouraging synergistic interactions between human expertise and GenAI outputs fosters more accurate and context-aware outcomes across domains such as education and security.
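
To make the watermarking idea concrete, below is a minimal, self-contained detection sketch for a "green-list" style text watermark. Everything in it is an illustrative assumption rather than any particular published scheme: the GAMMA fraction, the hash-based green/red partition seeded by the previous token, and whitespace tokenization are all simplifications. The premise is that the generator's sampler is biased toward "green" tokens, so a detector can test whether a suspect text contains significantly more green tokens than chance would predict.

```python
# Hypothetical sketch: statistical detection of a "green-list" text watermark.
# GAMMA, the hash-based partition, and whitespace tokenization are assumptions
# for illustration only, not a specific published construction.
import hashlib
import math

GAMMA = 0.5  # assumed fraction of the vocabulary placed on the green list


def is_green(prev_token: str, token: str) -> bool:
    """Pseudo-randomly assign `token` to the green list, seeded by `prev_token`."""
    digest = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
    return digest[0] / 256.0 < GAMMA


def watermark_z_score(tokens: list[str]) -> float:
    """z-score of the observed green-token count against the chance baseline."""
    n = len(tokens) - 1  # number of (previous token, token) pairs
    if n <= 0:
        return 0.0
    hits = sum(is_green(prev, tok) for prev, tok in zip(tokens, tokens[1:]))
    return (hits - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))


if __name__ == "__main__":
    sample = "the quick brown fox jumps over the lazy dog".split()
    # Unwatermarked text should score near zero; a watermarked generation,
    # whose sampler favored green tokens, would score far above a threshold.
    print(f"z-score: {watermark_z_score(sample):.2f}")
```

The paraphrasing weakness noted above corresponds, in this toy model, to an attacker rewriting the text until the green-token count falls back toward chance and the z-score drops below the detection threshold.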

Implications and Future Directions

The paper underscores the need for a comprehensive approach spanning technical, social, and regulatory dimensions. Short-term priorities include developing robust methods for detecting AI-generated content and aligning code generation with secure coding practices (a minimal guardrail sketch follows). Long-term goals emphasize socio-technical solutions, value alignment, and democratized research to mitigate the risks posed by GenAI's expanding capabilities.
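
As one concrete illustration of the secure code generation direction, the sketch below post-processes generated Python with a small AST-based audit before it is accepted. The flagged patterns (eval/exec calls, pickle/marshal imports) and the function name are hypothetical choices for this example, not a policy proposed by the paper.

```python
# Hypothetical guardrail: audit LLM-generated Python for obviously risky
# constructs before accepting it. The pattern list is illustrative, not complete.
import ast

RISKY_CALLS = {"eval", "exec"}          # arbitrary code execution
RISKY_MODULES = {"pickle", "marshal"}   # unsafe deserialization of untrusted data


def audit_generated_code(source: str) -> list[str]:
    """Return human-readable warnings for risky constructs in `source`."""
    findings = []
    tree = ast.parse(source)
    for node in ast.walk(tree):
        # Direct calls to eval()/exec()
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in RISKY_CALLS:
                findings.append(f"line {node.lineno}: call to {node.func.id}()")
        # Imports of modules that deserialize arbitrary objects
        elif isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in RISKY_MODULES:
                    findings.append(f"line {node.lineno}: import of {alias.name}")
        elif isinstance(node, ast.ImportFrom):
            if node.module and node.module.split(".")[0] in RISKY_MODULES:
                findings.append(f"line {node.lineno}: import from {node.module}")
    return findings


if __name__ == "__main__":
    candidate = "import pickle\nresult = eval(user_input)\n"
    for finding in audit_generated_code(candidate):
        print("WARNING:", finding)
```

Such static checks catch only blatant patterns and would complement, not replace, the broader secure-by-design efforts the paper calls for.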

Ultimately, this work is intended as a launching point for an ongoing conversation within the research community, urging a balanced examination of GenAI to ensure its secure and ethical deployment. The trajectory of GenAI will hinge on such collaborative efforts and targeted research to preemptively address threats and to harness these technologies' potential responsibly.
