A New Era in LLM Security: Exploring Security Concerns in Real-World LLM-based Systems (2402.18649v1)

Published 28 Feb 2024 in cs.CR and cs.AI

Abstract: LLM systems are inherently compositional: an individual LLM serves as the core foundation, with additional layers of objects such as plugins and sandboxes built around it. Along with their great potential, there are also increasing concerns over the security of such probabilistic intelligent systems. However, existing studies on LLM security often focus on individual LLMs without examining the ecosystem through the lens of LLM systems that include other objects (e.g., Frontend, Webtool, Sandbox). In this paper, we systematically analyze the security of LLM systems instead of focusing on individual LLMs. To do so, we build on information flow and formulate the security of LLM systems as constraints on the alignment of information flow within the LLM and between the LLM and other objects. Based on this construction and the unique probabilistic nature of LLMs, the attack surface of an LLM system can be decomposed into three key components: (1) multi-layer security analysis, (2) analysis of the existence of constraints, and (3) analysis of the robustness of these constraints. To ground this new attack surface, we propose a multi-layer and multi-step approach and apply it to the state-of-the-art LLM system, OpenAI GPT-4. Our investigation exposes several security issues, not just within the LLM model itself but also in its integration with other components. We find that although OpenAI GPT-4 includes numerous safety constraints designed to improve its safety, these constraints remain vulnerable to attackers. To further demonstrate the real-world threat of the discovered vulnerabilities, we construct an end-to-end attack in which an adversary illicitly acquires the user's chat history without manipulating the user's input or gaining direct access to OpenAI GPT-4. Our demo is available at: https://fzwark.github.io/LLM-System-Attack-Demo/
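The information-flow formulation in the abstract can be made concrete with a small sketch. The Python snippet below is a minimal illustration, not the paper's actual model: the object names (Frontend, Webtool, Sandbox) follow the abstract, but the specific policy entries, data labels, and function names are illustrative assumptions.

```python
# Minimal sketch of viewing LLM-system security as constraints on information
# flow between objects. Object names follow the abstract; the concrete policy
# below is an illustrative assumption, not the authors' actual constraint set.

from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    source: str       # object emitting information
    sink: str         # object receiving information
    data_label: str   # kind of data carried, e.g. "user_prompt", "web_content"


# Hypothetical policy: which (source, sink, label) flows are permitted.
ALLOWED_FLOWS = {
    ("Frontend", "LLM", "user_prompt"),
    ("LLM", "Frontend", "model_response"),
    ("LLM", "Webtool", "search_query"),
    ("Webtool", "LLM", "web_content"),
    ("LLM", "Sandbox", "code_to_run"),
    ("Sandbox", "LLM", "execution_result"),
}


def violates_constraints(flow: Flow) -> bool:
    """Return True if a flow falls outside the allowed policy."""
    return (flow.source, flow.sink, flow.data_label) not in ALLOWED_FLOWS


# The end-to-end attack described in the abstract exfiltrates the user's chat
# history through an external component; modeled as a flow, it is flagged here.
attack_flow = Flow(source="LLM", sink="Webtool", data_label="user_chat_history")
print(violates_constraints(attack_flow))  # True -> constraint violation
```

Under a policy of this shape, the end-to-end attack corresponds to a flow that carries the user's chat history out to an external component without the user's involvement, which a well-aligned constraint set would reject; the paper's finding is that the deployed constraints do not reliably enforce this.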

Authors (5)
  1. Fangzhou Wu (11 papers)
  2. Ning Zhang (278 papers)
  3. Somesh Jha (112 papers)
  4. Patrick McDaniel (70 papers)
  5. Chaowei Xiao (110 papers)
Citations (39)