Exploiting Large Language Models (LLMs) through Deception Techniques and Persuasion Principles (2311.14876v1)
Abstract: With the recent advent of LLMs, such as ChatGPT from OpenAI, BARD from Google, Llama2 from Meta, and Claude from Anthropic AI, gain widespread use, ensuring their security and robustness is critical. The widespread use of these LLMs heavily relies on their reliability and proper usage of this fascinating technology. It is crucial to thoroughly test these models to not only ensure its quality but also possible misuses of such models by potential adversaries for illegal activities such as hacking. This paper presents a novel study focusing on exploitation of such LLMs against deceptive interactions. More specifically, the paper leverages widespread and borrows well-known techniques in deception theory to investigate whether these models are susceptible to deceitful interactions. This research aims not only to highlight these risks but also to pave the way for robust countermeasures that enhance the security and integrity of LLMs in the face of sophisticated social engineering tactics. Through systematic experiments and analysis, we assess their performance in these critical security domains. Our results demonstrate a significant finding in that these LLMs are susceptible to deception and social engineering attacks.
- Y. Liu, G. Deng, Z. Xu, Y. Li, Y. Zheng, Y. Zhang, L. Zhao, T. Zhang, and Y. Liu, “Jailbreaking chatgpt via prompt engineering: An empirical study,” arXiv preprint arXiv:2305.13860, 2023.
- T. L. Carson, Lying and Deception: Theory and Practice. Oxford University Press; Reprint edition, 2012.
- R. B. Cialdini, Influence: The Psychology of Persuasion. Harper Business, 2006.
- K. Jones, M. Armstrong, M. Tornblad., and A. Akbar Siami Namin, “How social engineers use persuasion principles during vishing attacks,” Information and Computer Security, vol. 29, no. 2, pp. 314–331, 2020.
- www.independent.co.uk/tech/chatgpt-dark-web-wormgpt-hack-b2376627.html, Jul 2023.
- “Specialized ai assistant for ethical hackers.” https://www.hackergpt.chat/, 2023.
- S. Qiu, Q. Liu, S. Zhou, and C. Wu, “Review of artificial intelligence adversarial attack and defense technologies,” Applied Sciences, vol. 9, no. 5, p. 909, 2019.
- T. Y. Zhuo, Y. Huang, C. Chen, and Z. Xing, “Red teaming chatgpt via jailbreaking: Bias, robustness, reliability and toxicity,” arXiv preprint arXiv:2301.12867, pp. 12–2, 2023.
- E. Perez, S. Huang, F. Song, T. Cai, R. Ring, J. Aslanides, A. Glaese, N. McAleese, and G. Irving, “Red teaming language models with language models,” arXiv preprint arXiv:2202.03286, 2022.
- J. Shi, Y. Liu, P. Zhou, and L. Sun, “Badgpt: Exploring security vulnerabilities of chatgpt via backdoor attacks to instructgpt,” arXiv preprint arXiv:2304.12298, 2023.
- Z. Shi, Y. Wang, F. Yin, X. Chen, K.-W. Chang, and C.-J. Hsieh, “Red teaming language model detectors with language models,” arXiv preprint arXiv:2305.19713, 2023.
- H. Li, D. Guo, W. Fan, M. Xu, and Y. Song, “Multi-step jailbreaking privacy attacks on chatgpt,” arXiv preprint arXiv:2304.05197, 2023.
- R. T. Mathew, “Chatgpt: Proceed with caution,” Cancer Research, Statistics, and Treatment, vol. 6, no. 1, pp. 122–124, 2023.
- S. Casper, J. Lin, J. Kwon, G. Culp, and D. Hadfield-Menell, “Explore, establish, exploit: Red teaming language models from scratch,” arXiv preprint arXiv:2306.09442, 2023.
- D. Ganguli, L. Lovitt, J. Kernion, A. Askell, Y. Bai, S. Kadavath, B. Mann, E. Perez, N. Schiefer, K. Ndousse, et al., “Red teaming language models to reduce harms: Methods, scaling behaviors, and lessons learned,” arXiv preprint arXiv:2209.07858, 2022.
- A. Robey, E. Wong, H. Hassani, and G. J. Pappas, “Smoothllm: Defending large language models against jailbreaking attacks,” arXiv preprint arXiv:2310.03684, 2023.
- “Jailbreak chat.” https://www.jailbreakchat.com/, Sep 2023.
- B. Karki, F. Abri, A. S. Namin, and K. S. Jones, “Using transformers for identification of persuasion principles in phishing emails,” in 2022 IEEE International Conference on Big Data (Big Data), pp. 2841–2848, IEEE, 2022.
- M. E. Armstrong, K. S. Jones, and A. S. Namin, “How perceptions of caller honesty vary during vishing attacks that include highly sensitive or seemingly innocuous requests,” Human Factors, vol. 65, no. 2, pp. 275–287, 2023.
- K. D. Mitnick, The art of deception: Controlling the human element of security. Wiley, 2003.
- Sonali Singh (6 papers)
- Faranak Abri (13 papers)
- Akbar Siami Namin (29 papers)