Spear Phishing With Large Language Models (2305.06972v3)

Published 11 May 2023 in cs.CY, cs.AI, and cs.CR

Abstract: Recent progress in AI, particularly in the domain of LLMs, has resulted in powerful and versatile dual-use systems. This intelligence can be put towards a wide variety of beneficial tasks, yet it can also be used to cause harm. This study explores one such harm by examining how LLMs can be used for spear phishing, a form of cybercrime that involves manipulating targets into divulging sensitive information. I first explore LLMs' ability to assist with the reconnaissance and message generation stages of a spear phishing attack, where I find that advanced LLMs are capable of improving cybercriminal efficiency during these stages. To explore how LLMs could potentially be harnessed to scale spear phishing campaigns, I then create unique spear phishing messages for over 600 British Members of Parliament using OpenAI's GPT-3.5 and GPT-4 models. My findings provide some evidence that these messages are not only realistic but also cost-effective, with each email costing only a fraction of a cent to generate. Next, I demonstrate how basic prompt engineering can circumvent safeguards installed in LLMs, highlighting the need for further research into robust interventions that can help prevent models from being misused. To further address these evolving risks, I explore two potential solutions: structured access schemes, such as application programming interfaces, and LLM-based defensive systems.

Analysis of LLMs in Scaling Spear Phishing Campaigns

Julian Hazell's paper evaluates how LLMs can be used to scale spear phishing campaigns, providing both a detailed methodology and an analysis of the implications for cybersecurity. It examines how models such as GPT-3.5 and GPT-4 can personalize the stages of a cyberattack, particularly spear phishing, and its findings illustrate how convincingly and cheaply LLMs can automate and scale such efforts.

Hazell argues that LLMs can assist cybercriminals at several stages of an attack, namely reconnaissance, message generation, and compromise. The models improve efficiency during reconnaissance by turning publicly available data into personalized content, and they produce realistic spear phishing messages. Moreover, basic prompt engineering is enough to bypass their built-in safety protocols.

Numerical Results and Key Findings

The paper details an experiment in which unique spear phishing messages were generated for over 600 British Members of Parliament using OpenAI's LLMs. It highlights that modern LLMs can produce highly realistic spear phishing emails at exceptionally low cost, estimating that generating 1,000 emails could cost as little as $10. The tailored emails also mimicked human writing far more convincingly than the output of previous model generations.
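To make the economics concrete, the sketch below scales the paper's headline figure of roughly $10 per 1,000 emails, about one cent per email, to larger campaign sizes:

```python
# How campaign cost scales with volume, using the paper's headline figure
# of roughly $10 per 1,000 generated emails (about one cent per email).
cost_per_email = 10 / 1_000  # USD per email, derived from the paper's estimate

for n in (1_000, 10_000, 100_000):
    print(f"{n:>7,} personalized emails ≈ ${n * cost_per_email:,.0f}")
```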

Governance Implications and Challenges

Hazell identifies the dual-use nature of LLMs as a significant concern, where an AI system capable of benign applications can also be manipulated for cybercrime. Recognizing the inherent risks, he calls for robust governance interventions. The paper proposes two potential solutions:

  1. Structured Access Schemes: Implementing Application Programming Interfaces (APIs) to manage and oversee user interactions with LLMs, potentially tracking and linking malicious activity to perpetrators (a gateway sketch follows this list).
  2. LLM-Based Defensive Systems: Developing defensive mechanisms that use LLMs to analyze and filter phishing content in real time, potentially improving on existing email security protocols (a classifier sketch follows this list).
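As a concrete illustration of the first proposal, here is a minimal sketch of a structured access gateway that authenticates callers and keeps an audit trail linking every request to an identity. The endpoint name, key store, and log format are illustrative assumptions, not details from the paper:

```python
# Minimal sketch of a structured access scheme: a gateway that sits in
# front of a hosted model, authenticates callers, and keeps an audit log
# so that misuse can later be traced to an account. All names here
# (endpoint, key store, log file) are illustrative assumptions.
import json
import time

from flask import Flask, abort, jsonify, request

app = Flask(__name__)
API_KEYS = {"key-123": "alice@example.com"}  # stand-in for a real key store


@app.post("/v1/generate")
def generate():
    user = API_KEYS.get(request.headers.get("Authorization", ""))
    if user is None:
        abort(401)  # unregistered callers never reach the model
    prompt = request.get_json(force=True).get("prompt", "")
    # Append-only audit trail linking every request to an identity.
    with open("audit.log", "a") as log:
        log.write(json.dumps({"ts": time.time(), "user": user,
                              "prompt": prompt}) + "\n")
    return jsonify({"completion": call_model(prompt)})


def call_model(prompt: str) -> str:
    return "..."  # placeholder: forward to the hosted model in a real deployment
```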
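For the second proposal, here is a minimal sketch of an LLM-based phishing filter. The prompt wording, model name, and labeling scheme are illustrative assumptions; the paper proposes the approach without prescribing an implementation:

```python
# Minimal sketch of an LLM-based phishing filter, in the spirit of the
# paper's second proposal. The prompt, model name, and threshold logic
# are illustrative assumptions, not taken from the paper.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

CLASSIFIER_PROMPT = (
    "You are an email security filter. Given the email below, answer with "
    "a single word, PHISHING or LEGITIMATE, based on signs of social "
    "engineering such as urgency, credential requests, or spoofed senders.\n\n"
    "Email:\n{email}"
)


def looks_like_phishing(email_body: str) -> bool:
    """Ask an LLM to label an email; returns True if flagged as phishing."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model; any capable chat model works
        messages=[{"role": "user",
                   "content": CLASSIFIER_PROMPT.format(email=email_body)}],
        temperature=0,  # deterministic labels for filtering
    )
    label = response.choices[0].message.content.strip().upper()
    return label.startswith("PHISHING")


if __name__ == "__main__":
    sample = ("Dear Member, your parliamentary account has been locked. "
              "Please confirm your password at http://example.com/login "
              "within 24 hours.")
    print("flagged:", looks_like_phishing(sample))
```

A production filter would combine such a classifier with conventional signals such as sender reputation and URL analysis, rather than relying on a single model verdict.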

Theoretical and Practical Implications

The paper advances the dialogue on AI's role in cybersecurity, underscoring the critical need for systemic changes in AI governance. Hazell suggests that contextualizing AI deployments with risk assessments can help preemptively address emerging threats. Extrapolating from current capabilities, he also anticipates more autonomous cybercrime, which could further sharpen AI policy debates.

Future Scope

Looking forward, Hazell's research calls for concentrated efforts to enhance the controllability of LLMs within cybersecurity frameworks. This involves not only improving detection algorithms but also developing robust regulatory frameworks that balance technological advances with ethical considerations. The paper envisions a future in which AI can autonomously conduct complex operations, necessitating the proactive development of anticipatory governance and legislative mechanisms.

Overall, this paper offers a comprehensive account of how LLMs can scale spear phishing campaigns, prompting important discussions in AI governance, cybercrime mitigation, and societal impact. It lays a strong foundation for further work on AI's role in cybersecurity, emphasizing both how these increasingly pervasive technologies can be misused and how they can be defended against.

Authors (1)
  1. Julian Hazell (4 papers)
Citations (39)