Devising and Detecting Phishing: Large Language Models vs. Smaller Human Models (2308.12287v2)

Published 23 Aug 2023 in cs.CR

Abstract: AI programs, built using LLMs, make it possible to automatically create phishing emails based on a few data points about a user. They stand in contrast to traditional phishing emails that hackers manually design using general rules gleaned from experience. The V-Triad is an advanced set of rules for manually designing phishing emails to exploit our cognitive heuristics and biases. In this study, we compare the performance of phishing emails created automatically by GPT-4 and manually using the V-Triad. We also combine GPT-4 with the V-Triad to assess their combined potential. A fourth group, exposed to generic phishing emails, was our control group. We utilized a factorial approach, sending emails to 112 randomly selected participants recruited for the study. The control group emails received a click-through rate between 19-28%, the GPT-generated emails 30-44%, emails generated by the V-Triad 69-79%, and emails generated by GPT and the V-Triad 43-81%. Each participant was asked to explain why they pressed or did not press a link in the email. These answers often contradict each other, highlighting the need for personalized content. The cues that make one person avoid phishing emails make another person fall for them. Next, we used four popular LLMs (GPT, Claude, PaLM, and LLaMA) to detect the intention of phishing emails and compared the results to human detection. The LLMs demonstrated a strong ability to detect malicious intent, even in non-obvious phishing emails. They sometimes surpassed human detection, although they were often slightly less accurate than humans. Finally, we analyze the economic aspects of AI-enabled phishing attacks, showing how LLMs can increase the incentives of phishing and spear phishing by reducing their costs.

Authors (5)

Fredrik Heiding
Bruce Schneier
Arun Vishwanath
Jeremy Bernstein
Peter S. Park