ChatSpamDetector: Leveraging Large Language Models for Effective Phishing Email Detection (2402.18093v2)
Abstract: The proliferation of phishing sites and emails poses significant challenges to existing cybersecurity efforts. Despite advances in malicious email filters and email security protocols, missed detections and false positives persist. Users often struggle to understand why an email was flagged as potentially fraudulent, so they risk missing important communications or mistakenly trusting deceptive phishing emails. This study introduces ChatSpamDetector, a system that uses large language models (LLMs) to detect phishing emails. By converting email data into a prompt suitable for LLM analysis, the system determines with high accuracy whether an email is phishing. Importantly, it gives detailed reasoning for each determination, helping users make informed decisions about how to handle suspicious emails. We evaluated the system on a comprehensive phishing email dataset and compared it to several LLMs and baseline systems. With GPT-4, our system achieved superior detection capability, with an accuracy of 99.70%. Advanced contextual interpretation by LLMs enables the identification of various phishing tactics and impersonations, making them a potentially powerful tool in the fight against email-based phishing threats.
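The abstract's step of converting email data into a prompt can be illustrated with a minimal sketch. This is not the paper's actual prompt or pipeline: the selected headers, the prompt wording, the JSON answer format, the `MAX_BODY_CHARS` limit, and the function names `extract_body` and `build_prompt` are all assumptions made for illustration. The sketch uses only Python's standard `email` package and stops before any LLM call.

```python
from email import message_from_string
from email.message import Message

MAX_BODY_CHARS = 2000  # hypothetical limit to keep the prompt short


def extract_body(msg: Message) -> str:
    """Return the first text/plain part of the message, or an empty string."""
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True)  # bytes, or None for containers
            if payload is None:
                continue
            charset = part.get_content_charset() or "utf-8"
            return payload.decode(charset, errors="replace")
    return ""


def build_prompt(raw_email: str) -> str:
    """Turn a raw RFC 5322 message into a text prompt (illustrative wording)."""
    msg = message_from_string(raw_email)
    body = extract_body(msg)[:MAX_BODY_CHARS]
    # Hypothetical choice of headers that often reveal impersonation.
    headers = "\n".join(
        f"{name}: {msg.get(name, '(missing)')}"
        for name in ("From", "Reply-To", "Return-Path", "Subject")
    )
    return (
        "You are an email security analyst. Decide whether the email below "
        "is phishing.\n"
        'Answer with a JSON object: {"is_phishing": true|false, "reason": "..."}\n\n'
        f"--- HEADERS ---\n{headers}\n\n--- BODY ---\n{body}\n"
    )


# Made-up sample message using reserved .test domains.
sample = (
    "From: Support <support@examp1e-bank.test>\n"
    "Reply-To: attacker@elsewhere.test\n"
    "Subject: Urgent: verify your account\n"
    "\n"
    "Your account is locked. Visit http://examp1e-bank.test/login now.\n"
)
prompt = build_prompt(sample)
print(prompt)
```

The resulting string would then be sent to whichever LLM is under evaluation. Asking for a `reason` field alongside the verdict mirrors the abstract's emphasis on explaining each determination to the user.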
Authors:
- Takashi Koide
- Naoki Fukushi
- Hiroki Nakano
- Daiki Chiba