ChatSpamDetector: Leveraging Large Language Models for Effective Phishing Email Detection (2402.18093v2)

Published 28 Feb 2024 in cs.CR

Abstract: The proliferation of phishing sites and emails poses significant challenges to existing cybersecurity efforts. Despite advances in malicious email filters and email security protocols, problems with missed detections and false positives persist. Users often struggle to understand why emails are flagged as potentially fraudulent, which risks them missing important communications or mistakenly trusting deceptive phishing emails. This study introduces ChatSpamDetector, a system that uses LLMs to detect phishing emails. By converting email data into a prompt suitable for LLM analysis, the system provides a highly accurate determination of whether an email is phishing or not. Importantly, it offers detailed reasoning for its phishing determinations, assisting users in making informed decisions about how to handle suspicious emails. We conducted an evaluation using a comprehensive phishing email dataset and compared our system to several LLMs and baseline systems. We confirmed that our system using GPT-4 has superior detection capabilities, with an accuracy of 99.70%. Advanced contextual interpretation by LLMs enables the identification of various phishing tactics and impersonations, making them a potentially powerful tool in the fight against email-based phishing threats.
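The abstract describes converting email data into a prompt for LLM analysis and returning a verdict with reasoning. Below is a minimal sketch of that conversion step using Python's standard `email` package. Several pieces are assumptions and do not come from the paper: the choice of headers, the prompt wording, the JSON output schema, and the helper names `email_to_prompt` and `parse_verdict`. The paper's actual prompt is not reproduced here, and the call to the model itself is omitted.

```python
import json
from email import policy
from email.parser import BytesParser

# Hypothetical prompt template; the paper's actual prompt wording is not shown here.
PROMPT_TEMPLATE = """You are a security analyst. Decide whether the email below is phishing.
Answer in JSON with keys "is_phishing" (true/false) and "reasons" (list of strings).

Headers:
{headers}

Body:
{body}
"""

# Assumed selection of headers that help reveal spoofing and impersonation.
HEADER_FIELDS = ("From", "Reply-To", "Return-Path", "To", "Subject", "Date", "Received")


def email_to_prompt(raw: bytes, max_body_chars: int = 4000) -> str:
    """Convert a raw RFC 5322 message into a text prompt for an LLM (hypothetical helper)."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    header_lines = []
    for name in HEADER_FIELDS:
        # get_all returns every occurrence (e.g. multiple Received lines), or [] if absent.
        for value in msg.get_all(name, []):
            header_lines.append(f"{name}: {value}")
    # Prefer the plain-text body; fall back to the HTML part if that is all there is.
    part = msg.get_body(preferencelist=("plain", "html"))
    body = part.get_content() if part is not None else ""
    return PROMPT_TEMPLATE.format(
        headers="\n".join(header_lines),
        body=body[:max_body_chars],  # truncate to stay within an assumed context budget
    )


def parse_verdict(llm_output: str) -> tuple[bool, list[str]]:
    """Parse the JSON verdict requested by the hypothetical prompt above."""
    data = json.loads(llm_output)
    return bool(data["is_phishing"]), [str(r) for r in data.get("reasons", [])]
```

This design keeps the sender-related headers next to the body. The assumption is that this lets the model compare a display name against the actual sending or reply-to domain, which is one common way impersonation shows up. Asking for a structured reason list mirrors the abstract's emphasis on explaining each determination to the user.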

Authors (4)
  1. Takashi Koide
  2. Naoki Fukushi
  3. Hiroki Nakano
  4. Daiki Chiba
Citations (15)