
SECURE: Benchmarking Large Language Models for Cybersecurity (2405.20441v4)

Published 30 May 2024 in cs.CR, cs.AI, and cs.HC

Abstract: LLMs have demonstrated potential in cybersecurity applications, but problems such as hallucinations and a lack of truthfulness have undermined confidence in them. Existing benchmarks provide general evaluations but do not sufficiently address the practical and applied aspects of LLM performance in cybersecurity-specific tasks. To address this gap, we introduce SECURE (Security Extraction, Understanding & Reasoning Evaluation), a benchmark designed to assess LLM performance in realistic cybersecurity scenarios. SECURE includes six datasets focused on the Industrial Control System sector, evaluating knowledge extraction, understanding, and reasoning against industry-standard sources. Our study evaluates seven state-of-the-art models on these tasks, providing insights into their strengths and weaknesses in cybersecurity contexts, and offers recommendations for improving LLM reliability as cyber advisory tools.
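
To make the benchmark's evaluation setup concrete, below is a minimal sketch of how a SECURE-style knowledge task might be scored. The dataset path, JSON format, and the query_model() helper are illustrative assumptions, not the paper's actual harness or data schema.

```python
# Hypothetical sketch of scoring an LLM on a multiple-choice cybersecurity
# task; the file format and query_model() are assumptions for illustration.
import json

def query_model(prompt: str) -> str:
    """Placeholder for a call to the LLM under evaluation."""
    raise NotImplementedError

def evaluate(dataset_path: str) -> float:
    """Return the model's accuracy on one task dataset."""
    with open(dataset_path) as f:
        # Assumed format: [{"question": ..., "choices": [...], "answer": "A"}, ...]
        examples = json.load(f)
    correct = 0
    for ex in examples:
        prompt = ex["question"] + "\n" + "\n".join(
            f"{label}. {choice}" for label, choice in zip("ABCD", ex["choices"])
        )
        prediction = query_model(prompt).strip()
        # Compare the model's first answer character against the gold label.
        if prediction[:1].upper() == ex["answer"]:
            correct += 1
    return correct / len(examples)
```

Running this loop once per dataset and per model would yield the kind of per-task accuracy comparison across the seven evaluated models that the abstract describes.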

Authors (10)
  1. Dipkamal Bhusal (9 papers)
  2. Md Tanvirul Alam (8 papers)
  3. Le Nguyen (11 papers)
  4. Ashim Mahara (2 papers)
  5. Zachary Lightcap (1 paper)
  6. Rodney Frazier (1 paper)
  7. Romy Fieblinger (3 papers)
  8. Grace Long Torales (1 paper)
  9. Nidhi Rastogi (26 papers)
  10. Benjamin A. Blakely (2 papers)