LLMs Killed the Script Kiddie: How Agents Supported by Large Language Models Change the Landscape of Network Threat Testing (2310.06936v1)
Abstract: In this paper, we explore the potential of LLMs to reason about threats, generate information about tools, and automate cyber campaigns. We begin with a manual exploration of LLMs in supporting specific threat-related actions and decisions. We then automate the decision process in a cyber campaign. We present prompt engineering approaches for a plan-act-report loop covering one action of a threat campaign, and a prompt chaining design that directs the sequential decision process of a multi-action campaign. We assess the extent of the LLM's cyber-specific knowledge with respect to the short campaign we demonstrate and provide insights into prompt design for eliciting actionable responses. We discuss the potential impact of LLMs on the threat landscape and the ethical considerations of using LLMs to accelerate threat actor capabilities. We report a promising, yet concerning, application of generative AI to cyber threats. However, the LLM's ability to handle more complex networks and sophisticated vulnerabilities, and the sensitivity of its responses to prompt phrasing, remain open questions. This research should spur deliberation over the inevitable advancements in the LLM-supported cyber adversarial landscape.
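To make the plan-act-report loop and prompt chaining concrete, below is a minimal sketch of how such a loop might be wired up. This is illustrative, not the paper's implementation: `query_llm` is a hypothetical stand-in for any chat-completion API, `run_tool` is an assumed sandboxed executor (in the paper's setting, a network testbed), and the prompt wording and stopping convention are placeholders.

```python
# Minimal sketch of a plan-act-report loop with prompt chaining.
# Assumptions: `query_llm` wraps some chat-completion API, `run_tool`
# executes a command in a controlled testbed. Both are hypothetical.
from typing import Callable

def plan_act_report(
    query_llm: Callable[[str], str],   # hypothetical LLM interface
    run_tool: Callable[[str], str],    # hypothetical sandboxed runner
    objective: str,
    max_actions: int = 5,
) -> list[dict]:
    """Chain prompts so each action's report conditions the next plan."""
    history: list[dict] = []
    context = f"Campaign objective: {objective}"
    for _ in range(max_actions):
        # PLAN: ask the model for the single next action, given prior reports.
        plan = query_llm(f"{context}\nPropose the next action as one shell command.")
        # ACT: execute the proposed action in the controlled environment.
        observation = run_tool(plan)
        # REPORT: have the model summarize the outcome and decide whether to stop.
        report = query_llm(
            f"Action: {plan}\nOutput: {observation}\n"
            "Summarize the result and answer DONE or CONTINUE."
        )
        history.append({"plan": plan, "observation": observation, "report": report})
        # Chain: feed the report back into the context for the next iteration.
        context += f"\nPrevious report: {report}"
        if "DONE" in report:
            break
    return history
```

The key design point is the chaining step: each iteration's report is appended to the context, so the sequential decision process of a multi-action campaign is driven by the model's own summaries of earlier actions rather than by hand-coded state.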
Authors: Stephen Moskal, Sam Laney, Erik Hemberg, Una-May O'Reilly