Feasibility of LLMs to Execute Multistage Network Attacks

Determine whether large language models can realize multistage network attacks that require executing coordinated actions across multiple hosts, including reconnaissance, exploitation for initial access, lateral movement, and data exfiltration.

Background

Prior work has shown that LLMs hold early promise on security-related tasks, especially capture-the-flag-style challenges and single-host problems. Real-world intrusions, however, often require coordinated, multistage campaigns across many hosts, involving discovery, exploitation, lateral movement, privilege escalation, and data exfiltration.

The paper by Singer et al. highlights this gap by stating explicitly that it is unclear whether LLMs can realize such multistage network attacks. The authors then evaluate several models and introduce an abstraction layer (Incalmo) to explore the question. The sentence quoted under References, taken from the paper's abstract, captures the unresolved question that motivates the work.

References

However, it is unclear whether LLMs are able to realize multistage network attacks, which involve executing a wide variety of actions across multiple hosts such as conducting reconnaissance, exploiting vulnerabilities to gain initial access, leveraging internal hosts to move laterally, and using multiple compromised hosts to exfiltrate data.

On the Feasibility of Using LLMs to Autonomously Execute Multi-host Network Attacks (2501.16466 - Singer et al., 27 Jan 2025) in Abstract