- The paper introduces a modular gym environment that integrates simulation and emulation for training autonomous red and blue cyber agents.
- It presents a two-stage process where agents are initially trained using a DQN with LSTM in simulation and then validated on AWS-based virtual networks.
- Experimental results show a 66% success rate for DRL agents, demonstrating the framework's potential to refine cyber operations techniques.
The paper introduces CybORG (Cyber Operations Research Gym), an environment designed to facilitate the development of autonomous cyber agents for both red and blue teams via a two-stage process: training in simulation and validation in an emulated environment. CybORG aims to address challenges of Autonomous Cyber Operations (ACO) such as adversarial environments, the "reality gap", and the rapidly evolving nature of cybersecurity.
The authors highlight the limitations of existing cybersecurity experimentation environments, summarized in Table 1, including DETERlab, VINE, SmallWorld, BRAWL, Galaxy, Insight, CANDLES, CyAMS, CyberBattleSim, and FARLAND. They argue that no single environment encapsulates all the features required for ACO agent development: the ability to train red and blue agents that realistically interact with hosts, rapid training in simulated environments, and validation in emulated environments.
CybORG is designed with a modular architecture that allows scenarios to be implemented at varying levels of fidelity. The tool instantiates scenarios from pre-generated descriptions, initializes agents, implements actions, and assesses effectiveness in discrete steps. At each step, agents select actions from their action space and receive observations of the updated state; the scenario runs until a termination condition is met.
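To make this discrete step loop concrete, the following is a minimal sketch written against a toy stand-in environment; the class, action names, and termination rule are assumptions for illustration and not CybORG's actual API.

```python
# Minimal sketch of the discrete interaction loop described above.
# ToyCyborgEnv is a stand-in, not the real CybORG interface (assumption).
import random

class ToyCyborgEnv:
    """Toy scenario: a fixed action space that terminates once a goal action succeeds."""
    def __init__(self):
        self.action_space = ["Portscan", "SSHBruteforce", "Sleep"]

    def reset(self):
        return {"success": None}

    def step(self, action):
        observation = {"success": True, "action": action}
        reward = 1.0 if action == "SSHBruteforce" else 0.0
        done = action == "SSHBruteforce"          # termination condition
        return observation, reward, done, {}

env = ToyCyborgEnv()
obs, done = env.reset(), False
while not done:
    action = random.choice(env.action_space)      # agent selects an action
    obs, reward, done, info = env.step(action)    # agent observes the updated state
```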
The design of CybORG encompasses several key components:
- Scenarios: Define the "game," including agents, actions, initial information, reward calculation, host configurations, and network connections. A scenario is deployed as either a simulated or an emulated environment from a YAML description file, with host configurations and actions specified in separate YAML files.
- Actions: Defined via the OpenAI Gym interface, these are based on actions available to cybersecurity professionals. In simulation, actions are modeled as state transitions, while in emulation, they are implemented as executable commands.
- Observations: Agents receive observations as a dictionary of key-value pairs, including a "success" key and host-specific data categorized into Interface, Session, User, System, and Process (see the first sketch after this list). Agents also receive a reward value and a flag indicating run completion.
- Simulator: Represents scenarios as a finite state machine in which actions update the state according to preconditions and effects (see the second sketch after this list). The state tracks details such as file creation/deletion and network connections in order to reduce divergence from the emulator.
- Emulator: Uses Amazon Web Services (AWS) virtual machines to create a high-fidelity cybersecurity environment, deploying and configuring virtual networks via AWS's Command Line Interface (CLI). The emulator supports two modes, pre-deployed and deployed, and uses actuator objects that connect to virtual machines (VMs) over SSH or through specialized session handlers for third-party tools such as Metasploit Framework and Velociraptor (see the third sketch after this list).
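As a concrete illustration of the observation format, the dictionary below is a hypothetical example: only the top-level "success" key and the Interface/Session/User/System/Process categories come from the paper, while the specific keys and values are invented for illustration.

```python
# Hypothetical observation dictionary; the concrete host name, fields, and
# values are placeholders, not taken from the paper.
observation = {
    "success": True,
    "host_0": {
        "Interface": [{"IP Address": "10.0.0.2", "Subnet": "10.0.0.0/28"}],
        "Session": [{"Username": "vagrant", "Type": "meterpreter"}],
        "User": [{"Username": "vagrant", "Groups": ["users"]}],
        "System": {"Hostname": "host_0", "OS": "Windows Server 2008"},
        "Process": [{"PID": 1234, "Process Name": "svchost.exe"}],
    },
}
```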
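The simulator's finite-state-machine view of an action can be sketched as a precondition plus an effect over a state dictionary. This is a schematic reconstruction under assumed data structures, not the paper's implementation.

```python
# Schematic sketch of an action modeled as a state transition with a
# precondition and an effect; state layout and action are illustrative.
from dataclasses import dataclass
from typing import Callable, Dict

State = Dict[str, dict]

@dataclass
class SimAction:
    name: str
    precondition: Callable[[State], bool]   # must hold for the action to succeed
    effect: Callable[[State], None]         # mutates the state on success

    def execute(self, state: State) -> bool:
        if not self.precondition(state):
            return False                     # observation would report success = False
        self.effect(state)
        return True

# Example: a port scan requires a session on the scanning host and records
# a discovered service on the target host.
portscan = SimAction(
    name="Portscan",
    precondition=lambda s: s["attacker"].get("session", False),
    effect=lambda s: s["target"].setdefault("known_services", []).append(22),
)

state: State = {"attacker": {"session": True}, "target": {}}
print(portscan.execute(state), state["target"])   # True {'known_services': [22]}
```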
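On the emulation side, an actuator that runs a single command on a VM over SSH might look roughly like the following. This uses the paramiko library as a stand-in; the host, credentials, and command are placeholders, and this is not the paper's actual actuator code.

```python
# Rough sketch of an SSH-based actuator using paramiko as a stand-in;
# hostnames, credentials, and the command are placeholders.
import paramiko

def run_remote_command(host: str, username: str, key_file: str, command: str) -> str:
    """Open an SSH session to an emulated VM, run one command, and return its stdout."""
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(hostname=host, username=username, key_filename=key_file)
    try:
        _stdin, stdout, _stderr = client.exec_command(command)
        return stdout.read().decode()
    finally:
        client.close()

# e.g. run_remote_command("10.0.0.2", "ubuntu", "~/.ssh/lab_key.pem", "ip addr")
```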
The paper describes an experiment using a scenario with three hosts split across two subnets, where a red agent aims to gain a session on an internal host as the System user. The agent uses Metasploit and Meterpreter to perform actions such as SSH Bruteforce, Portscan, Pingsweep, Upgrade to Meterpreter, IPConfig, MS17-010-PSExec, Autoroute, and Sleep. The agent is trained with a Deep Q-Network (DQN) incorporating Long Short-Term Memory (LSTM), learning a value function that maps observations to expected discounted rewards for each action, from which the policy is derived.
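A minimal sketch of a Q-network with an LSTM layer of the kind described is shown below in PyTorch; the framework choice, layer sizes, and input encoding are assumptions, as the paper does not specify this exact architecture.

```python
# Minimal PyTorch sketch of a Q-network with an LSTM layer, standing in for the
# DQN-with-LSTM agent described above; sizes and framework are assumptions.
import torch
import torch.nn as nn

class RecurrentQNetwork(nn.Module):
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden)               # embed flattened observation
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)   # memory across steps
        self.q_head = nn.Linear(hidden, n_actions)              # one Q-value per action

    def forward(self, obs_seq, hidden_state=None):
        # obs_seq: (batch, time, obs_dim) sequence of observations
        x = torch.relu(self.encoder(obs_seq))
        x, hidden_state = self.lstm(x, hidden_state)
        return self.q_head(x), hidden_state                     # Q-values per timestep

# Greedy action for the latest observation in a length-1 sequence:
net = RecurrentQNetwork(obs_dim=32, n_actions=8)
q_values, h = net(torch.zeros(1, 1, 32))
action = int(q_values[0, -1].argmax())
```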
The Deep Reinforcement Learning (DRL) agents were trained for up to 2500 iterations in the simulator. The trained agents were then tested in the emulator, resulting in a 66% success rate across 21 independent Reinforcement Learning (RL) agents, with nearly half of the agents successful on every emulator run. Failures were attributed to deficiencies in the simulator model or emulator interface, highlighting the potential for refining CybORG through combined simulation and emulation.
The authors conclude that CybORG demonstrates the feasibility of training agents in simulation and validating them on virtualized infrastructure using professional security tools. Future work includes implementing actions, sensors, and actuators for blue agent training and developing more complex scenarios with deception capabilities.