Core War: Adversarial Program Evolution
- Core War is a Turing-complete adversarial game where assembly-like programs, known as warriors, contest control of a virtual machine.
- Recent work applies self-play strategies, specifically the Digital Red Queen (DRQ) algorithm with MAP-Elites, to continually evolve robust and adaptive warriors.
- Empirical studies demonstrate convergent evolution toward generalist strategies, validated by human-designed benchmarks and diverse performance metrics.
Core War is a computational environment and competitive programming game in which assembly-like programs, known as warriors, compete for control of a virtual machine. Originating in the field of artificial life, Core War offers a Turing-complete, fully sandboxed testbed for studying adversarial program evolution, self-play, and the dynamics of continual adaptation. The environment has found renewed significance as a model for open-ended adversarial processes and as a benchmark for evolutionary algorithms, including those driven by LLMs (Kumar et al., 6 Jan 2026).
1. Formalizing the Adversarial Objective in Core War
Traditional evolutionary or program synthesis frameworks typically employ static optimization—searching for a solution that maximizes a fixed fitness function. In contrast, Core War research, and specifically the Digital Red Queen (DRQ) algorithm, instantiates a continually shifting adversarial “Red Queen” arms race. Rather than optimizing against a single static objective, each new warrior is evolved to outperform an ever-expanding set of prior champions. Formally:
$$w_t = \arg\max_{w} \; \mathbb{E}\big[\mathrm{Fitness}(w;\; \{w_0, w_1, \ldots, w_{t-1}\})\big],$$

where $w_0$ denotes the seed warrior, $w_1, \ldots, w_{t-1}$ are previous champions, and the expectation averages over randomized battle initializations. In contrast, static-target optimization seeks

$$w^{*} = \arg\max_{w} \; \mathbb{E}\big[\mathrm{Fitness}(w;\; w_{\text{fixed}})\big]$$

with a fixed opponent $w_{\text{fixed}}$.
Fitness in DRQ is context-dependent and based on survival and elimination: in an $N$-way match of up to $T$ timesteps, each living warrior shares the per-timestep reward equally with the other survivors, yielding

$$\mathrm{Fitness}(w_i) = \sum_{t=1}^{T} \frac{a_i(t)}{\sum_{j=1}^{N} a_j(t)},$$

where $a_i(t) = 1$ if warrior $i$ is alive at time $t$, and 0 otherwise.
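The survival-sharing rule above can be sketched directly. This is an illustrative implementation, assuming each match is summarized as a boolean matrix where `alive[t][i]` indicates whether warrior `i` still has a live process at timestep `t` (the matrix representation is an assumption, not the paper's interface):

```python
def survival_fitness(alive: list[list[bool]]) -> list[float]:
    """Per-warrior fitness: each timestep's unit reward is split
    equally among the warriors still alive at that step."""
    if not alive:
        return []
    n = len(alive[0])
    scores = [0.0] * n
    for row in alive:
        living = sum(row)
        if living == 0:
            continue  # everyone dead: no reward to share at this step
        for i, is_alive in enumerate(row):
            if is_alive:
                scores[i] += 1.0 / living
    return scores
```

Note how the reward is zero-sum per timestep among survivors, which is what makes fitness context-dependent: the same warrior scores differently against different opponent sets.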
2. Algorithmic Structure: The Digital Red Queen Self-Play Loop
The DRQ algorithm is structured as a multilevel self-play loop operating over a sequence of evolutionary rounds. At each round, a new warrior is evolved to defeat the current population of opponents selected from history. Optimization within a round uses MAP-Elites to preserve quality-diversity. The high-level pseudocode is summarized below:
```
Input: initial champion w₀, rounds T, MAP-Elites grid C, history length K
History H = [w₀]
for t in 1…T:                            # Outer (Red Queen) loop
    Opponents = last K warriors in H     # Select recent champions
    A.initialize_empty(C)                # Fresh MAP-Elites archive
    for each w in Opponents: A.add_elite(w)
    for iter in 1…I:                     # Inner evolutionary search
        w_parent = A.random_cell_sample()
        w_child  = LLM_mutate(w_parent)
        f  = average_{s=1…S} Fitness(w_child; Opponents, seed=s)
        bd = BD(w_child)                 # Behavior descriptor
        A.try_update_cell(bd, w_child, f)
    wₜ = A.get_overall_best()            # Select round champion
    append wₜ to H
end for
Output: lineage H = [w₀, w₁, …, w_T]
```
Key hyperparameters include the total number of rounds (e.g., 10), the number of inner iterations (e.g., 1,000), opponent-history length (e.g., 1, 3, or all previous), and the number of stochastic seeds per evaluation (e.g., 20) (Kumar et al., 6 Jan 2026).
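The archive operations used in the inner loop can be sketched as a minimal MAP-Elites grid keyed by discretized behavior descriptors. The class and parameter names below (`Archive`, `bin_size`) are illustrative, not taken from the paper:

```python
import random

class Archive:
    """Minimal MAP-Elites archive: one elite per discretized cell."""

    def __init__(self, bin_size: float = 10.0):
        self.bin_size = bin_size
        self.cells: dict[tuple, tuple] = {}  # cell -> (elite, fitness)

    def _cell(self, bd: tuple) -> tuple:
        # Discretize a continuous behavior descriptor into a grid cell.
        return tuple(int(x // self.bin_size) for x in bd)

    def try_update_cell(self, bd, elite, fitness):
        """Keep the candidate only if its cell is empty or it beats the
        incumbent's fitness -- the core MAP-Elites update rule."""
        cell = self._cell(bd)
        if cell not in self.cells or fitness > self.cells[cell][1]:
            self.cells[cell] = (elite, fitness)

    def random_cell_sample(self):
        # Uniform over occupied cells, so low-fitness niches still breed.
        return random.choice(list(self.cells.values()))[0]

    def get_overall_best(self):
        return max(self.cells.values(), key=lambda ef: ef[1])[0]
```

Sampling parents uniformly over occupied cells (rather than by fitness) is what preserves behavioral diversity within a round.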
3. Warrior Representation and LLM Mutation
Core War warriors are encoded in Redcode, a low-level language with approximately 20 opcodes (such as DAT, SPL, MOV, ADD), a set of modifiers, and diverse addressing modes. In the DRQ framework:
- New generation: An LLM (GPT-4.1-mini) is prompted with the Redcode specification and tasked with producing a novel warrior.
- Mutation: Given a parent warrior, the LLM receives the parent’s Redcode and is asked to produce a variant aimed at improved performance.
No fine-tuning is applied; the model leverages its pretrained knowledge augmented by context-specific instructions. The LLM thus samples from a conditional distribution $p(w_{\text{child}} \mid w_{\text{parent}}, \text{spec})$, guiding search over the high-dimensional space of Redcode programs.
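The conditioning structure can be sketched as prompt construction. The prompt wording and the abridged specification string below are illustrative assumptions; the paper's exact prompts are not reproduced here:

```python
# Abridged stand-in for the full Redcode language reference that the
# LLM receives as context (illustrative, not the actual spec text).
REDCODE_SPEC = "Opcodes: MOV, ADD, SUB, SPL, DAT, JMP, ... with modifiers and addressing modes."

def build_mutation_prompt(parent_redcode: str) -> str:
    """Condition the LLM on the language spec and the parent program,
    i.e. sample from p(child | parent, spec)."""
    return (
        "You are writing Core War warriors in Redcode.\n"
        f"Language reference:\n{REDCODE_SPEC}\n\n"
        "Here is the parent warrior:\n"
        f"{parent_redcode}\n\n"
        "Produce a mutated variant likely to perform better. "
        "Output only valid Redcode."
    )
```

Swapping the parent block for a bare "write a novel warrior" instruction gives the new-generation variant of the same scheme.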
4. Experimental Protocol and Evaluation Metrics
The Core War simulation in DRQ is specified as follows:
- Core size: 8,000 memory cells arranged circularly.
- Maximum timesteps: 80,000 per match.
- Thread cap: 8,000 per warrior.
- Program constraint: 100 instructions.
- Placement: Warriors seeded 100 cells apart, evaluated over 20 random initializations.
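The circular-core geometry in this protocol can be sketched as follows; addresses wrap modulo the core size, and initial offsets are spaced accordingly. This is an illustrative sketch of the address arithmetic, not the simulator's actual code:

```python
CORE_SIZE = 8000  # circular memory, matching the DRQ protocol

def wrap(addr: int) -> int:
    """All addresses are taken modulo the core size (circular memory)."""
    return addr % CORE_SIZE

def place_warriors(n: int, spacing: int = 100) -> list[int]:
    """Illustrative starting offsets for n warriors seeded `spacing`
    cells apart on the circular core."""
    return [wrap(i * spacing) for i in range(n)]
```

Because the core is circular, relative addressing means a warrior's code behaves identically wherever it is placed, which is why randomized placement probes robustness rather than position-specific tricks.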
Baseline performance is assessed using a held-out set of 317 human-designed warriors. Generality of a warrior $w$ is the fraction of these human opponents defeated or tied by $w$ in zero-shot 1-on-1 matches.
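The generality metric reduces to a simple fraction. In this sketch, `match_result` is an assumed callback returning "win", "tie", or "loss" from the candidate warrior's perspective:

```python
def generality(w, human_benchmarks, match_result) -> float:
    """Fraction of human benchmark warriors that w defeats or ties
    in zero-shot 1-on-1 matches."""
    ok = sum(1 for h in human_benchmarks
             if match_result(w, h) in ("win", "tie"))
    return ok / len(human_benchmarks)
```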
MAP-Elites employs two-dimensional behavior descriptors $\mathrm{BD}(w) = (b_1, b_2)$, where:
- $b_1$: total number of threads spawned (via SPL) by $w$.
- $b_2$: number of unique memory addresses written or read.
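The two descriptors can be computed from an execution trace. The `(opcode, address)` event format below is an assumption made for illustration; the simulator's real trace interface may differ:

```python
def behavior_descriptor(events: list[tuple[str, int]]) -> tuple[int, int]:
    """(threads spawned via SPL, count of unique addresses touched)."""
    spawns = sum(1 for op, _ in events if op == "SPL")
    touched = {addr for _, addr in events}
    return (spawns, len(touched))
```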
Additional metrics include:
- Phenotype: Warrior’s fitness vector against all 317 human benchmarks.
- Genotype: Text embedding of Redcode via OpenAI text-embedding-3.
- Across-run diversity: Principal component and variance analyses on phenotype/genotype.
- Cycle counts: Number of triplets forming rock–paper–scissors cycles in dominance relations.
- Rate of change: the distance between consecutive champions’ phenotype vectors, $\Delta_t = \lVert \phi(w_t) - \phi(w_{t-1}) \rVert$.
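The rate-of-change metric over a lineage can be sketched as consecutive distances between phenotype vectors. Treating the norm as Euclidean is an assumption for illustration:

```python
import math

def change_rate(phenotypes: list[list[float]]) -> list[float]:
    """Distance between consecutive champions' phenotype vectors
    (fitness profiles over the benchmark set)."""
    return [
        math.dist(phenotypes[t], phenotypes[t - 1])
        for t in range(1, len(phenotypes))
    ]
```

A declining sequence of these distances is what the paper reads as phenotypic convergence across rounds.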
5. Empirical Observations and Analysis
A. Static vs. Red Queen Baselines
In one-round (static) settings:
- Zero-shot LLM: Defeats of 294 human warriors.
- Best-of-8: Defeats .
- Evolved specialist collectives: can defeat .
- Single evolved specialist: Defeats on average.
B. Dynamics of Continual DRQ
Across 96 runs of multi-round DRQ:
- Generality of warriors increases monotonically with round index, following an approximately log-linear trend.
- Across-run phenotypic variance and phenotype-change rates decrease as DRQ progresses, while genotypic variance remains roughly stable.
- This pattern suggests convergent evolution toward generalist strategies—distinct underlying codes, but increasingly similar phenotypic profiles.
C. Opponent History Length
- K = 1 (last opponent only): many dominance cycles observed.
- K = 3 or full history: cycles are reduced; the arms race stabilizes.
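Counting rock–paper–scissors triplets in a dominance relation can be sketched as follows, assuming `beats[i][j]` is `True` when warrior `i` beats warrior `j` (the matrix representation is an assumption):

```python
from itertools import combinations

def count_cycles(beats: list[list[bool]]) -> int:
    """Count unordered triplets {i, j, k} whose dominance relations
    form a cycle (i beats j beats k beats i, in either orientation)."""
    n = len(beats)
    cycles = 0
    for i, j, k in combinations(range(n), 3):
        if (beats[i][j] and beats[j][k] and beats[k][i]) or \
           (beats[j][i] and beats[k][j] and beats[i][k]):
            cycles += 1
    return cycles
```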
D. Quality-Diversity Role
Replacing MAP-Elites with a greedy single-cell approach degrades champion quality, especially in later rounds. This highlights the necessity of intra-round exploration and diversity preservation in adversarial evolution.
E. Code-Generality Predictiveness
Linear regression from text embeddings to final generality achieves nontrivial predictive accuracy, evidencing structure in the code underlying robustness and offering a foothold for future surrogate modeling and interpretability.
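The probe's fit-and-score structure can be sketched with ordinary least squares. The paper regresses from full text embeddings; the pure-Python, single-feature version below only illustrates the mechanics:

```python
def fit_ols_1d(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def r_squared(xs, ys, slope, intercept) -> float:
    """Coefficient of determination of the fitted line."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot
```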
6. Broader Implications and Limitations
The DRQ approach positions Core War as a controllable, Turing-complete arena for adversarial program evolution. The algorithm demonstrates that even minimal self-play—sequential LLM-based generation, MAP-Elites-based search, and historical evaluation—yields robust generalist solutions. Advantages include clear parallels to cybersecurity arms races, safe sandboxed experimentation, and insight into open-ended adaptation (Kumar et al., 6 Jan 2026).
Several limitations are acknowledged:
- The linear lineage (one champion per round) does not fully recapitulate the diversity and concurrency of complex ecosystems.
- Computational expense may constrain scalability; surrogate-based fitness approximation may be required.
- Behavioral descriptor and core parameter choices may bias evolutionary search; domain-agnostic alternatives remain an open question.
Ultimately, Core War’s integration with modern LLM-driven and quality-diversity techniques illustrates the transition from brittle specialization under static objectives to robust generalism via continual adaptation, and suggests that similar minimal Red Queen dynamics may have application in cybersecurity defense, red-teaming, and even models of biological resistance.