
Vox Deorum: Hybrid AI for Strategy Games

Updated 26 December 2025
  • Vox Deorum is a hybrid AI architecture that marries LLM strategic planning with algorithmic tactical modules for competitive 4X gameplay.
  • The system employs a two-layered pipeline, separating macro-strategic reasoning from precise tactical execution via structured prompts and RESTful APIs.
  • Evaluation in Civilization V demonstrates comparable win rates and diverse play styles, highlighting robust performance and practical cost-latency trade-offs.

Vox Deorum is a hybrid AI architecture for 4X and grand strategy games that combines the macro-strategic planning capabilities of LLMs with the tactical execution proficiency of established algorithmic AI modules. Developed and validated on Sid Meier’s Civilization V with the Vox Populi mod, Vox Deorum demonstrates that open-source LLMs can provide competitive, human-like strategic oversight within commercial game environments, while tactical and operational details are delegated to domain-proven subsystems. Its design, evaluation, and play style implications exemplify a concrete, scalable blueprint for integrating natural language reasoning with agentic game AI (Chen et al., 21 Dec 2025).

1. Layered System Architecture

Vox Deorum is structured as a two-layered pipeline:

  • Macro-Strategic Layer (LLM Strategist): This module receives a turn-wise, structured Markdown summary of the full game state as input. It outputs a decision tuple d_t, with fields specifying the high-level strategy, economic and military focus, technology and policy paths, and persona modifiers. Formally, the LLM implements a policy π_LLM : 𝒮 → 𝒟, where 𝒮 is the space of structured states and 𝒟 the space of decisions. Each prompt is crafted to elicit one choice per decision dimension, enforcing a disciplined interface:

    d_t = (g_t, e_t, m_t, τ_t, π_t, ρ_t)

    where g_t ∈ {Culture, Spaceship, Conquest, ...}; e_t ∈ EconomicStrategies; m_t ∈ MilitaryStrategies; τ_t ∈ TechOptions; π_t ∈ PolicyBranches; ρ_t ∈ PersonaModifiers.

  • Tactical Execution Layer (“X”): This layer comprises the Vox Populi algorithmic search routines (VPAI), which receive the LLM’s directives and execute granular city build orders, unit movements, combat targeting, and diplomacy. The interface uses a RESTful tool API (over Windows Named Pipes) with JSON calls such as set-grand-strategy(g), set-economic-strategy(e), etc.
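The tool interface described above can be sketched as a sequence of JSON calls. This is a minimal sketch: the paper names the tools (set-grand-strategy, set-economic-strategy, etc.) but not their payload schema, so the field names and values below are illustrative assumptions.

```python
import json

# Hypothetical decision components emitted by the LLM strategist.
decision = {
    "grand_strategy": "Conquest",
    "economic_strategy": "Expansion",
    "military_strategy": "Offensive",
}

# One JSON tool call per decision dimension; argument names ("g", "e", "m")
# mirror the paper's notation set-grand-strategy(g) but are assumptions.
calls = [
    {"tool": "set-grand-strategy", "args": {"g": decision["grand_strategy"]}},
    {"tool": "set-economic-strategy", "args": {"e": decision["economic_strategy"]}},
    {"tool": "set-military-strategy", "args": {"m": decision["military_strategy"]}},
]

payload = json.dumps(calls, indent=2)
print(payload)
```

Keeping each call to a single decision dimension mirrors the "one choice per dimension" discipline the prompt enforces on the LLM side.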

Turn-based Pseudocode:

while not game_over:
    state = VPAI.get_structured_state()    # structured Markdown summary of the game
    prompt = format_markdown(state)
    decision = LLM.call_model(prompt)      # returns the decision tuple d_t
    for component in decision:             # g_t, e_t, m_t, τ_t, π_t, ρ_t
        VPAI.set_strategy(component)
    VPAI.execute_tactical_phase()          # moves units, builds cities, etc.

2. Macro-Strategic Reasoning and Prompting

Vox Deorum eschews LLM fine-tuning in favor of hand-crafted prompts, leveraging zero- or few-shot in-context learning. The prompt template details the LLM’s expert role, available tools, descriptions of strategic options, and a minimal feedback buffer recording the previous turn’s rationale. Decision-making is based on full-state, structured summaries including sections such as VictoryProgress, StrategicOptions, PlayerSummaries, CitySummaries, MilitarySummaries, and Events. This approach aims to maximize a long-horizon utility:

U(π_LLM) = E[score_end | π_LLM, X]

where X denotes the fixed tactical policy. Optimization relies on prompt design that discourages “wishful thinking,” encourages timely pivots, and requires explicit opponent reasoning.
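Prompt assembly from the structured state might look like the following sketch. The section names (VictoryProgress, StrategicOptions, ...) follow the paper; the exact wording, the function signature, and the feedback-buffer handling are illustrative assumptions.

```python
# Minimal sketch of structured-Markdown prompt assembly, under the
# assumption that the game state arrives as a dict keyed by section name.
def format_markdown(state: dict) -> str:
    sections = ["VictoryProgress", "StrategicOptions", "PlayerSummaries",
                "CitySummaries", "MilitarySummaries", "Events"]
    parts = ["You are an expert Civilization V strategist.",
             "Choose exactly one option per decision dimension."]
    for name in sections:
        parts.append(f"## {name}\n{state.get(name, 'n/a')}")
    # Minimal feedback buffer: only the previous turn's rationale is carried.
    if state.get("last_rationale"):
        parts.append(f"## PreviousRationale\n{state['last_rationale']}")
    return "\n\n".join(parts)

prompt = format_markdown({
    "VictoryProgress": "Turn 120: leading in science output.",
    "last_rationale": "Pivoted toward a Spaceship victory.",
})
print(prompt.splitlines()[0])
```

The fixed section order keeps prompts structurally stable across turns, which matters when the same template must work zero-shot for every game phase.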

3. Tactical Execution Subsystems

The tactical (“X”) component in Vox Deorum utilizes VPAI’s algorithmic modules: depth-limited DFS or A* routines for micro-level management of cities, units, and diplomacy. Communication occurs via RESTful MCP over Named Pipes, using JSON commands specifying which strategic weights or “flavors” to set. The study did not include reinforcement learning (RL)-based tactical modules, though the architecture is explicitly compatible with RL micro-AI alternatives. This separation enables robust execution of high-level plans specified by the LLM while leveraging the reliability and efficiency of traditional tactical engines.
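One way the LLM's directive could translate into the strategic weights ("flavors") mentioned above is a simple preset lookup. The flavor names below resemble Vox Populi's flavor system but the specific values and mapping are assumptions for illustration only.

```python
# Illustrative mapping from an LLM grand-strategy directive to tactical
# "flavor" weights. Names echo Civ V's flavor convention; the numeric
# scale (0-10) and the presets themselves are assumed, not from the paper.
FLAVOR_PRESETS = {
    "Conquest":  {"FLAVOR_OFFENSE": 9, "FLAVOR_EXPANSION": 6, "FLAVOR_CULTURE": 2},
    "Culture":   {"FLAVOR_OFFENSE": 2, "FLAVOR_EXPANSION": 4, "FLAVOR_CULTURE": 9},
    "Spaceship": {"FLAVOR_OFFENSE": 3, "FLAVOR_EXPANSION": 5, "FLAVOR_SCIENCE": 9},
}

def flavors_for(grand_strategy: str) -> dict:
    """Return the flavor weights the tactical layer should apply."""
    return FLAVOR_PRESETS.get(grand_strategy, {})

print(flavors_for("Conquest"))
```

A lookup like this keeps the LLM's output space small and categorical while the tactical engine retains full control over how each directive is operationalized.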

4. Large-Scale Evaluation and Metrics

Vox Deorum was empirically validated in 2,327 full-length Civilization V games on the Communitas map (tiny size, four players, Prince difficulty). Three experimental conditions were considered:

| Condition (Strategist) | Number of Games |
|---|---|
| VPAI Baseline | 919 |
| GPT-OSS-120B | 983 |
| GLM-4.6 | 425 |

Key evaluation metrics included win rate, score ratio (Player 0’s final score divided by max score among all players), victory-type distribution, time spent on each grand strategy, strategic pivot rates, policy trajectories, and token-level cost/latency. Fixed-effects regression (with controls for civilization identity) and logistic and OLS models were used for analysis.
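The score-ratio metric defined above is straightforward to compute; the score values below are made up for illustration.

```python
# Score ratio as defined above: Player 0's final score divided by the
# maximum final score among all players (illustrative values only).
def score_ratio(scores: list[float]) -> float:
    return scores[0] / max(scores)

print(score_ratio([1800, 2400, 1500, 900]))  # 0.75
```

A ratio of 1.0 means Player 0 finished with the highest score; values below 1.0 measure how far behind the leader the evaluated agent ended.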

Performance Summary:

| LLM | ΔWin Rate | ΔScore Ratio |
|---|---|---|
| OSS-120B | −2.65% (p = 0.182) | +1.3% (p = 0.373) |
| GLM-4.6 | −1.61% (p = 0.534) | +1.8% (p = 0.333) |

All observed differences relative to the algorithmic baseline were statistically non-significant, indicating that LLM strategists matched algorithmic AI in end-to-end play (Chen et al., 21 Dec 2025).

5. Comparative Play Style Analysis

Despite comparable overall strength, LLM strategists exhibited distinctive play styles compared to conventional algorithmic AI and to each other:

  • Victory Preference Skew: OSS-120B displayed a pronounced Domination focus (winning 31.5% more often by Domination, p < 0.001) and succeeded less frequently via Cultural victory (−23.2%). GLM-4.6 showed a milder Domination bias (+7.1%) with a modest reduction in Cultural wins (−9.7%, p = 0.065).
  • Grand-Strategy Adoption: OSS-120B allocated ~80% of its turns to Domination, the remainder to Science/Culture/Diplomatic. GLM-4.6 distributed turns more evenly (~30% Culture, ~25% Domination). VPAI held a more balanced mix.
  • Strategic Pivots: The frequency of strategic changes per 100 survival turns was significantly lower for LLMs (OSS-120B: 34.0; GLM-4.6: 13.9) versus VPAI (51.6), reflecting greater “stubbornness.”
  • Policy Trajectories: Both LLMs followed non-linear policy branch adoption paths, with a consistent preference for Order ideology over Freedom by 23–24%.
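The pivot-rate metric used above (strategic changes per 100 survival turns) can be computed directly from a per-turn strategy log; the turn sequence below is fabricated for illustration.

```python
# Strategic-pivot rate: changes of grand strategy per 100 turns survived.
# The input is a per-turn log of the active grand strategy (illustrative).
def pivot_rate(strategies: list[str]) -> float:
    pivots = sum(1 for a, b in zip(strategies, strategies[1:]) if a != b)
    return 100 * pivots / len(strategies)

seq = ["Culture"] * 40 + ["Conquest"] * 60  # one pivot over 100 turns
print(pivot_rate(seq))  # 1.0
```

By this measure the reported rates (OSS-120B: 34.0, GLM-4.6: 13.9, VPAI: 51.6) mean VPAI switched strategy roughly every two turns, while GLM-4.6 pivoted only about once every seven.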

These divergent behaviors suggest that language-based strategic reasoning yields agentic and non-deterministic play—potentially enhancing the human-like qualities of AI opponents.

6. Implementation Considerations and Limitations

Key empirical observations for real-world adoption include:

  • Latency and Cost: One LLM call per turn (OSS-120B) incurs ~20M input and 0.55M output tokens/game (~$0.86/game) and an average 15s latency—aligning with practical multiplayer timer constraints.
  • Context Window Growth: The input token count scales quadratically with game progression, driving the need for memory modules or summarization strategies to control prompt size in late game.
  • Reasoning Gaps: Text-only state encoding introduced challenges in spatial awareness, leading to misinterpretation of map chokepoints or “phony wars.”
  • Model Scale: The larger GLM-4.6 model did not yield a measurable performance advantage over OSS-120B, indicating that prompt engineering and modular partitioning are more consequential than parameter count.
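The ~$0.86/game figure above is consistent with a simple token-count calculation. The per-token prices below are assumptions chosen to be roughly consistent with hosted OSS-120B pricing; the paper reports only the totals.

```python
# Back-of-envelope check on the quoted per-game cost.
# Prices are ASSUMED (USD per million tokens), not taken from the paper.
INPUT_PRICE_PER_M = 0.04
OUTPUT_PRICE_PER_M = 0.11

def game_cost(input_tokens_m: float, output_tokens_m: float) -> float:
    """Cost of one game given token totals in millions."""
    return input_tokens_m * INPUT_PRICE_PER_M + output_tokens_m * OUTPUT_PRICE_PER_M

cost = game_cost(20.0, 0.55)  # ~20M input, 0.55M output tokens per game
print(round(cost, 2))  # ≈ 0.86
```

The calculation also makes the context-window concern concrete: input tokens dominate the cost, so summarization or memory modules that shrink late-game prompts attack the largest cost term directly.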

A plausible implication is that future systems will need to address input encoding and memory bottlenecks, and to integrate specialized modules for spatial and adversarial reasoning.

7. Future Research Directions

Vox Deorum’s blueprint enables several avenues for advancing agentic AI in strategy games:

  • RL-augmented Tactical Execution: Replacing or augmenting algorithmic “X” with RL-trained micro-AI to achieve fully steerable hybrid frameworks.
  • Retrieval-Augmented or Summarized State Representations: Mitigating prompt length growth via selective summarization or external memory modules.
  • Multimodal Inputs: Improving spatial reasoning by supplying minimaps or screenshots alongside text-based summaries.
  • Negotiation and Diplomacy Modules: Allowing LLMs to craft custom treaty or trade texts for richer AI-human interaction.
  • Multi-agent LLM Pipelines: Combining lightweight models for abstraction with larger models for final decision-making to further optimize computation.

By compartmentalizing strategic and tactical decision-making within a simple pipelined architecture, Vox Deorum establishes a practical path for the integration of LLMs in commercial 4X titles, enabling both competitive performance and diverse, plausible play styles (Chen et al., 21 Dec 2025).
