
Turn-Based Pokémon Battle System

Updated 24 December 2025
  • The turn-based Pokémon battle system is a discrete-time, adversarial framework integrating simultaneous decision-making, partial observability, and stochastic outcomes.
  • It formalizes battle state with multi-layer tuples that include team composition, active Pokémon, and dynamic field conditions to guide move resolution.
  • Applied in AI and game theory research, this system benchmarks strategic planning, reinforcement learning, and heuristic evaluations in competitive environments.

A turn-based Pokémon battle system is a discrete-time, adversarial, multi-agent framework characterized by simultaneous player actions, partial observability, stochasticity, and intricate resource management. It underlies both human and artificial agent competition in the Pokémon game franchise and serves as a testbed for methods from game theory, algorithms, and statistical machine learning. Contemporary research formalizes such systems as extensive-form games with perfect recall and multi-layered information sets, integrating canonical mechanics (damage formulas, move resolutions) and higher-level reasoning over uncertainty, strategy, and team composition (Sarantinos, 2022, Karten et al., 6 Mar 2025, Hu et al., 2 Feb 2024, Jain et al., 19 Dec 2025).

1. Battle State and Formal System Representation

The canonical state at turn $t$ is a tuple:

$$S_t = \langle \text{Team}_A, \text{Team}_B, \text{Active}_A, \text{Active}_B, \text{Conditions} \rangle$$

where each team is an ordered collection of up to six Pokémon, each with fixed and dynamic attributes:

  • Fixed (species-specific): species, level, types (one or two of the 18 types), base stats (HP, Atk, Def, SpA, SpD, Spe), ability, eligible form changes (e.g., Tera).
  • Dynamic: current HP, stat-stage modifiers ($-6$ to $+6$), status (burn, sleep, etc.), volatile state (confusion, flinch), move PP, and field-dependent effects (weather, terrain, screens, entry hazards).

Game state also encodes:

  • Active Pokémon indices for each player.
  • Field and side conditions: weather, terrain, entry hazards, speed modifiers, battle timers.

Observations for the decision process are partial: each agent fully observes their own team and public field state, but only the active Pokémon and revealed properties of the opponent (Karten et al., 6 Mar 2025, Sarantinos, 2022, Hu et al., 2 Feb 2024).
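
As a minimal illustration, the state tuple above can be encoded as plain Python dataclasses. This is a sketch, not any cited paper's implementation; all names are hypothetical and the attributes are deliberately simplified relative to full cartridge mechanics.

```python
from dataclasses import dataclass, field

@dataclass
class PokemonState:
    """Simplified per-Pokemon state: fixed plus dynamic attributes."""
    species: str
    level: int
    types: tuple[str, ...]                 # one or two of the 18 types
    base_stats: dict[str, int]             # HP, Atk, Def, SpA, SpD, Spe
    ability: str
    current_hp: int
    stat_stages: dict[str, int] = field(default_factory=dict)  # -6..+6
    status: str | None = None              # burn, sleep, paralysis, ...
    volatile: set[str] = field(default_factory=set)            # confusion, flinch
    move_pp: dict[str, int] = field(default_factory=dict)

@dataclass
class BattleState:
    """S_t = <Team_A, Team_B, Active_A, Active_B, Conditions>."""
    team_a: list[PokemonState]             # up to six Pokemon
    team_b: list[PokemonState]
    active_a: int                          # index into team_a
    active_b: int                          # index into team_b
    conditions: dict[str, object] = field(default_factory=dict)
    # e.g., weather, terrain, entry hazards, screens, speed modifiers
```

An agent's observation would expose its own `team_*` fully but only the active index and revealed attributes of the opponent's side, matching the partial observability described above.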

2. Turn Cycle, Action Space, and Resolution Protocol

At the start of each turn, both agents simultaneously select an action from their available action set:

  • Attack: Use one of four moves, subject to remaining PP.
  • Switch: Change the active Pokémon to another non-fainted team member.
  • (In PvE or early-game scenarios: use an item or attempt to flee.)

Action-space cardinality is typically $\leq 4 + 5 = 9$ (Gen 9 OU adds Terastallize for $+1$) (Karten et al., 6 Mar 2025). Once both actions are received, execution unfolds:

  1. Switches resolve before attacks.
  2. Move priority determines order among non-switching actions; higher value executes first.
  3. Speed tiebreaker: if priorities are equal, compare effective speeds; resolve ties randomly.
  4. Resolution: Execute the first action (damage, status, stat changes); if it causes a faint, enforce a forced switch; execute the second action if still valid; apply end-of-turn effects.

Each action's execution depends on outcomes drawn from stochastic elements such as move accuracy and secondary effects (Sarantinos, 2022, Yashwanth et al., 3 Aug 2025, Hu et al., 2 Feb 2024).
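
The resolution protocol can be sketched as follows; this is a hedged simplification (forced switches after faints are only noted, not modeled), and `kind`, `priority`, `effective_speed`, and the other names are hypothetical.

```python
import random

def resolve_turn(state, action_a, action_b):
    """Sketch of one turn: switches first, then moves by priority/speed."""
    actions = [("A", action_a), ("B", action_b)]

    # 1. Switches resolve before attacks.
    for side, act in actions:
        if act.kind == "switch":
            state.apply_switch(side, act.target_index)

    # 2-3. Order remaining moves by priority, then effective speed,
    #      breaking exact ties uniformly at random.
    moves = [(s, a) for s, a in actions if a.kind == "move"]
    moves.sort(key=lambda sa: (sa[1].priority,
                               state.effective_speed(sa[0]),
                               random.random()),
               reverse=True)

    # 4. Execute in order; skip an action if its user has fainted
    #    (the forced-switch step is omitted from this sketch), then
    #    apply end-of-turn residual effects (burn, weather, ...).
    for side, act in moves:
        if state.active_is_fainted(side):
            continue
        state.execute_move(side, act)
    state.apply_end_of_turn_effects()
```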

3. Damage Calculation and Status Resolution

The central mechanic for direct conflict is the canonical Pokémon damage function:

$$\text{Damage} = \Biggl\lfloor \biggl\lfloor \Bigl\lfloor \frac{2L}{5} + 2 \Bigr\rfloor \cdot \frac{A \cdot P}{D} \biggr\rfloor \Big/ 50 + 2 \Biggr\rfloor \cdot \text{STAB} \cdot \text{TypeEff} \cdot \text{Crit} \cdot R$$

with:

  • $L$: attacker level
  • $A$, $D$: effective attacking and defending stats, including stages, abilities, items, and field
  • $P$: move base power
  • Multipliers:
    • STAB: $1.5$ if the attacker's type matches the move, $1.0$ otherwise
    • TypeEff: product of type matchups, zero if immune
    • Crit: $1.5$ if critical, $1.0$ otherwise
    • $R$: uniform on $[0.85, 1.00]$ for damage variance
    • Additional: weather, abilities, burn reduction, item effects
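
A direct transcription of the formula, as a minimal sketch: the additional modifiers beyond STAB, type effectiveness, crit, and $R$ are folded into a single `extra` multiplier, and the function names are illustrative.

```python
import math
import random

def damage(level, attack, defense, power,
           stab=1.0, type_eff=1.0, crit=1.0, rand=None, extra=1.0):
    """Canonical damage formula with nested integer floors."""
    if rand is None:
        rand = random.uniform(0.85, 1.00)          # damage roll R
    base = math.floor(2 * level / 5 + 2)           # innermost floor
    base = math.floor(base * attack * power / defense)  # A*P/D term
    base = math.floor(base / 50 + 2)               # outer floor
    return math.floor(base * stab * type_eff * crit * rand * extra)

# Example: level 50, 120 Atk vs. 100 Def, 90 BP STAB super-effective move
print(damage(50, 120, 100, 90, stab=1.5, type_eff=2.0, rand=0.925))
```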

Status moves probabilistically inflict effects (paralysis, sleep, etc.) according to each move's accuracy and effect chance. At each turn's conclusion, residual effects (Toxic poison, burn, weather) and side conditions update HP and decrement remaining effect durations (Jain et al., 19 Dec 2025, Sarantinos, 2022).

4. Strategic Uncertainty and Information Structure

Battle dynamics are governed by three orthogonal uncertainties (Sarantinos, 2022):

  • Simultaneity: Each player selects an action without knowledge of the other's, introducing combinatorial complexity akin to simultaneous-move matrix games. Mixed strategies over the joint payoff matrix, computed e.g. via regret minimization, hedge against opponent prediction.
  • Hidden information: The opponent's unrevealed team members, items, and exact stat allocations are only partially observable. Agents infer the missing information from usage meta-statistics or combinatorial proxy optimization (e.g., mixed-integer programming over plausible opposing rosters).
  • Stochasticity: Move accuracy, secondary effects, and critical hits follow stochastic rules, and exhaustive Monte Carlo expansion of the game tree is frequently intractable. Approximation strategies (bucketed sampling per turn, expectation propagation) dramatically reduce the branching factor without significant accuracy loss; see the sampling sketch after this list.
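
A hedged illustration of per-turn bucketed sampling, reusing the `damage` sketch from Section 3: instead of branching on every possible damage roll, outcomes are collapsed into a few representative buckets per crit branch.

```python
def bucketed_damage_outcomes(level, attack, defense, power,
                             stab, type_eff, n_buckets=3):
    """Collapse the damage roll R ~ U[0.85, 1.00] and the critical-hit
    branch into a few (probability, damage) pairs, keeping the per-turn
    branching factor small. Uses damage() from the Section 3 sketch."""
    outcomes = []
    for crit, p_crit in ((1.0, 23 / 24), (1.5, 1 / 24)):  # base crit rate 1/24
        for i in range(n_buckets):  # bucket midpoints of the R range
            r = 0.85 + (i + 0.5) * 0.15 / n_buckets
            dmg = damage(level, attack, defense, power,
                         stab=stab, type_eff=type_eff, crit=crit, rand=r)
            outcomes.append((p_crit / n_buckets, dmg))
    return outcomes  # 2 * n_buckets branches instead of a continuum

branches = bucketed_damage_outcomes(50, 120, 100, 90, stab=1.5, type_eff=2.0)
print(sum(p * d for p, d in branches))  # expected damage under the approximation
```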

LLM-based agents further utilize prior distributions over likely opponent team attributes, leveraging large-scale datasets (3 million+ matches) to inform practical opponent modeling (Karten et al., 6 Mar 2025).

5. Team Management, Switching, and Macro-Strategy

Optimal play depends not only on turn-level tactics but on maintaining a balanced, synergistic team across the battle horizon. Key principles include:

  • Coverage and Resistances: Construct teams that collectively threaten common metagame archetypes while minimizing shared weaknesses.
  • Bulk vs. Offensive Pressure: Mix sweepers and walls to balance immediate offense against attrition, responding adaptively to the opponent's structure.
  • Entry Hazard Control: Set chip-damage hazards (Stealth Rock, Spikes) on the opponent's side and clear one's own with anti-hazard moves (e.g., Rapid Spin, Defog).
  • Sacrifice and Matchup Trading: Deliberately let a Pokémon faint, or switch it out, to secure crucial matchups in future turns.
  • Look-ahead Evaluation: Evaluate states not just by current HP totals but by projected matchup charts: score each remaining pairwise interaction and aggregate into a holistic expected value of team strength, as sketched below (Sarantinos, 2022, Jain et al., 19 Dec 2025).
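
A minimal sketch of such a matchup-chart evaluation, assuming a user-supplied `matchup_score(mine, theirs)` returning a value in $[-1, 1]$; all names here are hypothetical.

```python
def team_evaluation(my_team, opp_team, matchup_score):
    """Aggregate pairwise matchup scores over remaining (non-fainted)
    Pokemon into a single scalar estimate of team strength."""
    mine = [p for p in my_team if p.current_hp > 0]
    theirs = [p for p in opp_team if p.current_hp > 0]
    if not mine:
        return float("-inf")   # battle lost
    if not theirs:
        return float("inf")    # battle won
    # For each surviving Pokemon, average its score against the opposing
    # team; then average over one's own team. Other aggregations (max,
    # min) encode more optimistic or pessimistic evaluations.
    per_mon = [sum(matchup_score(m, t) for t in theirs) / len(theirs)
               for m in mine]
    return sum(per_mon) / len(per_mon)
```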

LLM battle agents encode heuristic scoring functions for legal actions, balancing expected damage, type advantage, HP preservation, and risk penalties to rank candidate moves and switches. Empirically, specific LLM architectures manifest distinct playstyles (e.g., aggressive vs. cautious) and can even exhibit multi-turn planning (Liu et al., 30 Jun 2025, Yashwanth et al., 3 Aug 2025, Jain et al., 19 Dec 2025).
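
One plausible form of such a heuristic scoring function, as a hedged sketch: the feature names and weights below are illustrative and not taken from any cited paper.

```python
def score_action(state, action, w_dmg=1.0, w_type=0.5,
                 w_hp=0.3, w_risk=0.4):
    """Rank a legal action by expected damage, type advantage,
    HP preservation, and a risk penalty (illustrative weights)."""
    exp_dmg = state.expected_damage_fraction(action)    # in [0, 1]
    type_adv = state.type_advantage(action)             # e.g., log2(TypeEff)
    hp_kept = state.expected_own_hp_fraction(action)    # in [0, 1]
    risk = state.worst_case_loss(action)                # in [0, 1]
    return (w_dmg * exp_dmg + w_type * type_adv
            + w_hp * hp_kept - w_risk * risk)

def best_action(state, legal_actions):
    return max(legal_actions, key=lambda a: score_action(state, a))
```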

6. AI Benchmarks, Evaluation, and Empirical Results

Agent performance is measured using rigorous benchmarks:

  • Win Rate (vs. fixed or human opponents in randomized or constructed teams)
  • Turn-count statistics: average turns to win/loss, indicative of aggressiveness or stalling
  • Type-alignment accuracy: frequency of super-effective move selection when available
  • Action prediction accuracy: top-$k$ match rate with human expert play
  • Validity and balance statistics (for move generation in content-creation modules)
  • Elo rating projections: direct ladder performance against human players and bots

Recent LLM-augmented agents achieve 76% to 84% win rates against prior SOTA bots and project ≈1300–1500 Elo ratings online, corresponding to the upper 10–30% of human competitors (Karten et al., 6 Mar 2025, Liu et al., 30 Jun 2025). Human-parity LLM agents leverage in-context reinforcement learning and knowledge-augmented representations to refine policy and consistency (Hu et al., 2 Feb 2024). Empirical findings reveal strong correlation between general LLM capabilities and in-game strategic performance (Liu et al., 30 Jun 2025).

| Agent | Win Rate vs. SOTA | Projected Elo | Notable Mechanism |
|---|---|---|---|
| PokéChamp (GPT-4o) | 76% (LLM bot) | 1300–1500 | LLM-powered minimax search |
| PokéAI (DeepSeek) | 80.8% (PvE) | Not reported | Heuristic LLM scoring |
| PokéLLMon (GPT-4o) | ≈49% (ladder) | ≈1020 (Gen 9 OU) | In-context RL + prompt Pokedex |
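
For context on the Elo projections above, the standard logistic expected-score relation (a general property of Elo ratings, not specific to the cited papers) maps rating differences to expected win rates:

```python
def elo_expected_score(r_player, r_opponent):
    """Standard Elo expected score: win probability (plus half of any
    draw probability) for the player against the opponent."""
    return 1.0 / (1.0 + 10 ** ((r_opponent - r_player) / 400))

# A 1400-rated agent vs. a 1200-rated ladder opponent
print(round(elo_expected_score(1400, 1200), 3))  # ~0.76
```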

7. Extended Applications: Content Generation and Multi-Agent Evaluation

Turn-based Pokémon battle systems provide a foundation for advanced game content generation and comparative agent analysis. LLM frameworks have demonstrated high validity in generating novel but mechanically balanced moves, enforcing canonical trade-offs (power–accuracy–PP) and type consistency. Adaptive tournament systems, such as the LLM Pokémon League, support meta-strategy tracking and robust benchmarking for strategic reasoning (Jain et al., 19 Dec 2025, Yashwanth et al., 3 Aug 2025).

Battle decision logs—storing JSON action schemas and natural-language rationales—facilitate post hoc analysis of adaptability, tactical depth, and agent-specific behavioral patterns, fostering both competitive and reinforcement learning research.
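
A hypothetical example of one such decision-log entry; the schema below is illustrative and does not reproduce the format of any cited framework.

```python
import json

log_entry = {
    "turn": 12,
    "state_summary": {"active": "Garchomp", "opp_active": "Corviknight",
                      "hp_fraction": 0.64, "opp_hp_fraction": 0.81},
    "legal_actions": ["Earthquake", "Fire Blast", "switch:Rotom-Wash"],
    "chosen_action": "Fire Blast",
    "rationale": ("Earthquake is blocked by Corviknight's Flying type; "
                  "Fire Blast hits its Steel typing super-effectively."),
}
print(json.dumps(log_entry, indent=2))
```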


In summary, the turn-based Pokémon battle system is a formally well-specified, partially observable, stochastic multi-agent environment. Its precise mechanics and competitive depth have catalyzed progress in AI, multi-agent reasoning, and LLM alignment research, serving as both a strategy benchmark and a laboratory for emergent behavioral analysis (Sarantinos, 2022, Jain et al., 19 Dec 2025, Karten et al., 6 Mar 2025, Hu et al., 2 Feb 2024, Liu et al., 30 Jun 2025, Yashwanth et al., 3 Aug 2025).
