PERIL Platform Overview

Updated 30 March 2026
  • PERIL is an open-source platform that operationalizes persona-mediated decision-making in a Risk-inspired, turn-based game environment using strategic heuristics.
  • The platform features detailed game mechanics, modular architecture, and interpretable heuristic scoring to correlate persona traits with performance.
  • It employs both direct heuristic generation and personality inventory mediation to convert natural language persona traits into actionable LLM strategies.

PERIL is an open-source research platform for investigating persona-mediated decision-making by LLMs in a strategic adversarial setting. Designed by Licato et al., PERIL operationalizes persona prompting in the context of a turn-based, world-conquest board game inspired by Risk®, featuring interpretable behavioral heuristics and a reproducible evaluation framework for correlating persona traits with strategic performance (Licato et al., 7 Dec 2025).

1. Game Structure and Mechanics

PERIL simulates a stochastic, turn-based environment in which up to six players compete for total domination of a connected map of regions (nodes) grouped into zones (continents). Each player controls armies (units) distributed over regions. Play is organized into the following phases and mechanics:

  • Initialization / Deployment Phase: Players alternate placing one army on empty regions until the map is filled, then allocate remaining reinforcements.
  • Reinforcement Phase: Each player receives new units as a function of the number of owned regions and zones (zone bonus).
  • Attack Phase: Players may launch attacks from controlled regions against adjacent enemy regions. Legal moves are enumerated, each is scored by a heuristic-weighted sum, and a single move is selected by weighted randomization. Attacking continues until the player voluntarily passes, which is governed by a phase-specific PASS heuristic.
  • Redeployment Phase: Players can move units between owned, connected regions. As with attacks, moves are scored and selected probabilistically via heuristics, with a PASS heuristic defining voluntary phase end.
  • Combat: Each attack is resolved by dice rolls, with probabilistic outcomes for conquest and losses.

Games terminate upon world domination or after 250 turns, the latter resulting in a draw.
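The text above says only that attacks are resolved by dice rolls. As a minimal sketch of what one round of such combat could look like, the following assumes the classic Risk-style rule (up to three attacker dice, up to two defender dice, highest dice compared pairwise, ties going to the defender). PERIL's actual dice rule may differ, and the function name `resolve_attack_round` is hypothetical.

```python
import random


def resolve_attack_round(attacker_units, defender_units, rng):
    """Resolve one dice round under ASSUMED classic Risk-style rules.

    Returns a tuple (attacker_losses, defender_losses).
    """
    # Assumption: one unit must stay behind, so the attacker rolls at most
    # attacker_units - 1 dice (capped at 3); the defender rolls at most 2.
    n_att = min(3, attacker_units - 1)
    n_def = min(2, defender_units)
    if n_att < 1 or n_def < 1:
        raise ValueError("need at least 2 attacking units and 1 defending unit")

    att = sorted((rng.randint(1, 6) for _ in range(n_att)), reverse=True)
    dfn = sorted((rng.randint(1, 6) for _ in range(n_def)), reverse=True)

    att_loss = def_loss = 0
    # Compare the highest dice pairwise; ties favor the defender.
    for a, d in zip(att, dfn):
        if a > d:
            def_loss += 1
        else:
            att_loss += 1
    return att_loss, def_loss
```

Passing a seeded `random.Random` instance as `rng` keeps simulated games reproducible, which matters for an evaluation framework that compares runs.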

2. Software Architecture and Agent Interface

PERIL is implemented in Python and uses Dash Cytoscape to visualize game states. Human players interact through the GUI, while AI players interface with the engine directly and bypass the GUI. The modular platform comprises:

  • Game Engine: Maintains the persistent game state, manages turn order, phases, combat, and win conditions.
  • Heuristic Engine: Stores the player-specific set of heuristics and their weights.
  • Move Enumerator: Lists all legal actions in a given phase and assigns a score based on active heuristics:

$$\text{score}(\text{move}) = \sum_{h \in \text{ActiveHeuristics}} \text{weight}(h) \times \mathbf{1}[h \text{ applies to move}]$$

  • Action Selector: Samples moves in proportion to their scores using weighted random selection.
  • Tournament Controller: Orchestrates tournament rounds, manages pairing, records outcomes, and computes player skill ratings through the TrueSkill algorithm.
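The scoring and selection steps described in this list can be sketched as follows. This is an illustrative sketch rather than PERIL's code: the function names, the representation of heuristics as predicate functions, and the uniform fallback when every move scores zero are all assumptions.

```python
import random


def score_move(move, weights, predicates):
    """Sum the weights of all heuristics whose predicate applies to the move."""
    return sum(w for name, w in weights.items() if predicates[name](move))


def select_move(moves, weights, predicates, rng):
    """Sample one move with probability proportional to its score."""
    scores = [score_move(m, weights, predicates) for m in moves]
    if sum(scores) == 0:
        # Assumption: fall back to a uniform choice when no heuristic applies,
        # since random.choices rejects an all-zero weight list.
        return rng.choice(moves)
    return rng.choices(moves, weights=scores, k=1)[0]
```

A move matched by several heuristics accumulates all of their weights, so it is proportionally more likely to be drawn than a move matched by only one.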

The agent API gives each AI a persona prompt at game start and expects it to return a heuristic weight vector $w : \text{Heuristics} \rightarrow [0,100]$ before any moves are made; all in-game move enumeration and selection are then delegated to the platform. Source code and extension instructions are available in the PERIL GitHub repository.
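Because an agent's only output is a weight vector over the heuristic set, a platform of this kind would plausibly validate that vector before play begins. The sketch below shows one way to do so; `validate_weights` and its parameters are hypothetical and are not taken from the PERIL codebase.

```python
def validate_weights(weights, heuristic_names, low=0.0, high=100.0):
    """Check that weights cover exactly the known heuristics, each in [low, high].

    Returns a new dict with float values, ordered like heuristic_names.
    """
    missing = set(heuristic_names) - set(weights)
    extra = set(weights) - set(heuristic_names)
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)}, unknown={sorted(extra)}")
    for name, value in weights.items():
        if not (low <= value <= high):
            raise ValueError(f"weight for {name} is outside [{low}, {high}]: {value}")
    return {name: float(weights[name]) for name in heuristic_names}
```

Rejecting malformed vectors up front keeps any later performance differences attributable to the persona rather than to parsing errors in the LLM output.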

3. Heuristic Library and Game Actions

The heuristic inventory comprises 30+ binary heuristics—each acting as a filter over legal moves during a phase. Heuristics are grouped by phase (see Table 1):

| Phase/Type | Heuristic(s) | Brief Description |
|---|---|---|
| Initialization | PTM/PTL | Place unit adjacent to player with Most/Least regions |
| Initialization | PUM/PUL | Place unit adjacent to player with Most/Least total units |
| Initialization | PCM/PCL | Place unit adjacent to player with Most/Least zone bonus |
| Initialization | ETE/ETN | Place unit adjacent/not adjacent to enemy region |
| Attack | ONM/ONL | Attack if attacker’s units ≥ / ≤ defender’s units |
| Attack | ON2 | Attack if attacker’s units ≥ 2× defender’s units |
| Attack | ICD/ICS | Attack if (in/not in) different zones |
| Attack | L | Attack if joining two owned subgraphs |
| Attack | PASS | Probability to end attack phase |
| Redeploy | SI | Move if increases invasion success probability |
| Redeploy | OBTM/OBTL | Move adjacent to player with Most/Least regions |
| Redeploy | CA | Move to region adjacent to any enemy |
| Redeploy | PASS | Probability to end redeployment phase |

Each AI’s playstyle is governed by its assigned weights, with higher weights amplifying the probability a move filtered by the associated heuristic is selected.
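To make the idea of a binary heuristic concrete, the sketch below encodes three of the attack heuristics from Table 1 (ONM, ONL, ON2) as predicates over a move. The `AttackMove` structure and the `applicable` helper are illustrative assumptions; only the heuristic names and their conditions come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttackMove:
    source: str          # attacking region (hypothetical field names)
    target: str          # defending region
    attacker_units: int
    defender_units: int


# Each heuristic is a yes/no filter over a candidate move.
ATTACK_HEURISTICS = {
    "ONM": lambda m: m.attacker_units >= m.defender_units,
    "ONL": lambda m: m.attacker_units <= m.defender_units,
    "ON2": lambda m: m.attacker_units >= 2 * m.defender_units,
}


def applicable(move):
    """Return the names of all heuristics that apply to the move."""
    return [name for name, pred in ATTACK_HEURISTICS.items() if pred(move)]
```

For example, an attack with 6 units against 3 satisfies both ONM and ON2, so an agent that weights both heavily would strongly favor it.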

4. Persona Assignment and Mediation

PERIL agents are parameterized by personas sampled from PersonaHub’s synthetic dataset (1 billion entries). A subset of 175,000 personas was annotated using GPT-4 on five axes: strategicThinker, domainExpert, perilSpecific, riskTaker, doOrBe (each scored in $[-1, +1]$). A greedy max-variance search over this feature space yields a maximally diverse selection of 50 personas.
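The text does not spell out the objective used by the greedy max-variance search. One plausible reading, sketched below, starts from the persona farthest from the population mean and then repeatedly adds whichever persona most increases the summed per-axis variance of the selected set. The function names and this specific objective are assumptions, not the authors' procedure.

```python
from statistics import pvariance


def total_variance(vectors):
    """Sum of per-axis population variances over a list of trait vectors."""
    return sum(pvariance(axis) for axis in zip(*vectors))


def greedy_max_variance(features, k):
    """Greedily pick k persona ids from {id: trait_vector} to spread the traits."""
    ids = list(features)
    dims = len(features[ids[0]])
    mean = [sum(features[i][d] for i in ids) / len(ids) for d in range(dims)]
    # Seed with the persona farthest from the overall mean.
    first = max(ids, key=lambda i: sum((x - m) ** 2 for x, m in zip(features[i], mean)))
    chosen, remaining = [first], set(ids) - {first}
    while len(chosen) < k and remaining:
        current = [features[c] for c in chosen]
        # sorted() makes tie-breaking deterministic across runs.
        best = max(sorted(remaining), key=lambda i: total_variance(current + [features[i]]))
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

This naive version rescans every remaining candidate at each step, so applying it to 175,000 personas would call for vectorization or incremental variance updates.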

Conversion of persona prompts to heuristic weight vectors is realized through two methods:

  • Direct Heuristic Generation (DH): The LLM, given the persona description, full game rules, and heuristic inventory, assigns each heuristic $h$ a score $w_{DH}(h) \in [0,100]$ via direct prompting.
  • Personality Inventory Mediation (PI): Inspired by psychometric inventories and exploratory factor analysis, this process consists of:
    • For […]: […]
    • For […]: […]
    • using […].

The PI procedure enforces structured and face-valid personality-to-strategy mappings, and it outperforms DH in generating more distinct, trait-coherent agent behaviors. Opposite-value consistency metrics show that PI yields lower max/min ratios on opposite heuristic pairs for larger LLMs, indicating improved logical coherence in derived weights.
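The opposite-value consistency metric mentioned above is a max/min ratio over pairs of opposed heuristics. A minimal sketch follows; the choice of Most/Least pairs as the "opposite" pairs and the handling of zero weights are assumptions, since the text does not list the pairs or specify edge cases.

```python
import math

# Assumption: Most/Least variants from the heuristic table are treated as opposites.
OPPOSITE_PAIRS = [("PTM", "PTL"), ("PUM", "PUL"), ("PCM", "PCL"), ("ONM", "ONL")]


def max_min_ratio(weights, pair):
    """Ratio of the larger to the smaller weight within one opposite pair."""
    a, b = weights[pair[0]], weights[pair[1]]
    lo, hi = min(a, b), max(a, b)
    if lo == 0:
        # Assumption: a zero minimum gives an infinite ratio unless both are zero.
        return math.inf if hi > 0 else 1.0
    return hi / lo


def pair_ratios(weights, pairs=OPPOSITE_PAIRS):
    """Compute the ratio for every pair whose two heuristics are both present."""
    return {p: max_min_ratio(weights, p) for p in pairs if p[0] in weights and p[1] in weights}
```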

5. Experimental Protocol and Evaluation

The evaluation consists of four tournament runs: two with PI-mediated players and two with DH-mediated players (50 personas per method). Each run has 49 rounds, with 25 randomly paired 2-player games per round (1,225 games per run, 4,900 games in total), on a map of 42 regions across 6 zones.
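The game counts follow directly from the round structure: 50 players form 25 pairs per round, 49 rounds give 49 × 25 = 1,225 games per run, and four runs give 4,900 games. A minimal sketch of random pairing under these numbers is shown below; the function names are hypothetical and the actual pairing logic in PERIL's Tournament Controller may differ.

```python
import random


def random_pairing(players, rng):
    """Shuffle the players and pair them off; assumes an even number of players."""
    if len(players) % 2 != 0:
        raise ValueError("random_pairing needs an even number of players")
    shuffled = list(players)
    rng.shuffle(shuffled)
    return [(shuffled[i], shuffled[i + 1]) for i in range(0, len(shuffled), 2)]


def schedule(players, rounds, rng):
    """Build one run: a list of rounds, each a list of 2-player games."""
    return [random_pairing(players, rng) for _ in range(rounds)]
```

With independent random pairing, two personas may meet more than once in a run while others never meet, unlike a fixed round-robin schedule.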

  • Skill Metric: TrueSkill, a Bayesian rating system that generalizes Elo to multiplayer games, is used to assign skill scores ($\mu$, initially set to 25, […]). Games drawn at the 250-turn limit are registered as losses for the skill update (<0.05% incidence).
  • Analyses:
    • Feature correlations: Spearman $\rho$ between TrueSkill and each annotated persona trait. The PI method produces strong, statistically significant correlations (up to […]), whereas DH yields weak or negligible associations.
    • Cross-run reliability: Spearman correlation of final player rankings between independent tournament runs. DH shows stronger intra-method reliability; PI shows weaker but still positive correlation.
    • Heuristic opposite-value consistency: Ratio of max/min heuristic scores on theoretically opposite pairs is smaller (more consistent) for PI, particularly with larger LLMs.
    • Qualitative validation: Top-performing personas exhibit traits matching human strategic intuition.
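Both the feature-correlation and cross-run reliability analyses rest on Spearman rank correlation, which is the Pearson correlation of the two rank vectors. The dependency-free sketch below uses average ranks for ties. It is illustrative only; the authors' analysis tooling is not described in the text.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation between two equal-length sequences."""
    return pearson(average_ranks(x), average_ranks(y))
```

For feature correlations, `x` would hold one trait score per persona and `y` the corresponding TrueSkill ratings; for cross-run reliability, `x` and `y` would hold the same personas' ratings from two independent runs.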

6. Mediator’s Role and Psychometric Underpinnings

The PI mediation can be viewed as an application of latent-trait measurement from psychometrics to LLM-generated agents. Analogously to exploratory factor analysis, PI:

  • Treats inventory items as variables,
  • Treats LLM ratings as observed factor scores,
  • Aggregates responses via theoretical item-to-heuristic mappings (akin to factor loadings),
  • Normalizes aggregated scores for comparability.
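The aggregation and normalization steps listed above can be illustrated with a small sketch. Everything specific here is an assumption: the item names, the loading values, the weighted-sum aggregation, and the min-max normalization to [0, 100] are illustrative choices, as the text does not give PERIL's actual inventory or formulas.

```python
def aggregate(ratings, loadings):
    """Combine item ratings into one raw score per heuristic.

    ratings:  {item_id: rating given by the LLM for the persona}
    loadings: {heuristic: {item_id: loading}}  (hypothetical item-to-heuristic map)
    """
    return {
        heuristic: sum(load * ratings[item] for item, load in item_loads.items())
        for heuristic, item_loads in loadings.items()
    }


def normalize(raw, low=0.0, high=100.0):
    """Min-max rescale raw scores into [low, high] (one possible normalization)."""
    mn, mx = min(raw.values()), max(raw.values())
    if mx == mn:
        # All heuristics scored the same: place them at the midpoint.
        return {h: (low + high) / 2 for h in raw}
    return {h: low + (v - mn) * (high - low) / (mx - mn) for h, v in raw.items()}
```

For instance, with ratings `{"q1": 5, "q2": 1}` and loadings that map `q1` to ON2, `q2` to ONL, and both equally to PASS, the normalized weights come out as 100, 0, and 50 respectively.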

Although a formal EFA is not conducted, this mapping ensures that each persona generates a balanced, interpretable heuristic profile. This contrasts with the direct, unconstrained mapping (DH), which typically offers weaker and less trait-aligned behavioral differentiation.

7. Implications, Limitations, and Extensibility

The PERIL framework demonstrates that persona prompting can modify LLM-based agent performance in adversarial, stochastic environments, but only with a structured mediation method (PI) that aligns behavioral heuristics with underlying persona traits. Larger LLMs (GPT-4, LLaMA 4) respond more effectively to PI, indicating a scaling effect with model size. The extensible API and open-source release position PERIL as a standard testbed for research into persona-behavior alignment and for studying the translation of natural language personality constructs into operational strategies via LLMs (Licato et al., 7 Dec 2025).

A plausible implication is that careful design of mediation schemes, informed by psychometrics, may be critical for eliciting robust, interpretable behavioral variation in LLM agents. PERIL’s explicit abstraction of LLM outputs into heuristics—rather than direct action selection—enables controlled scientific investigation of the relationship between symbolic persona constructs and observed agent strategy. The domain-specific implementation provides a template for analogous platforms studying other facets of LLM decision-making and alignment.
