MDA Reasoning Framework
- MDA Reasoning is a framework that maps static game rules (Mechanics) to emergent interactions (Dynamics) and subjective experiences (Aesthetics).
- It operationalizes a three-stage latent chain-of-thought in models like MeepleLM by processing detailed rulebooks and persona profiles to simulate authentic player critiques.
- The approach leverages extensive datasets with rigorous annotation to achieve improved evaluation metrics and enhanced fidelity in game design feedback.
Mechanics–Dynamics–Aesthetics (MDA) Reasoning formalizes the causal relationship between game rule structures, emergent system behaviors, and subjective player experience. MDA reasoning enables structured critique and simulation of interactive systems—principally board games—by operationalizing a three-stage latent reasoning chain within LLMs. This approach is exemplified by MeepleLM, which processes rulebooks and persona profiles into experience-grounded assessments using multi-stage inference and explicitly annotated data, advancing the automation and fidelity of game critique and virtual playtesting (Li et al., 12 Jan 2026).
1. Formalization of Mechanics–Dynamics–Aesthetics (MDA) Reasoning
MDA comprises:
- Mechanics (M): The static rule components—such as cards, tokens, permitted actions, and state transitions—precisely specified in the structured rulebook.
- Dynamics (D): The emergent and often unpredictable behaviors that arise from the runtime interaction of mechanics (e.g., standoffs, pacing shifts, risk escalations).
- Aesthetics (A): The resultant player experience, covering emotional, psychological, and experiential facets (e.g., tension, camaraderie, frustration, flow).
MeepleLM treats MDA as a three-stage latent reasoning chain
bridging from the rulebook and persona profile to a critique output . This decomposition instantiates the causal mapping
where the latent steps reflect sequential inference: determining mechanics, modeling emergent dynamics, and mapping these to aesthetic, persona-specific experiences. During training, the latent sum collapses to an explicit Chain-of-Thought (CoT) sequence.
2. Dataset Construction and MDA Annotation
The development of MDA reasoning relies on paired structured rulebooks and high-fidelity critiques:
- Rulebooks: 1,727 board game rulebooks, normalized and structurally annotated.
- Reviews: ≈150,000 high-quality human-authored reviews, selected via MDA-alignment scoring and stratified sampling.
Review Filtering and MDA Scoring. An initial pool of 1.8 million reviews is filtered using Qwen3-235B to remove irrelevancies and noise. Each review is scored according to:
- Mechanism anchoring (linkage to mechanics),
- Explicit causal attribution (clear Mechanics→Dynamics→Aesthetics reasoning),
- Constructiveness (utility for design feedback).
High-quality reviews are further annotated along key facets (e.g., "Luck vs. Strategy," "Pacing & Flow"), achieving a Pearson with original ratings.
Persona Discovery and Labeling. Reviews are embedded as composite strings: Clustered via K-means () and distilled into five archetypes: The System Purist, The Efficiency Essentialist, The Narrative Architect, The Social Lubricator, and The Thrill Seeker. Labeling is performed by a three-way GPT-5.1 voting process.
Annotation Guidelines. Each retained review receives structured metadata in the JSON schema: 0
3. Model Architecture and Learning Objective
MeepleLM initializes from the Qwen3-8B backbone and applies Persona-Conditional Instruction Tuning using LoRA on all linear layers. The system is trained to intenalize both explicit and latent reasoning over rulebooks and personas.
- Chain-of-Thought Distillation: For each rulebook–review pair, a teacher (Qwen3-235B) reconstructs the latent MDA chain . Verification and filtering use GPT-5.1 to eliminate hallucinated or sentiment-mismatched instances.
- Persona Conditioning: The entire persona semantic profile—core motivations, preferred/disliked mechanics, interaction style—is used as the system instruction, controlling downstream phrasing and evaluation.
- Objective: The model minimizes cross-entropy over the concatenated CoT chain and final critique: where 0 includes both the explicit MDA reasoning and persona-specific output. "Slow Thinking" forces gradient flow through 1 tokens.
- LoRA Configuration:
- Rank 2, 3, dropout 4
- Learning rate 5, three epochs, batch size two (with accumulation to 128)
4. Inference Procedure and Critique Generation
At inference, MeepleLM is presented with a system prompt embedding the target persona profile and a user prompt carrying the plain-structured rulebook. The model internally executes the MDA reasoning chain before emitting a structured JSON response: 1 Persona sampling is conducted to match empirical frequencies from the test corpus. For example, when simulating "The Social Lubricator" persona for a rulebook excerpt from "One Night Ultimate Werewolf," MeepleLM produces reviews capturing authentic perspective nuances—e.g., focusing on group interaction, ease-of-teaching, and social tension.
Notably, GPT-5.1, in contrast, yields critiques lacking nuanced persona alignment, illustrating the specificity imparted by explicit MDA reasoning.
5. Evaluation Metrics and Comparative Results
Evaluation comprises three axes: macro-level preference alignment, micro-level critique fidelity, and practical utility.
A. Macro-level Preference Alignment:
- Mean Absolute Error (MAE)
- Wasserstein Distance (WD): divergence of predicted vs. ground-truth score distributions
- Kendall's 6: rank correlation for ordinal preferences
B. Micro-level Fidelity:
- Factual correctness (RuleAccuracy): measures rule-derived factuality as 7
- Dist-2: lexical diversity index (higher is preferable)
- Diversity (Div.): 1–5 scale for perspective variety in output
C. Practical Utility:
- Opinion Recovery Rate (OpRec): 8
- Blind A/B user studies comparing authenticity and usefulness on both familiar and unfamiliar games.
Major results:
| Model | MAE ↓ | WD ↓ | τ ↑ | Fact. ↑ | Dist-2 ↑ | Div. ↑ | OpRec ↑ |
|---|---|---|---|---|---|---|---|
| GPT-5.1 | 0.99 | 0.95 | 0.256 | 99.5% | 0.69 | 4.26 | 63.4% |
| Gemini3-Pro | 1.43 | 0.51 | 0.246 | 98.3% | 0.65 | 3.98 | 57.7% |
| Qwen3-235B | 1.23 | 0.64 | 0.145 | 98.9% | 0.66 | 3.56 | 54.3% |
| Qwen3-8B | 0.89 | 1.01 | 0.049 | 97.9% | 0.59 | 1.58 | 11.4% |
| MeepleLM | 0.66 | 0.22 | 0.282 | 98.9% | 0.71 | 4.34 | 69.8% |
MeepleLM achieves a 34% lower MAE and 77% lower WD relative to GPT-5.1, and improves rank fidelity (9) to 0.282. In blind A/B testing, it obtains a 70% average win-rate (83% on “Familiar” games by authenticity).
Ablation studies confirm that removing any of{Rulebook, Persona, MDA Chain} sharply degrades performance, especially for subjectively oriented personas.
6. Impact, Implications, and Future Directions
MDA reasoning as structured in MeepleLM demonstrates that explicit causal chaining—from mechanics through emergent dynamics to subjective aesthetics—offers measurable improvements in both alignment with human judgement and utility to game designers (Li et al., 12 Jan 2026). Grounding model critique within a carefully filtered corpus mapped to persona archetypes enables simulation of diverse, subjective player experiences, facilitating audience-aligned, experience-aware critiques.
A plausible implication is that such explicit reasoning decompositions can generalize to other interactive systems requiring evaluation under subjective heterogeneity. Further development may focus on extending persona expressivity, dynamic adaptation of reasoning chains, and integration beyond board games to applications in general interactive system design.