
Othello-GPT: Transformer for Othello

Updated 27 January 2026
  • Othello-GPT is an autoregressive transformer model that predicts subsequent moves in Othello using raw game transcripts and specialized board tokenization.
  • It employs multimodal extensions and causal interventions to induce interpretable internal world models and robust strategic move generation.
  • The models achieve high legal-move prediction accuracy across synthetic and championship data, with promising extensions like self-play and RL for enhanced gameplay.

Othello-GPT is a family of autoregressive transformer-based models that learn to play Othello (Reversi) by next-move prediction from raw game transcripts. These models, typically trained on large corpora of human or synthetic move sequences, can induce interpretable internal world models, robust next-move generation, and strategic decision-making. Research spanning 2022–2025 has produced a rich understanding of Othello-GPT’s architecture, training procedures, mechanistic interpretability, causal structure, and multimodal extensions.

1. Input Encoding and Model Architecture

Othello-GPT models employ tokenizations specialized for the 8×8 Othello board. Each move is encoded as a positional token: either ASCII algebraic notation or an integer in [1, 64] (excluding the four fixed central squares in some protocols) (Noever et al., 2022, Li et al., 2022, Hazineh et al., 2023, Yuan et al., 6 Mar 2025). Input sequences contain past moves only; the board state is reconstructed implicitly. Transformer variants use:

  • Decoder-only GPT architectures, commonly 6–8 layers, 8 heads per layer, d_model = 128–512, and a vocabulary of |V| = 60–66 tokens (board positions and special tokens).
  • Token embeddings E and positional encodings P added per timestep: e_t = E x_t + P_t.
  • No explicit color tokens—player turns alternate deterministically.
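The tokenization and embedding scheme above can be sketched in a few lines. The square labels, the 60-token vocabulary, and the randomly initialized embedding tables below are illustrative assumptions, not any specific paper's implementation:

```python
import numpy as np

# Hypothetical move tokenizer: label squares a1..h8 and drop the four fixed
# central squares (d4, e4, d5, e5), leaving a 60-token move vocabulary.
CENTER = {"d4", "e4", "d5", "e5"}
SQUARES = [c + str(r) for r in range(1, 9) for c in "abcdefgh"]
TOKEN_ID = {sq: i for i, sq in enumerate(s for s in SQUARES if s not in CENTER)}

rng = np.random.default_rng(0)
d_model = 128
E = rng.normal(size=(len(TOKEN_ID), d_model))  # token embedding table E
P = rng.normal(size=(60, d_model))             # positional table P (max 60 moves)

def embed(moves):
    """Compute e_t = E x_t + P_t for a transcript of move tokens."""
    ids = [TOKEN_ID[m] for m in moves]
    return np.stack([E[tok] + P[t] for t, tok in enumerate(ids)])

x = embed(["f5", "f6", "e6"])  # shape (3, 128): one embedding per move
```

Note that no color embedding is needed: position t alone determines whose turn it is.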

For multimodal extensions, VISOTHELLO fuses BERT-style move encoders and ResNet-18 board-image features, concatenated via [SEP] tokens and processed jointly by a transformer (Chen et al., 19 Jul 2025).

2. Training Objectives, Data, and Move Generation

Othello-GPT models are trained solely by autoregressive next-token cross-entropy: L = -(1/T) Σ_{t=1}^{T} log p(x_t | x_{<t}). Optimizers are Adam/AdamW; typical learning rates are 1×10^{-5} to 1×10^{-4} (Du et al., 13 Jan 2025).
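As a concrete, illustrative rendering of this objective (the logits and vocabulary size are stand-ins), the per-sequence loss can be computed as:

```python
import numpy as np

def next_token_loss(logits, targets):
    """Autoregressive cross-entropy L = -(1/T) * sum_t log p(x_t | x_<t).

    logits: (T, V) scores for each next-move prediction; targets: (T,) ids."""
    z = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

# Uniform logits over a 60-move vocabulary give loss log(60) ≈ 4.094.
loss = next_token_loss(np.zeros((10, 60)), np.zeros(10, dtype=int))
```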

Corpora include synthetic legal-move sequences and human championship game records (Li et al., 2022, Yuan et al., 6 Mar 2025).

Generation employs greedy, top-k, or high-temperature sampling to trade off diversity against legality (Noever et al., 2022). Sequences are decoded until the end-of-sequence token or until an illegal move arises.
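A minimal decoding sketch, with top-k and temperature as assumed knobs (greedy decoding corresponds to k = 1):

```python
import numpy as np

def sample_move(logits, k=None, temperature=1.0, rng=None):
    """Sample the next move id: greedy (k=1), top-k, or temperature sampling."""
    rng = rng if rng is not None else np.random.default_rng()
    z = np.asarray(logits, dtype=float) / temperature
    if k is not None:                      # keep only the k highest scores
        cutoff = np.sort(z)[-k]
        z = np.where(z >= cutoff, z, -np.inf)
    p = np.exp(z - z.max())                # softmax over the surviving moves
    p /= p.sum()
    return int(rng.choice(len(p), p=p))
```

In a full decoding loop one would feed each sampled move back into the model and stop at the end-of-sequence token or at the first illegal prediction.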

3. World Model Induction and Representation Analysis

Othello-GPT models develop layered internal representations of board dynamics. Key findings:

  • Linear probes: Applied to each residual stream, these recover the full board state (“Mine/Yours/Empty” classification for all 64 tiles) with accuracy rising from ~91% (layer 1) to >99% (layer 6+) (Hazineh et al., 2023).
  • Nonlinear probes: Two-layer MLPs reduce error rates from ~20% to <2% on synthetic-trained models (Li et al., 2022).
  • Sparse Autoencoders (SAE): Yield robust, disentangled basis features representing static (edges, corners) and dynamic (tile flips, stability) patterns. SAE-derived features reliably detect tile stability in layers 2–4, revealing emergent planning capacity (Du et al., 13 Jan 2025).
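The linear-probe methodology can be illustrated with a toy multinomial logistic probe trained on activation vectors; the gradient-descent recipe and any data fed to it here are synthetic stand-ins, not real Othello-GPT activations:

```python
import numpy as np

def fit_linear_probe(acts, labels, n_classes=3, lr=0.1, steps=500):
    """Fit a multinomial logistic probe mapping residual-stream activations
    (N, d_model) to per-tile states {0: Empty, 1: Mine, 2: Yours}."""
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.01, size=(acts.shape[1], n_classes))
    onehot = np.eye(n_classes)[labels]
    for _ in range(steps):
        z = acts @ W
        p = np.exp(z - z.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        W -= lr * acts.T @ (p - onehot) / len(acts)  # averaged gradient step
    return W

def probe_accuracy(W, acts, labels):
    return float(((acts @ W).argmax(axis=1) == labels).mean())
```

On well-separated synthetic clusters this probe reaches near-perfect accuracy; on real Othello-GPT activations the reported accuracies rise with layer depth as described above.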

Multimodal models (VISOTHELLO) show stronger internal world modeling; F1 scores for tile-state linear probes rise from 62 (text-only) to 78 (multimodal) (Chen et al., 19 Jul 2025).

4. Causal Structure and Mechanistic Interpretability

  • Attention as causal SCM: Masked attention heads encode causal influence graphs over the move sequence. Normalized attention matrices correspond to total-effect matrices (I - G)^{-1} in linear SCMs (Rohekar et al., 2024).
  • Confidence scoring R(A): The entropy gap between conditional-independence and dependence p-values in attention-based causal graphs strongly predicts model move legality; low R(A) correlates with illegal outputs (Rohekar et al., 2024).
  • Causal interventions: Clamping latent states to counterfactual board representations steers next-move logits, confirming that mid-layer world models causally drive prediction (Hazineh et al., 2023).
  • Patch-free circuit discovery: Sparse dictionary learning and top-down attribution (OV/QK/ADC formulas) decompose high-level features into interpretable circuits tracing board detectors, attention patterns, and flip detectors down to base tokens (He et al., 2024).
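The clamping idea can be sketched as editing an activation vector along linear-probe directions; the probe matrix, step size, and write-back mechanics here are illustrative assumptions rather than the cited papers' exact procedure:

```python
import numpy as np

def clamp_tile_state(resid, probe_W, target_state, alpha=1.0):
    """Steer a residual-stream vector so a linear probe decodes `target_state`
    for one tile, by moving along (w_target - w_current).

    resid: (d_model,) activation; probe_W: (d_model, 3) per-tile probe."""
    current = int((resid @ probe_W).argmax())
    if current == target_state:
        return resid
    delta = probe_W[:, target_state] - probe_W[:, current]
    return resid + alpha * delta / np.linalg.norm(delta)
```

In the intervention experiments, the edited activation is written back into the forward pass, and the resulting shift in next-move logits is what confirms the causal role of the internal board representation.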

5. Performance, Generalization, and Robustness

Othello-GPT models achieve high legal-move accuracy:

  • GPT-2, T5, BART, and 7B-scale LLaMA-2/Mistral/Qwen2.5 reach >99.9% next-move correctness on synthetic data and championship records (Yuan et al., 6 Mar 2025).
  • Two-hop prediction (planning two moves ahead) is more difficult: best models achieve 94–97% accuracy.
  • GPT-2 fine-tuned models achieve up to 71% completion (43/60 legal moves/game) before illegal placement; GPT-3 few-shot models reach 41% (Noever et al., 2022).
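The legality criterion behind these figures is the standard Othello rule: a move must land on an empty square and flank at least one contiguous run of opponent discs. A self-contained checker (illustrative, not taken from any of the cited papers):

```python
DIRS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def legal_moves(board, player):
    """board: 8x8 nested list with 0 = empty, +1/-1 = disc owners.
    Returns the set of (row, col) squares where `player` may move."""
    moves = set()
    for r in range(8):
        for c in range(8):
            if board[r][c] != 0:
                continue
            for dr, dc in DIRS:
                rr, cc, seen = r + dr, c + dc, 0
                # Walk over a run of opponent discs...
                while 0 <= rr < 8 and 0 <= cc < 8 and board[rr][cc] == -player:
                    rr, cc, seen = rr + dr, cc + dc, seen + 1
                # ...which must terminate on one of the player's own discs.
                if seen and 0 <= rr < 8 and 0 <= cc < 8 and board[rr][cc] == player:
                    moves.add((r, c))
                    break
    return moves
```

Used as a filter during decoding, this is the check that terminates a generated transcript at the first illegal prediction.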

Multimodal VISOTHELLO outperforms text-only and vision-only baselines in next-move accuracy, probing, and win rate. Visual context increases sample efficiency and robustness to perturbations: rotated input images yield only a 1–2% accuracy drop, compared to 50–60% for ResNet-only models (Chen et al., 19 Jul 2025).

6. Inductive Biases, Interpretability, and Circuit Properties

  • Hierarchical feature emergence: Early layers encode static geometry (edges, corners), intermediate layers encode tile stability and dynamic control, final layers sharpen combinatorial patterns (Du et al., 13 Jan 2025).
  • Circuit-level interpretability: Sparse dictionary features correspond to specific game concepts (move position, opponent proximity, legal-move detector). Patch-free attribution enables systematic decomposition of OV, QK, and MLP circuits, reducing out-of-distribution risks associated with activation-patching (He et al., 2024).
  • World model grounding: All leading transformer architectures induce similar internal representations, with inter-model alignment exceeding 90% feature similarity even across decoder-only and encoder-decoder models (Yuan et al., 6 Mar 2025).

7. Extensions, Limitations, and Future Directions

  • Practical augmentations: Proposals include explicit board-state tokens, legality-checking head modules, causal-head regularization, expanded context windows, and R(A)-based inference filtering (Rohekar et al., 2024).
  • Self-play, RL, and natural-language commentary are suggested extensions to improve Othello-GPT’s completion and strategic diversity (Noever et al., 2022).
  • Mechanistic open questions: Optimal sparsity schedules for dictionary learning; clustering discovered features; mapping contributors to mispredictions; extending circuit analysis to larger, natural-text LLMs (He et al., 2024, Du et al., 13 Jan 2025).
  • Multimodal integration: VISOTHELLO demonstrates the value of vision-text fusion in symbol grounding and robust world modeling (Chen et al., 19 Jul 2025).

Overall, research on Othello-GPT establishes a canonical testbed for transformer world-model induction, mechanistic interpretability, and causal analysis. Its results strongly support the Othello World Model Hypothesis: next-token transformers trained on game transcripts develop structured, manipulable, and generalizable internal representations of complex environments.
