Do sequence-trained game models encode formal rules?

Determine whether superhuman-level game-playing foundation models trained purely on move sequences (for example, transformer-based models trained on chess or Othello game records) encode the formal rules of their respective games in their internal representations, rather than achieving high performance solely through statistical pattern-matching that does not reflect the underlying rule structure.

Background

The paper discusses cases where models achieve impressive performance without necessarily capturing underlying mechanisms, highlighting shortcut learning and failures to generalize. In the context of game-playing, models trained only on sequences of moves have been argued to exhibit superhuman play, prompting debate about whether such performance implies an internalization of the game's rules or reliance on heuristics and correlations.
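
A coarse behavioural check of this debate is whether the model's next-move predictions respect the rules at all. The sketch below is illustrative only: `predict_next_move(history)` is an assumed interface for a sequence-trained model (not from any specific library), and the python-chess package is used purely as a rules oracle for legality.

```python
# Illustrative sketch: legal-move rate of a sequence-trained next-move predictor.
# `predict_next_move` is a hypothetical callable taking the move history as UCI
# strings and returning the model's predicted next move as a UCI string.
import chess

def legal_move_rate(predict_next_move, games):
    """Fraction of model-predicted next moves that are legal in the current position.

    `games` is an iterable of recorded games, each a list of UCI move strings.
    """
    legal = total = 0
    for recorded_moves in games:
        board = chess.Board()
        for played in recorded_moves:
            history = [m.uci() for m in board.move_stack]
            predicted = predict_next_move(history)
            try:
                is_legal = chess.Move.from_uci(predicted) in board.legal_moves
            except ValueError:            # unparsable output counts as illegal
                is_legal = False
            legal += is_legal
            total += 1
            board.push_uci(played)        # keep following the recorded game
    return legal / max(total, 1)
```

A high legal-move rate is necessary but not sufficient evidence of rule-encoding, since surface statistics alone can yield mostly legal moves; it is a behavioural baseline rather than a probe of internal representations.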

This uncertainty is framed as part of a broader question about whether strong predictive capabilities indicate true mechanistic understanding. References to studies of Othello- and chess-trained models suggest mixed evidence, with performance degrading under shifts outside the training distribution, reinforcing the need for tests that distinguish rule-encoding from pattern-matching.
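
One way such tests are commonly framed (for example, in the probing analyses of Othello-trained models the background alludes to) is to train simple probes on the model's hidden activations and ask whether the latent board state, the object the rules operate on, can be decoded from them. The sketch below is a minimal, hypothetical version: `acts` and `boards` are assumed arrays, not outputs of any particular model or dataset.

```python
# Minimal probing sketch with assumed inputs:
#   acts   : (n_positions, d_model) hidden activations from the sequence model
#   boards : (n_positions, n_squares) ground-truth square contents (integer labels;
#            assumes each square takes at least two label values in the data)
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def probe_square(acts, boards, square):
    """Held-out accuracy of a linear probe decoding one square's contents."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        acts, boards[:, square], test_size=0.2, random_state=0
    )
    probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return probe.score(X_te, y_te)

def mean_probe_accuracy(acts, boards):
    """Average decoding accuracy over all squares; compare to a majority-class baseline."""
    return float(np.mean([probe_square(acts, boards, s) for s in range(boards.shape[1])]))
```

Probe accuracy well above the majority-class baseline would suggest the representations carry rule-relevant state, whereas chance-level accuracy, especially under distribution shift, would favour the pattern-matching interpretation; since probes can overstate what the model actually uses, such results are evidence rather than proof.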

References

It remains debated, for example, whether superhuman-level game-playing models trained purely on move sequences truly encode the game's rules40–42,47,48, or whether models that solve analogies at a human level possess reasoning mechanisms like those of humans49.

From Prediction to Understanding: Will AI Foundation Models Transform Brain Science? (2509.17280 - Serre et al., 21 Sep 2025) in Main text, section 'Prediction is not Explanation'