The Game of Tetris in Machine Learning

Published 5 May 2019 in cs.LG and cs.AI | (1905.01652v2)

Abstract: The game of Tetris is an important benchmark for research in artificial intelligence and machine learning. This paper provides a historical account of the algorithmic developments in Tetris and discusses open challenges. Handcrafted controllers, genetic algorithms, and reinforcement learning have all contributed to good solutions. However, existing solutions fall far short of what can be achieved by expert players playing without time pressure. Further study of the game has the potential to contribute to important areas of research, including feature discovery, autonomous learning of action hierarchies, and sample-efficient reinforcement learning.

Summary

- The paper surveys methods for playing Tetris, from dynamic programming to genetic algorithms, most of which rely on hand-crafted features and linear evaluation functions.
- It reports that dominance relations among candidate placements can sharply reduce the number of placements an agent has to consider.
- It highlights open challenges in feature discovery, complex scoring functions, and sample-efficient learning, offering insights applicable to broader AI and game research.

This paper, "The Game of Tetris in Machine Learning" (1905.01652), provides a comprehensive overview of the algorithmic approaches applied to the game of Tetris, treating it as a significant benchmark for artificial intelligence and machine learning research. It details the historical progression of methods, highlights the features used, and discusses the performance achieved, while also pointing out open challenges and potential future research directions.

The game of Tetris, created in 1984, is played on a 2D grid where Tetriminos (geometric shapes) fall from the top. The player's objective is to rotate and move these pieces to form complete horizontal lines, which are then cleared. The game ends when pieces stack up to the top of the grid. Despite its simple rules, maximizing cleared lines given a sequence of pieces is an NP-complete problem. Tetris is typically modeled as a Markov Decision Process (MDP), where the state includes the grid configuration and the current falling piece, and actions are the possible placements of that piece. Most research implementations use a simple scoring system: one point per cleared line.
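
The line-clearing rule and the one-point-per-line reward described above can be sketched as follows. This is a minimal illustration, not code from the paper: the grid representation (a list of rows from top to bottom, 1 for a filled cell, 0 for an empty one) and the function name `clear_full_rows` are assumptions made for the example.

```python
from typing import List, Tuple

Grid = List[List[int]]  # assumed layout: rows from top to bottom, 1 = filled


def clear_full_rows(grid: Grid) -> Tuple[Grid, int]:
    """Remove complete rows and return the new grid plus the reward.

    The reward is one point per cleared line, the simple scoring
    system used by most research implementations.
    """
    width = len(grid[0])
    remaining = [row for row in grid if not all(row)]
    cleared = len(grid) - len(remaining)
    # Rows above a cleared line shift down; empty rows enter at the top.
    empty_rows = [[0] * width for _ in range(cleared)]
    return empty_rows + remaining, cleared


example = [
    [0, 0, 0],
    [1, 1, 1],
    [1, 0, 1],
    [1, 1, 1],
]
new_grid, reward = clear_full_rows(example)
```

In an MDP formulation, a transition would place the current piece, apply a step like this one, and then draw the next piece at random.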

A major challenge in Tetris research is the variability in game implementations and scoring, making direct comparison of results difficult. Additionally, high score variance necessitates many game repetitions for accurate performance assessment, and games can be very lengthy. The common approach involves learning a linear evaluation function $V(s) = \mathbf{w}^T \phi(s)$, where $\phi(s)$ is a vector of state features and $\mathbf{w}$ is a vector of weights. The agent selects the action (piece placement) that leads to the state with the highest evaluated value.
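
Greedy selection under a linear evaluation function can be sketched in a few lines. The names `linear_value` and `greedy_placement`, the toy candidate states, and the weights are all hypothetical and chosen only to illustrate $V(s) = \mathbf{w}^T \phi(s)$.

```python
from typing import Callable, Iterable, Sequence, TypeVar

State = TypeVar("State")


def linear_value(weights: Sequence[float], features: Sequence[float]) -> float:
    """Evaluate V(s) = w^T phi(s) for one feature vector."""
    return sum(w * f for w, f in zip(weights, features))


def greedy_placement(
    successors: Iterable[State],
    phi: Callable[[State], Sequence[float]],
    weights: Sequence[float],
) -> State:
    """Return the successor state with the highest linear value."""
    return max(successors, key=lambda s: linear_value(weights, phi(s)))


# Toy successor states (hypothetical): one dict per candidate placement.
candidates = [
    {"holes": 2, "height": 3},
    {"holes": 0, "height": 5},
    {"holes": 1, "height": 2},
]
phi = lambda s: (s["holes"], s["height"])
weights = (-4.0, -1.0)  # illustrative weights, not taken from the paper
best = greedy_placement(candidates, phi, weights)
```

With these weights the three candidates evaluate to -11, -5, and -6, so the placement without holes is chosen even though it leaves a taller stack.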

Algorithmic Approaches and Key Features

The paper chronicles various algorithmic attempts:

Early Attempts (Dynamic Programming and Reinforcement Learning):
- Tsitsiklis and Van Roy (1996) used approximate dynamic programming with features like the number of holes and the height of the highest column, achieving around 30 cleared lines on a $16 \times 10$ grid.
- Bertsekas and Tsitsiklis (1996) employed $\lambda$-policy iteration with features including individual column heights and differences between adjacent column heights, clearing about 2,800 lines (on a slightly modified $19 \times 10$ grid).
- Lagoudakis et al. (2002) used least-squares policy iteration with additional features like mean column height, achieving 1,000-3,000 lines.
- Kakade et al. (2002) applied a policy-gradient algorithm with Bertsekas's features, clearing 6,800 lines.
- Farias and Van Roy (2006) used a linear programming approach, clearing around 4,500 lines with Bertsekas's features.
Hand-Crafted Agent:
- Pierre Dellacherie (reported by Fahey, 2003) developed a highly successful agent by manually tuning weights for six simple features:
  - Number of holes
  - Landing height of the piece
  - Number of row transitions (full to empty or vice-versa along rows)
  - Number of column transitions (similar, along columns)
  - Cumulative wells (sum of depths of wells, where depth $d$ contributes $\sum_{i=1}^{d} i$)
  - Eroded cells (number of lines cleared by the move multiplied by the number of cells of the placed piece that were removed in those lines)
- This agent cleared an average of 660,000 lines, outperforming other methods for years. The evaluation function was: Score = -4 * holes - cumulative_wells - row_transitions - column_transitions - landing_height + eroded_cells
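
To make the feature definitions concrete, the sketch below computes four of the six features on a small grid and plugs them into the evaluation function quoted above. Several details are assumptions rather than statements from the paper: the grid layout (rows from top to bottom, 1 for filled), treating side walls and the floor as filled when counting transitions, and counting only wells that are open from above. Landing height and eroded cells depend on the move itself, so they are passed in as arguments.

```python
from typing import List

Grid = List[List[int]]  # assumed layout: rows from top to bottom, 1 = filled


def holes(grid: Grid) -> int:
    """Empty cells that have at least one filled cell above them."""
    count = 0
    for c in range(len(grid[0])):
        seen_filled = False
        for row in grid:
            if row[c]:
                seen_filled = True
            elif seen_filled:
                count += 1
    return count


def row_transitions(grid: Grid) -> int:
    """Filled/empty changes along rows (assumption: walls count as filled)."""
    count = 0
    for row in grid:
        padded = [1] + list(row) + [1]
        count += sum(a != b for a, b in zip(padded, padded[1:]))
    return count


def column_transitions(grid: Grid) -> int:
    """Filled/empty changes along columns (assumption: floor counts as filled)."""
    count = 0
    for c in range(len(grid[0])):
        column = [0] + [row[c] for row in grid] + [1]
        count += sum(a != b for a, b in zip(column, column[1:]))
    return count


def cumulative_wells(grid: Grid) -> int:
    """A well of depth d contributes 1 + 2 + ... + d."""
    width, total = len(grid[0]), 0
    for c in range(width):
        depth = 0
        for row in grid:
            if row[c]:
                break  # assumption: only wells open from above are counted
            left = row[c - 1] if c > 0 else 1
            right = row[c + 1] if c < width - 1 else 1
            depth = depth + 1 if left and right else 0
            total += depth
    return total


def dellacherie_score(grid: Grid, landing_height: float, eroded_cells: int) -> float:
    """Evaluation function as quoted in the text above."""
    return (-4 * holes(grid) - cumulative_wells(grid) - row_transitions(grid)
            - column_transitions(grid) - landing_height + eroded_cells)


board = [
    [0, 0, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 1, 1],
    [0, 1, 1, 1],
]
score = dellacherie_score(board, landing_height=2, eroded_cells=0)
```

On this board the second column holds a well of depth 2 (contributing 1 + 2 = 3) and the last column a well of depth 1, so the cumulative wells feature is 4.
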
Genetic Algorithms (GAs):
- Böhm et al. (2005) used evolutionary algorithms and reported extremely high scores (e.g., 480 million lines with a linear policy). Their agent, however, knew both the current and the next piece, so these results are not directly comparable to those of agents that see only the current piece. They introduced features such as connected holes and weighted occupied cells.
- Szita and Lőrincz (2006) used the cross-entropy method with Dellacherie's features, clearing around 350,000 lines. This method iteratively samples parameter vectors, updates a distribution based on the best performers, and adds decreasing noise for exploration.
- Thiery and Scherrer (2009) developed the BCTS (Building Controllers for Tetris) agent using the cross-entropy algorithm. They augmented Dellacherie's features with "hole depth" (sum of full squares above each hole) and "rows with holes," achieving 35 million lines. Adding "pattern diversity" (number of distinct height differences between adjacent columns, for differences < 3) further improved performance.
- Boumaza (2009) used Covariance Matrix Adaptation Evolution Strategy (CMA-ES) and found weights very similar to Dellacherie's, also achieving 35 million lines.
- The paper notes the resurgence of evolutionary strategies due to their parallelizability and success.
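
The cross-entropy loop described above (sample weight vectors, keep the best performers, refit the distribution, add decreasing noise) can be sketched as follows. This is a generic illustration under stated assumptions, not the published configuration: the population size, the linear noise schedule, and the toy objective `toy_score` are all hypothetical. In a real experiment, `evaluate` would play Tetris games with the sampled weights and return the average number of cleared lines.

```python
import math
import random
import statistics
from typing import Callable, List, Sequence


def noisy_cross_entropy(
    evaluate: Callable[[Sequence[float]], float],
    dim: int,
    iterations: int = 40,
    population: int = 50,
    n_elite: int = 10,
    initial_std: float = 5.0,
    initial_noise: float = 1.0,
    seed: int = 0,
) -> List[float]:
    """Fit an independent Gaussian per weight to the best-scoring samples."""
    rng = random.Random(seed)
    mean = [0.0] * dim
    std = [initial_std] * dim
    for t in range(iterations):
        # 1. Sample candidate weight vectors from the current distribution.
        samples = [[rng.gauss(m, s) for m, s in zip(mean, std)]
                   for _ in range(population)]
        # 2. Keep the best performers.
        elite = sorted(samples, key=evaluate, reverse=True)[:n_elite]
        # 3. Refit mean and variance; add noise that shrinks over time
        #    (assumed linear schedule) so the search does not collapse early.
        noise = initial_noise * (1.0 - t / iterations)
        columns = list(zip(*elite))
        mean = [statistics.fmean(col) for col in columns]
        std = [math.sqrt(statistics.pvariance(col) + noise) for col in columns]
    return mean


# Toy stand-in for "average lines cleared": closeness to a target vector.
target = (-4.0, -1.0, 1.0)


def toy_score(w: Sequence[float]) -> float:
    return -sum((x - t) ** 2 for x, t in zip(w, target))


learned = noisy_cross_entropy(toy_score, dim=3)
```

Because each candidate can be evaluated independently, step 2 parallelizes naturally, which is one reason such methods are attractive when a single evaluation means playing many long games.
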
Approximate Modified Policy Iteration:
- Gabillon et al. (2013) achieved 51 million lines using Classification-Based Modified Policy Iteration (CBMPI). This RL algorithm uses rollouts to estimate state-action values and then employs CMA-ES to perform a cost-sensitive classification task to find good actions.
- CBMPI used states sampled from trajectories of the BCTS agent, subsampled to ensure a more uniform grid height distribution. This reliance on an existing good policy and the complexity of subsampling are noted as limitations. The feature set used for the policy was "DT" (Dellacherie + Thiery features, including pattern diversity) and for the value function, DT + Radial Basis Functions (RBFs) of the mean column height.
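
The rollout step can be illustrated with a generic truncated Monte Carlo estimate of a state-action value: take the action, follow the current policy for a fixed number of steps, and add a value estimate for the final state. This sketch does not reproduce the exact CBMPI procedure. The function `rollout_q`, the `step` interface, and the toy counter environment are assumptions made for the example.

```python
import random
from typing import Callable, Tuple

# Hypothetical simulator interface: (state, action, rng) -> (next_state, reward, done)
Step = Callable[[int, int, random.Random], Tuple[int, float, bool]]


def rollout_q(
    state: int,
    action: int,
    step: Step,
    policy: Callable[[int], int],
    value: Callable[[int], float],
    horizon: int,
    n_rollouts: int,
    rng: random.Random,
    gamma: float = 1.0,
) -> float:
    """Average return of truncated rollouts that start with `action`."""
    total = 0.0
    for _ in range(n_rollouts):
        s, reward, done = step(state, action, rng)
        ret, discount = reward, gamma
        for _ in range(horizon):
            if done:
                break
            s, reward, done = step(s, policy(s), rng)
            ret += discount * reward
            discount *= gamma
        if not done:
            ret += discount * value(s)  # bootstrap from the value estimate
        total += ret
    return total / n_rollouts


# Toy deterministic environment: a counter that ends once it reaches 10.
def toy_step(s: int, a: int, rng: random.Random) -> Tuple[int, float, bool]:
    nxt = s + a
    return nxt, float(a), nxt >= 10


q = rollout_q(0, 2, toy_step, policy=lambda s: 1, value=lambda s: 10.0 - s,
              horizon=3, n_rollouts=5, rng=random.Random(0))
```

In the toy run, the first action earns 2, three policy steps earn 1 each, and the value estimate at state 5 adds 5, giving an estimate of 10.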

Structure of the Decision Environment

The paper highlights work by Şimşek et al. (2016) on exploiting structural regularities in Tetris to prune the action space, even with unknown feature weights. Three types of regularities were identified:

Simple Dominance: If placement A is better than or equal to placement B on all features and strictly better on at least one, B can be eliminated.
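Cumulative Dominance: A weaker condition that also uses the ordering of features by importance. With features oriented so that higher is better and ordered from most to least important, A dominates B if every running sum of A's feature values is at least the corresponding running sum for B, and at least one is strictly greater.
Noncompensation: When features are ordered by importance, each feature's weight outweighs the combined effect of all less important features, so placements can be compared one feature at a time in that order.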

Using simple dominance alone reduced the median number of placements from 17 to 3. Adding cumulative dominance reduced it to 1. This suggests that focusing on a few indicative features can significantly simplify decision-making.
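
Dominance-based pruning can be sketched as a filter over candidate feature vectors. The sketch assumes every feature has been oriented so that larger is better (for example, holes are negated), which requires knowing only the sign of each weight. The cumulative variant additionally assumes that features are listed from most to least important and are on comparable scales; it compares running sums of feature values. The function names and the toy placements are hypothetical.

```python
from itertools import accumulate
from typing import Callable, List, Sequence

Features = Sequence[float]  # assumed orientation: larger is better


def simply_dominates(a: Features, b: Features) -> bool:
    """a is at least as good as b on every feature and better on one."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def cumulatively_dominates(a: Features, b: Features) -> bool:
    """Simple dominance applied to running sums of ordered features."""
    return simply_dominates(list(accumulate(a)), list(accumulate(b)))


def prune(placements: List[Features],
          dominates: Callable[[Features, Features], bool]) -> List[Features]:
    """Keep only placements that no other placement dominates."""
    return [p for i, p in enumerate(placements)
            if not any(dominates(q, p)
                       for j, q in enumerate(placements) if j != i)]


# Toy candidates as (-holes, -landing_height, eroded_cells); values are made up.
placements = [(0, -2, 1), (-1, -2, 1), (0, -3, 0), (-2, -1, 0)]
after_simple = prune(placements, simply_dominates)
after_cumulative = prune(placements, cumulatively_dominates)
```

Here simple dominance leaves two of the four candidates and cumulative dominance leaves one, without any weight values being specified. Whenever the weights are non-negative and non-increasing in the assumed order, a cumulatively dominated placement cannot have a higher linear value than the placement that dominates it.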

Open Challenges

Despite progress, several challenges remain:

Feature Discovery: Current high-performing agents rely on hand-crafted features. Learning effective features directly from raw pixel inputs (e.g., using deep learning) is an unsolved problem in Tetris. Attempts so far have only cleared a few hundred lines.
Complex Scoring Functions: Most research uses a simple "lines cleared" score. Real-world Tetris often rewards clearing multiple lines simultaneously (e.g., a "Tetris" for 4 lines) or specific maneuvers like T-spins. Linear evaluation functions may not be optimal for these. This points to a need for learning action hierarchies or options (e.g., subgoals like setting up for an I-piece).
Understanding Human Play: Humans learn to play Tetris well with practice. Research into how humans perceive the game and decide on piece placements could inform AI development.
Sample-Efficient Learning: Current best methods are sample-intensive, often requiring hundreds of thousands of games or trajectories from already proficient agents. Developing algorithms that learn effectively from limited experience is crucial.

Beyond Tetris

Lessons from Tetris, such as the power of good feature engineering and exploiting environmental regularities, are applicable to more complex domains like real-time strategy (RTS) games (e.g., StarCraft) or open-world games. These games also feature unique situations and benefit from hierarchical task decomposition (strategy vs. tactics), where sub-problems might be solvable with simpler, feature-based approaches similar to those in Tetris. The observation of dominance structures in other games like backgammon suggests these principles are broadly relevant.

The paper concludes by emphasizing that video games like Tetris offer controlled environments to study decision-making under uncertainty, limited resources, and complexity—challenges faced by both humans and AI systems.

The Appendix of the paper provides a useful table summarizing the algorithms, their reported scores, grid sizes, and the specific feature sets they employed, along with detailed descriptions of these feature sets (Bertsekas, Lagoudakis, Dellacherie, Böhm, BCTS, DT, RBF). This is invaluable for anyone looking to replicate or build upon this research.
