
PuzzleJAX: GPU-Accelerated Puzzle Engine

Updated 27 August 2025
  • PuzzleJAX is a GPU-accelerated puzzle game engine and DSL that benchmarks AI algorithms like tree search, reinforcement learning, and LLM reasoning.
  • It utilizes JAX to vectorize puzzle simulations with dynamic compilation and efficient convolution-based rule execution for grid puzzles.
  • Performance gains of 2×–16× over PuzzleScript and extensive coverage of human-designed puzzles enable advanced AI planning analysis.

PuzzleJAX is a GPU-accelerated puzzle game engine and description language designed for benchmarking and research in tree search, reinforcement learning (RL), and LLM reasoning. It dynamically compiles any puzzle game expressible in its domain-specific language (DSL), which is structurally compatible with PuzzleScript, and thus covers a wide range of human-designed grid-based puzzle tasks. PuzzleJAX serves as both an expressive simulation environment and a rigorous testbed for analyzing AI performance across diverse planning, control, and reasoning challenges (Earle et al., 22 Aug 2025).

1. Engine Architecture and Domain-Specific Language

PuzzleJAX is a reimplementation of the PuzzleScript engine in JAX, an autodiff-compatible, vectorized array framework optimized for GPU execution. Its DSL divides each game file into distinct sections: Prelude, Objects, Legend, Sounds, Collision Layers, Rules, Win Conditions, and Levels.

  • Rule Encoding: Interaction rules are written as rewrite rules; for example, a Sokoban mechanic:

    [ > Player | Crate ] -> [ > Player | > Crate ]

    This specifies that a Player moving toward an adjacent Crate pushes it: both entities move simultaneously in the same direction.
  • Parsing and Representation: PuzzleScript files are parsed with a context-free grammar (implemented via Lark), then compiled to structured Python objects. Game states are stored as multihot binary arrays whose channels encode object presence, directional forces, and recent actions.
  • Transition Mechanics: Each rewrite rule is compiled into a convolution kernel that detects its left-hand-side pattern and a transposed convolution that projects its right-hand side onto the grid. Execution is hierarchical: rules are grouped and rotated as needed, then applied repeatedly via nested JAX while loops until the state reaches a fixed point.
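
The convolutional rule-matching idea can be sketched in a few lines of JAX. This is an illustrative toy, not PuzzleJAX's actual kernels: the channel indices, grid size, and the encoding of the `[ Player | Crate ]` pattern are assumptions.

```python
# Illustrative sketch of convolution-based LHS pattern detection on a
# multihot grid; channel layout and pattern encoding are assumed, not
# taken from PuzzleJAX.
import jax.numpy as jnp
from jax import lax

PLAYER, CRATE = 0, 1  # hypothetical channel indices

# Multihot state, shape (batch, channels, H, W): Player at (1,1), Crate at (1,2).
state = jnp.zeros((1, 2, 3, 4))
state = state.at[0, PLAYER, 1, 1].set(1.0)
state = state.at[0, CRATE, 1, 2].set(1.0)

# Encode the LHS pattern [ Player | Crate ] as a (1, channels, 1, 2) kernel:
# weight 1 on Player in the left cell and on Crate in the right cell.
kernel = jnp.zeros((1, 2, 1, 2))
kernel = kernel.at[0, PLAYER, 0, 0].set(1.0)
kernel = kernel.at[0, CRATE, 0, 1].set(1.0)

# A cell anchors a match exactly where the convolution sums to 2,
# i.e. both required objects are present in the right relative positions.
activ = lax.conv_general_dilated(state, kernel,
                                 window_strides=(1, 1), padding="VALID")
matches = activ == 2.0
print(matches[0, 0])  # True only at the anchor cell (1, 1)
```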

This architecture enables efficient simulation and manipulation of structurally diverse puzzle environments at scale.

2. Benchmarking Modalities

PuzzleJAX provides a unified benchmark for three primary modalities:

| Modality | Algorithmic Example | PuzzleJAX Evaluation Features |
|---|---|---|
| Tree Search | Breadth-first search | Iterative state expansion; all-or-nothing solvability; per-level solved statistics |
| Reinforcement Learning (RL) | PPO with ConvNets | Multihot grid states; reward curves; learning-behavior visualization |
| LLM Reasoning | Structured state prompts | ascii_state context; rules/actions as context; per-game model win rates |
  • Tree Search: Classical breadth-first search is applied; simple puzzles are often solved within 1 million iterations, whereas combinatorially complex or deadlock-prone games remain unsolved.
  • Reinforcement Learning: PPO agents are evaluated on diverse puzzles; early reward increases are commonly observed, often plateauing due to reward sparsity or deadlocks.
  • LLM Reasoning: Models receive structured state and rules; evaluation demonstrates high variance, with select games solved on short trajectories and others yielding 0% win rates.
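
The breadth-first baseline can be sketched as follows; `step`, `is_win`, and the tuple of `actions` are hypothetical placeholders for an environment interface, not PuzzleJAX's API.

```python
# Hedged sketch of a breadth-first tree-search baseline over hashable
# game states; the environment interface names are illustrative.
from collections import deque

def bfs(start, step, is_win, actions, max_iters=1_000_000):
    """Return the first winning action sequence, or None."""
    frontier = deque([(start, ())])
    seen = {start}
    for _ in range(max_iters):
        if not frontier:
            return None  # search space exhausted without a solution
        state, plan = frontier.popleft()
        if is_win(state):
            return plan
        for a in actions:
            nxt = step(state, a)
            if nxt not in seen:  # de-duplicate visited states
                seen.add(nxt)
                frontier.append((nxt, plan + (a,)))
    return None  # iteration budget exhausted

# Toy usage: walk a counter from 0 to 3 with +1/-1 "moves".
plan = bfs(0, lambda s, a: s + a, lambda s: s == 3, (1, -1))
print(plan)  # (1, 1, 1)
```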

3. Performance and Scalability

PuzzleJAX leverages JAX's GPU acceleration for high-throughput simulation. In direct comparisons with the original JavaScript PuzzleScript engine, PuzzleJAX achieves speedups ranging from 2× to 16×, most pronounced in vectorized batch-execution settings. Frames-per-second benchmarks and batch execution profiles substantiate these gains.
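
The batch-execution speedups come from compiling one step function and mapping it over many levels at once; a minimal sketch with `jax.jit` and `jax.vmap`, using a toy `step` function rather than PuzzleJAX's engine:

```python
# Hedged sketch of vectorized batch simulation: jit-compile one step
# function and vmap it over a batch of levels. The dynamics here are a
# toy stand-in, not PuzzleJAX's rule engine.
import jax
import jax.numpy as jnp

def step(grid, action):
    # Toy dynamics: shift the whole grid by the action offset.
    return jnp.roll(grid, shift=action, axis=-1)

# Vectorize over levels and per-level actions, then compile once.
batched_step = jax.jit(jax.vmap(step, in_axes=(0, 0)))

grids = jnp.arange(4 * 3 * 3).reshape(4, 3, 3)  # 4 toy levels, 3x3 each
actions = jnp.array([0, 1, 2, 1])               # one action per level
out = batched_step(grids, actions)
print(out.shape)  # (4, 3, 3)
```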

In tree search, a tabulated breakdown reveals that “Sokoban Basic” and “Sokoban Match 3” achieve near-complete solvability under 1 million iterations, while “Notsnake” and “Zen Puzzle Garden” exhibit nontrivial combinatorial barriers even under extended search. RL and LLM result heatmaps illustrate wide variability across games and agent architectures.

4. Coverage and Expressiveness

PuzzleJAX validates several hundred out of thousands of PuzzleScript games, spanning tasks authored by both professional designers and casual users. Its DSL expresses puzzle mechanics ranging from classic box-pushing (Sokoban) and simple navigation (“Blocks”) to sophisticated constructs such as “Travelling Salesman” and multi-dictionary logic games. The DSL supports nuanced semantics (directional movement, collisions, wildcards) and complex win conditions.

This breadth ensures exposure to semantically diverse, human-relevant planning and reasoning domains, making PuzzleJAX suitable for generalization, transfer learning, and automated game design studies.

5. Analysis of AI Challenges

PuzzleJAX exposes several limitations of contemporary search, learning, and reasoning algorithms:

  • Long-Horizon Planning/Sparse Rewards: Many puzzles provide only delayed rewards, hindering RL agents, which optimize for early reward rather than planning complete solution trajectories.
  • Mechanics Inversion/Semantic Complexity: The DSL supports unconventional mechanics, leading to non-intuitive dynamics that confound model-free agents and LLMs alike.
  • Deadlocks and Combinatorial Trap States: Certain puzzles force deadlock upon suboptimal action, requiring anticipatory strategies beyond greedy or myopic heuristics.

Empirical evidence suggests naive tree search often outperforms RL and LLM-based strategies in deep, late-reward puzzles, highlighting the critical need for integrated search, planning, and “insight”-based reasoning paradigms.

6. Future Directions and Applications

PuzzleJAX’s open-ended game compilation and extensive coverage suggest future research opportunities:

  • Hybrid AI Approaches: Combining tree search, RL, and explicit reasoning could yield superior performance on tasks with combinatorial traps or sparse signals.
  • Continual and Meta-Learning: Automated compilation of new puzzle variants positions PuzzleJAX as an environment for agents to learn adaptation and self-improvement.
  • Automated Game Design and Testing: Researchers may leverage PuzzleJAX for algorithmic design and playtesting, expanding its impact to both AI and human game development workflows.

A plausible implication is that AI research leveraging PuzzleJAX will advance towards agents capable of synthesizing search, learning, and symbolic reasoning, moving closer to robust, general puzzle-solving abilities.

7. Conclusion

PuzzleJAX constitutes a GPU-accelerated, PuzzleScript-compatible platform for simulation and benchmarking of puzzle games in a modern deep learning ecosystem (Earle et al., 22 Aug 2025). With rigorous performance analysis, massive coverage of human-designed tasks, and modalities for evaluating tree search, RL, and LLMs, PuzzleJAX facilitates comprehensive investigation of reasoning and learning in grid-based puzzle domains. Analysis demonstrates that while search methods solve many puzzles, learning-based agents encounter significant obstacles—chiefly in long-horizon planning, sparse-reward exploration, and complex rule inversion. These challenges underscore the need for enhanced hybrid and meta-learning approaches, and establish PuzzleJAX as both a benchmark and stimulus for future AI research in reasoning and learning.
