Solving the Jericho benchmark suite of text-based games

Determine whether autonomous agents can fully solve the Jericho benchmark suite of text-based interactive fiction games under the standard Jericho evaluation protocol. Doing so requires overcoming two obstacles: partial observability, which forces agents to construct world models from local textual descriptions, and combinatorial state–action spaces, which create hard-exploration challenges.

Background

The Jericho benchmark is a widely used suite of text-based interactive fiction games designed to evaluate agents’ abilities in language-grounded decision making. It presents two core challenges: partial observability, which requires agents to infer and maintain world state from local textual observations, and extremely large combinatorial action spaces derived from natural language commands.
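To give a rough sense of scale for the second challenge, the sketch below counts the candidate commands that arise when actions are formed by filling blanks in command templates with vocabulary words. The templates, the vocabulary sizes, and the helper name `count_template_actions` are invented for illustration; they are not taken from Jericho or from the cited paper.

```python
def count_template_actions(templates, vocab_size):
    """Count commands obtained by filling every blank ('_') in each
    template with any one of `vocab_size` vocabulary words."""
    return sum(vocab_size ** template.count("_") for template in templates)


# Hypothetical command templates; '_' marks a slot for one vocabulary word.
templates = ["look", "take _", "open _", "put _ in _", "unlock _ with _"]

# Hypothetical vocabulary sizes, chosen only to show the growth rate.
for vocab_size in (10, 100, 1000):
    print(vocab_size, count_template_actions(templates, vocab_size))
```

Even with five templates, the two-slot commands dominate the count, which grows quadratically in the vocabulary size. Most of the resulting commands would be meaningless in any given game state, which is part of what makes exploration hard.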

Despite progress with reinforcement learning (RL), Monte Carlo tree search (MCTS), and LLM-based agents, Jericho has proven difficult to solve comprehensively: existing approaches either require extensive environment interactions or exhibit limited exploration capabilities. The cited paper (Kim et al., 2025) explicitly acknowledges that Jericho remains unsolved, which motivates methods that can learn effectively from sparse rewards and navigate deceptive local optima.

References

The Jericho benchmark (Hausknecht et al., 2019) remains an unsolved hard-exploration problem, where the text-based game environments provide two fundamental challenges: (1) partial observability, requiring agents to construct models of the world from local textual descriptions, and (2) combinatorial state-action spaces.

Dual-Scale World Models for LLM Agents Towards Hard-Exploration Problems (2509.24116 - Kim et al., 28 Sep 2025) in Section 2 (Background)