- The paper introduces Knightian Uncertainty games that require AI agents to generalize in rapidly changing, non-stationary environments.
- It critiques traditional game benchmarks for fostering memorization over genuine adaptability needed for AGI.
- The proposed evaluation framework uses both near and far OOD testing to advance practical and theoretical AGI assessment.
Games of Knightian Uncertainty as AGI Testbeds
The paper "Games of Knightian Uncertainty as AGI testbeds," authored by Spyridon Samothrakis, Dennis Soemers, and Damian Machlanski, puts forth a compelling argument addressing the limitations of current AI research using games as benchmarks for AGI. While games have historically served as instrumental testbeds for AI development, the authors argue that the landscape of AI has evolved, necessitating new benchmark paradigms that can better inform the AGI pathway.
Limitations of Current Game-Based Benchmarks
The paper begins by acknowledging that games like Chess, Go, and Poker have contributed significantly to AI research. However, these successes in traditional board games and in video games such as the Atari 2600 collection have not translated into the anticipated progress towards AGI. Meanwhile, the focus of AI research has moved towards large-scale data and GPU-intensive models such as LLMs, drawing attention away from games as AGI benchmarks. This shift reflects a growing recognition that traditional game setups are inadequate for assessing the complex requirements of AGI.
The authors identify two primary reasons for this inadequacy:
- Representation Learning Learns Incomplete Representations: Current representation learning approaches often fail to capture meaningful global properties, producing memorization rather than generalization. The paper illustrates this with modern regressors that fit their training distribution well but break down on out-of-distribution (OOD) data (see the regression sketch after this list). Such limitations underscore the need for AI systems to form robust abstractions that generalize to novel contexts.
- The Real World is Non-Stationary and Open: Traditional AI benchmarks assume closed, stationary environments, in stark contrast to the non-stationary, open-ended nature of real-world scenarios. The authors argue that AI systems need to learn and adapt in rapidly changing environments, something current game-based benchmarks do not adequately capture.
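To make the first point concrete, the following minimal sketch (an illustration, not an experiment reproduced from the paper) trains a standard regressor on a simple function and evaluates it both inside and outside the training range. The target function, model choice, and intervals are all assumptions made for the example.

```python
# Illustrative sketch: a standard regressor fits y = sin(x) well inside its
# training range but extrapolates poorly outside it.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Train on x in [-pi, pi]
x_train = rng.uniform(-np.pi, np.pi, size=(2000, 1))
y_train = np.sin(x_train).ravel()

model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
model.fit(x_train, y_train)

# In-distribution test: same interval as training
x_in = rng.uniform(-np.pi, np.pi, size=(500, 1))
# Out-of-distribution test: a shifted interval the model never saw
x_ood = rng.uniform(2 * np.pi, 3 * np.pi, size=(500, 1))

mse_in = np.mean((model.predict(x_in) - np.sin(x_in).ravel()) ** 2)
mse_ood = np.mean((model.predict(x_ood) - np.sin(x_ood).ravel()) ** 2)

print(f"in-distribution MSE:     {mse_in:.4f}")   # typically small
print(f"out-of-distribution MSE: {mse_ood:.4f}")  # typically much larger
```

The in-distribution error stays low while the OOD error grows sharply, because the model has interpolated its training data rather than recovered the global, periodic structure: exactly the memorization-over-abstraction failure the authors highlight.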
General Game Playing and its Shortcomings
The paper critiques General Game Playing (GGP) and related competitions for their limited scope in fostering true AGI development. These competitions require agents to adapt across a range of broadly similar games, but they typically hand the agent an explicit, sometimes perfect, model of the game environment (contrast the two interfaces sketched below). This setting does not reflect the real-world complexities an AGI is expected to navigate. So while these competitions have their utility, they do not sufficiently push the boundaries of generalization and adaptability required for AGI.
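The distinction matters operationally. The interfaces below are hypothetical, not drawn from any actual GGP framework: `ForwardModel` stands for the perfect simulator typically handed to competition agents, while the model-free agent illustrates the setting the paper argues AGI benchmarks should target.

```python
# Hypothetical interfaces contrasting GGP-style play (perfect model given)
# with the model-free setting the paper advocates testing.
from typing import Any, Protocol, Sequence


class ForwardModel(Protocol):
    """A perfect simulator of the game, as typically provided in GGP competitions."""
    def legal_actions(self, state: Any) -> Sequence[Any]: ...
    def next_state(self, state: Any, action: Any) -> Any: ...
    def reward(self, state: Any) -> float: ...


def planning_agent(state: Any, model: ForwardModel, depth: int = 2) -> Any:
    """Shallow lookahead search: straightforward once a perfect model is available."""
    def value(s: Any, d: int) -> float:
        actions = model.legal_actions(s)
        if d == 0 or not actions:
            return model.reward(s)
        return max(value(model.next_state(s, a), d - 1) for a in actions)

    return max(model.legal_actions(state),
               key=lambda a: value(model.next_state(state, a), depth - 1))


def model_free_agent(observation: Any) -> Any:
    """Under Knightian uncertainty no simulator is given: the agent must act from
    observations alone and revise its behaviour online as the rules drift."""
    raise NotImplementedError("the policy must be learned without a model")
```

With a simulator in hand, search alone carries much of the load; take it away and the agent must fall back on whatever abstractions it has actually learned.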
Knightian Uncertainty in Games
To address these shortcomings, the paper introduces the concept of "Games of Knightian Uncertainty." Such games encompass scenarios with rapidly changing rules, where agents face abrupt, unforeseen changes in game dynamics. This concept, rooted in economic theories of uncertainty (Knightian uncertainty), provides a robust framework for testing an agent's ability to adapt without prior data or model access.
Defining Knightian Games
The authors propose games where transition functions, rewards, actions, and observations are non-stationary and can change at any time. These types of games confront agents with "unknown unknowns," requiring them to develop and apply robust abstractions and reasoning capabilities rather than relying on memorization.
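The sketch below grounds this definition in code, assuming a gym-style interface. The wrapper, its parameters, and the random shift schedule are illustrative assumptions; the paper defines the concept rather than an implementation.

```python
# A minimal sketch of a "Knightian" wrapper: at unannounced times the rules of
# the underlying game are silently replaced by a perturbed variant.
import random


class KnightianGame:
    """Wraps a base game whose transitions, rewards, actions, and observations
    may change at any time, without the agent being told what changed or when."""

    def __init__(self, base_game_factory, perturbations, shift_prob=0.01, seed=None):
        self._factory = base_game_factory    # callable returning a fresh base game
        self._perturbations = perturbations  # callables mapping a game to a modified game
        self._shift_prob = shift_prob        # per-step probability that the rules change
        self._rng = random.Random(seed)
        self._game = self._factory()

    def reset(self):
        self._game = self._factory()
        return self._game.reset()

    def step(self, action):
        # With small probability, silently swap the rules before acting.
        if self._rng.random() < self._shift_prob:
            perturb = self._rng.choice(self._perturbations)
            self._game = perturb(self._game)
        return self._game.step(action)       # (observation, reward, done, info)
```

The essential property is that neither the timing nor the nature of the change is announced: the agent sees only a stream of observations whose generating process has quietly moved, the "unknown unknowns" the definition calls for.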
Proposed Benchmarks and Evaluation
For practical implementation, the paper suggests a two-step benchmarking setup:
- Near OOD Testing: Agents are first evaluated on variations of the training game environment, demanding generalization to closely related but altered scenarios.
- Far OOD Testing: Agents are later evaluated on entirely different games, challenging them to apply learned abstractions and adapt to new and diverse environments without pre-existing models.
Example setups for games such as Chess, Poker, Mario, and GVG-AI illustrate the proposed benchmarking process; a minimal sketch of the evaluation loop is given below. The methodology aims to rigorously test, and ultimately improve, the generalization capacity of AI agents.
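Read as code, the two-stage protocol is a simple evaluation loop. The sketch below assumes placeholder interfaces (`agent.train`, `agent.act`, gym-style `reset`/`step`) and hypothetical game variants; it is not an API specified by the paper.

```python
# A hedged sketch of the near/far OOD benchmarking protocol described above.

def evaluate(agent, game, episodes=100):
    """Average episodic return of a frozen agent on a game (no further learning)."""
    total = 0.0
    for _ in range(episodes):
        obs, done, ret = game.reset(), False, 0.0
        while not done:
            obs, reward, done, _ = game.step(agent.act(obs))
            ret += reward
        total += ret
    return total / episodes


def knightian_benchmark(agent, train_game, near_variants, far_games):
    # Stage 0: the agent trains on a single base game only.
    agent.train(train_game)

    # Stage 1 (near OOD): altered versions of the training game, e.g. modified
    # piece movement in Chess or shuffled payouts in Poker.
    near_scores = {name: evaluate(agent, g) for name, g in near_variants.items()}

    # Stage 2 (far OOD): entirely different games, e.g. moving from Mario to
    # GVG-AI titles, with no model or prior data for the new games.
    far_scores = {name: evaluate(agent, g) for name, g in far_games.items()}

    return {"near_ood": near_scores, "far_ood": far_scores}
```

A strong agent under this protocol should hold up on the near-OOD variants and degrade gracefully, rather than collapse, on the far-OOD games.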
Conclusion
In conclusion, the paper emphasizes the need for the games research community to adopt more challenging benchmarks to remain relevant in the pursuit of AGI. By introducing the concept of Knightian games, the authors propose a novel and rigorous framework for evaluating AI's ability to generalize and adapt to unforeseen challenges. This shift in game-based benchmarking could play a crucial role in progress towards AGI by providing more meaningful and relevant evaluation metrics.
The implications of this research are both practical and theoretical. Practically, it could lead to the development of more versatile and adaptive AI systems. Theoretically, it provokes a reexamination of the core principles underlying AI learning and generalization. Future developments in AI may witness the incorporation of these benchmarks, potentially heralding a new era of AI research that better addresses the complexities and uncertainties of real-world environments.