An Examination of AXIOM: A Data-Efficient Model for Game Playing
"AXIOM: Learning to Play Games in Minutes with Expanding Object-Centric Models" presents an alternative to standard deep reinforcement learning (DRL) that emphasizes data efficiency and generalizability across tasks. The paper introduces AXIOM, a model architecture that fuses active inference principles with object-centric world modeling to dramatically shorten the interaction time needed to learn to play a variety of games.
The Proposal and Its Context
The authors argue that while DRL has achieved remarkable success in diverse domains, including robotics and game playing, it falls short in terms of data efficiency when compared to human learning processes. Humans leverage core priors about objects and their interactions, enabling them to generalize across tasks more effectively. AXIOM aims to bridge this gap by incorporating minimal yet expressive core priors about object-centric dynamics and interactions, thus enhancing learning in low-data regimes.
Core Model and Methodology
AXIOM's framework is underpinned by active inference, a theoretical framework in which an agent integrates sensory inputs with prior knowledge to build a generative world model while quantifying the uncertainty of its predictions. Unlike traditional active inference models, which tend to be task-specific, AXIOM is designed to retain the domain flexibility of DRL methods. Its architecture combines the following elements:
- Slot Mixture Model (sMM): This component parses visual input into object-centric representations, dynamically expanding to accommodate new objects in the environment.
- Transition Mixture Model (tMM): This module learns motion prototypes, identifying patterns such as falling or bouncing.
- Recurrent Mixture Model (rMM): This layer deciphers causally relevant interactions among objects, utilizing data including object states, actions, and rewards.
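The expansion behavior of the sMM can be illustrated with a toy sketch: a mixture of Gaussian "slots" over object features that spawns a new component whenever no existing slot explains an observation well enough. The class name, thresholds, and update rule below are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

class ExpandingSlotMixture:
    """Toy sketch of an expanding mixture over object features.

    Each component ("slot") models one object's features (e.g. position,
    colour). A new slot is spawned when no existing slot explains a data
    point well enough -- a simplified stand-in for AXIOM's sMM growth
    rule. The threshold and online mean update are illustrative only.
    """

    def __init__(self, threshold=-2.0, var=0.5):
        self.means = []              # one mean vector per slot
        self.var = var               # fixed isotropic variance (simplification)
        self.threshold = threshold   # log-likelihood needed to reuse a slot

    def _log_lik(self, x, mu):
        d = x - mu
        return -0.5 * np.dot(d, d) / self.var

    def assign(self, x):
        """Return the slot index for x, expanding the model if necessary."""
        x = np.asarray(x, dtype=float)
        if self.means:
            scores = [self._log_lik(x, mu) for mu in self.means]
            best = int(np.argmax(scores))
            if scores[best] > self.threshold:
                # Online update of the winning slot's mean.
                self.means[best] = 0.9 * self.means[best] + 0.1 * x
                return best
        # No slot explains x well: spawn a new slot centred on x.
        self.means.append(x.copy())
        return len(self.means) - 1
```

For example, two nearby points are absorbed by slot 0, while a distant point triggers the creation of slot 1 — the model's capacity grows only when the data demand it.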
Coalescing these modules, AXIOM encapsulates scene dynamics with remarkable efficiency, leveraging Bayesian model reduction to refine the structure and promote generalization.
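The pruning side of this process can be sketched in miniature. Bayesian model reduction compares the evidence for a reduced model against the full one; the function below approximates that idea with a much cruder heuristic, merging mixture components whose means lie within a distance `tol` into a single count-weighted component. The function name and the distance criterion are illustrative assumptions, not the paper's actual reduction rule.

```python
import numpy as np

def merge_redundant_components(means, counts, tol=1.0):
    """Heuristic sketch of mixture-model pruning.

    AXIOM periodically simplifies its mixture models by merging
    components when the reduced model explains the data (nearly) as
    well as the full one. Here that comparison is approximated with a
    simple distance test: components whose means lie within `tol` are
    merged into one count-weighted component. Illustrative only.
    """
    means = [np.asarray(m, dtype=float) for m in means]
    merged_means, merged_counts = [], []
    for mu, n in zip(means, counts):
        for i, kept in enumerate(merged_means):
            if np.linalg.norm(mu - kept) < tol:
                total = merged_counts[i] + n
                # Count-weighted average of the two merged means.
                merged_means[i] = (merged_counts[i] * kept + n * mu) / total
                merged_counts[i] = total
                break
        else:
            merged_means.append(mu)
            merged_counts.append(n)
    return merged_means, merged_counts
```

Merging near-duplicate components keeps the model compact, which is what lets a structure-growing approach avoid unbounded expansion while still generalizing from limited data.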
Empirical Validation
To evaluate AXIOM, the authors introduce the Gameworld 10k benchmark, a suite of environments designed to test how well agents learn games within 10,000 interactions. AXIOM outperforms established DRL approaches, achieving proficiency in these environments in fewer interaction steps and without relying on gradient-based optimization. Its computational economy is also notable: AXIOM requires far fewer parameters than conventional DRL systems, a striking contrast to the resource-intensive nature of standard methods.
Implications and Future Directions
AXIOM's contribution to AI research is multifaceted. Practically, its reduced reliance on large datasets and computational resources makes it suitable for applications in real-world scenarios where data collection is costly or impractical. Theoretically, it challenges the extant paradigms in DRL by illustrating the benefits of integrating structured, interpretable model architectures.
The broader implications of this research are significant, with applications extending beyond game playing to fields that demand rapid decision-making and adaptability, such as autonomous driving and real-time resource management.
Within the broader trajectory of reinforcement learning research, AXIOM sets a notable benchmark for data-efficient, rapidly adaptable AI and invites further exploration of combining Bayesian inference with deep learning architectures. Future work could investigate the automatic derivation of core priors or extend the model to more visually complex environments, such as realistic simulations. Such directions promise fertile ground for moving AI closer to human-like learning in both efficiency and flexibility.