Overview
The research presented in this manuscript explores the scalability of transformer-based models for building generalist reinforcement learning (RL) agents that perform across multiple gaming environments. The investigation builds on the success of such models in language and vision tasks, aiming to pair large, diverse datasets with transformers to achieve generalized performance in RL. The authors introduce the Multi-Game Decision Transformer, a model with a single set of parameters that demonstrates a high-performing generalist agent can be trained to act across diverse tasks from offline data alone.
Methods and Contributions
The proposed Multi-Game Decision Transformer addresses the challenge of training on a set of 41 distinct Atari games with varying dynamics, visuals, and agent embodiments, leveraging previously collected trajectories. This approach seeks to identify whether learning from an extensive range of video game experience allows models to capture something universally beneficial. Critically, the researchers deviate from standard decision transformers by incorporating a guided generation technique that produces expert-level actions at inference time, even though the training data mixes expert and non-expert trajectories.
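To make the guided-generation idea concrete, the following is a minimal sketch of expert-conditioned return sampling. The interface is hypothetical: the `return_logits` vector over discretized return-to-go bins, the linear bin normalization, and the inverse-temperature `kappa` are assumptions for illustration, not the authors' implementation. The sketch shows the core idea the review describes: the model's predicted return distribution is reweighted toward high returns, so expert-like behaviour is elicited at inference without discarding low-return training data.

```python
import numpy as np

def sample_expert_return(return_logits, kappa=10.0):
    """Sketch of expert-guided return sampling (hypothetical interface).

    return_logits: predicted logits over discretized return-to-go bins
    for the current timestep. A likelihood that grows with the return is
    multiplied into the model's return distribution, skewing sampling
    toward high-return (expert-like) bins.
    """
    num_bins = len(return_logits)
    p_return = np.exp(return_logits - return_logits.max())
    p_return /= p_return.sum()                       # P(R | history)
    bin_positions = np.linspace(0.0, 1.0, num_bins)  # normalized return level per bin (assumed)
    log_p_expert = kappa * bin_positions             # log-likelihood of "expert" given R, up to a constant
    posterior = p_return * np.exp(log_p_expert)      # proportional to P(R | expert, history)
    posterior /= posterior.sum()
    return np.random.choice(num_bins, p=posterior)

# Toy usage: the model slightly prefers mid-range returns, but the guided
# sampler shifts probability mass toward the highest-return bins.
logits = np.array([0.5, 1.0, 1.2, 0.8, 0.3])
print("sampled return bin:", sample_expert_return(logits, kappa=10.0))
```

The sampled return token then conditions the subsequent action prediction, which is what lets a model trained on mixed-quality data act at an expert level.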
The work contrasts multiple methods in the multi-game domain, reviewing online reinforcement learning, offline temporal-difference methods, contrastive representations, and behavior cloning. Among these, when the task is framed as offline sequence modeling, decision-transformer-based models emerge as the strongest choice for scalability and generalist agent performance.
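The offline sequence-modeling framing can be illustrated with a short sketch. The function names and the toy trajectory below are illustrative assumptions (the actual model also tokenizes rewards and image patches): a trajectory is flattened into an interleaved sequence of return-to-go, observation, and action tokens, which a transformer can then be trained on with ordinary next-token prediction.

```python
import numpy as np

def returns_to_go(rewards):
    """Compute the undiscounted return-to-go R_t = sum of rewards from t onward."""
    return np.cumsum(rewards[::-1])[::-1]

def trajectory_to_sequence(rtgs, observations, actions):
    """Flatten one trajectory into the interleaved token layout used by
    decision-transformer-style models: (R_1, o_1, a_1, R_2, o_2, a_2, ...).
    Hypothetical helper for illustration only.
    """
    sequence = []
    for rtg, obs, act in zip(rtgs, observations, actions):
        sequence.append(("return", rtg))   # scalar return-to-go token
        sequence.append(("obs", obs))      # observation (e.g., frame patches)
        sequence.append(("action", act))   # discrete Atari action id
    return sequence

# Toy 4-step trajectory
rewards = np.array([0.0, 1.0, 0.0, 1.0])
obs = [f"frame_{t}" for t in range(4)]     # stand-ins for image observations
acts = [2, 3, 3, 0]                        # discrete action ids
print(trajectory_to_sequence(returns_to_go(rewards), obs, acts)[:6])
```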
Findings
A key finding is that the Multi-Game Decision Transformer achieves aggregate performance above human level across the evaluated games. It fine-tunes rapidly to unfamiliar games from limited data and exhibits scaling behavior akin to that seen in language and vision: larger models consistently perform better. Not all multi-environment training techniques yield positive outcomes, and the manuscript carefully delineates those that fell short, such as offline non-transformer models and online multi-game methods.
Future Research Implications
This paper sets a precedent for further exploration of generalist agents, offering fertile ground for future research in this domain. The authors provide the models and code to the community, facilitating ongoing work. Beyond the results, there is an allusion to an important open question: whether online learning algorithms can be made to absorb large, diverse datasets as effectively as offline methods like Decision Transformers. This marks an avenue along which the RL field could evolve further.
Limitations and Societal Impacts
The researchers recognize that the work's generalizability may be limited by the specificity of the Atari suite and by the scale of available online and offline RL datasets. Moreover, they caution against extending their algorithms and methodologies to scenarios involving human interaction without thorough consideration of safety and ethical implications. While the current models are applied only to game-playing, decision-making driven by reward feedback remains an area that requires careful alignment with human values and objectives.