- The paper introduces a curiosity-driven experience replay mechanism that prioritizes the experiences the world model is least familiar with, improving adaptation in model-based RL.
- It integrates count-based and adversarial replay techniques to efficiently update world models in dynamic, non-stationary environments.
- Experiments on benchmarks including Crafter show faster adaptation than standard Dreamer agents, with Curious Replay achieving a new state-of-the-art Crafter score.
Curious Replay for Model-Based Adaptation
Curious Replay is a framework for improving the adaptability of model-based reinforcement learning (RL) agents in dynamic environments. It uses curiosity-guided prioritized experience replay to focus world-model training on unfamiliar experiences, allowing agents to adapt more quickly when tasks or environments change. The method, inspired by animal behavior, is validated across several benchmarks, including the Crafter environment, where it achieves state-of-the-art performance.
Introduction
In dynamic environments, AI agents must swiftly adapt to new stimuli to maintain robust performance. Conventional model-based RL agents, such as Dreamer, often struggle with adaptation due to reliance on uniform sampling from the experience replay buffer, which can result in inadequate model updates when environments change. This paper introduces Curious Replay, a method that prioritizes training on experiences that are least familiar to the model, thereby enhancing the agent's adaptability.
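For context, the uniform sampling that Dreamer-style agents rely on amounts to something like the following minimal sketch (the function and argument names are illustrative, not taken from any Dreamer codebase):

```python
import random

# Illustrative only: Dreamer-style agents draw world-model training batches
# uniformly at random from the replay buffer, so once the buffer is large,
# recently collected (and possibly novel) experiences make up only a small
# fraction of each batch.
def sample_uniform(replay_buffer, batch_size):
    return random.sample(replay_buffer, k=min(batch_size, len(replay_buffer)))
```

Curious Replay replaces this uniform draw with the prioritized sampling described in the Methodology section below.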
Figure 1: Curious Replay closes the loop between experience replay and world model performance by using curiosity-guided prioritization to promote training on experiences the model is least familiar with.
Methodology
Curious Replay combines principles from prioritized experience replay and curiosity-driven exploration. In contrast to traditional prioritized replay, which often utilizes TD-error-based prioritization, Curious Replay employs a curiosity-based signal to determine replay priorities:
- Count-Based Replay: Prioritizes experiences by how often they have been replayed, biasing sampling toward rarely replayed experiences and thereby steering training toward newly added data.
- Adversarial Replay: Utilizes model prediction errors as a prioritization signal, selectively focusing on experiences that the model finds most challenging.
These elements are integrated into the replay buffer to dynamically adjust training priorities in response to observed environmental changes. The prioritization formula is defined as:
$$p_i = c\,\beta^{v_i} + (|L_i| + \epsilon)^{\alpha}$$
where $p_i$ is the priority of experience $i$, $v_i$ is the number of times that experience has been replayed, $L_i$ is the world-model loss on it, and $c$, $\beta$, $\alpha$, $\epsilon$ are hyperparameters that weight the count-based and adversarial terms.
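As a concrete illustration, here is a minimal sketch of how such a curiosity-prioritized buffer could be implemented; the class name `CuriousReplayBuffer` and the default hyperparameter values are placeholders for exposition, not the authors' code or tuned settings.

```python
import numpy as np

class CuriousReplayBuffer:
    """Illustrative sketch of curiosity-guided prioritized replay (not the paper's implementation).

    Priority of experience i: p_i = c * beta**v_i + (|L_i| + eps)**alpha
      - c * beta**v_i        : count-based term, decays as the experience is replayed more often
      - (|L_i| + eps)**alpha : adversarial term, grows with the world-model loss on the experience
    """

    def __init__(self, c=1e4, beta=0.7, alpha=0.7, eps=0.1):
        # Default values are placeholders, not the paper's tuned hyperparameters.
        self.c, self.beta, self.alpha, self.eps = c, beta, alpha, eps
        self.experiences, self.counts, self.losses = [], [], []

    def add(self, experience, initial_loss=0.0):
        # A new experience starts with replay count 0, so the count-based
        # term gives it high priority before a model loss is even available.
        self.experiences.append(experience)
        self.counts.append(0)
        self.losses.append(initial_loss)

    def priorities(self):
        v = np.array(self.counts, dtype=np.float64)
        loss = np.abs(np.array(self.losses, dtype=np.float64))
        return self.c * self.beta ** v + (loss + self.eps) ** self.alpha

    def sample(self, batch_size):
        # Sample indices with probability proportional to priority.
        p = self.priorities()
        idx = np.random.choice(len(self.experiences), size=batch_size, p=p / p.sum())
        return idx, [self.experiences[i] for i in idx]

    def update(self, idx, new_losses):
        # After a world-model training step: increment replay counts and
        # record the freshly computed losses for the sampled experiences.
        for i, loss in zip(idx, new_losses):
            self.counts[i] += 1
            self.losses[i] = abs(float(loss))
```

In a Dreamer-style training loop, `sample` would replace uniform batch selection, and `update` would be called with each batch's world-model losses so that priorities track both novelty and prediction difficulty.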
Figure 2: Inspired by animal behavior, we investigate adaptation using an object interaction assay...
Experimental Evaluation
Curious Replay was evaluated on several benchmarks:
Object Interaction Assay
In this test, inspired by animal novelty-seeking behavior, a model-based agent must notice and interact with a novel object placed in its environment. Baseline Plan2Explore agents are slow to engage with the object, while agents using Curious Replay interact with it substantially sooner.
Constrained Control Suite
The Constrained Control Suite provides environments with altered dynamics that require adaptation. Agents augmented with Curious Replay adapted to these constraints better than baseline Dreamer agents.
Figure 3: DreamerV2 w/ Curious Replay outperforms DreamerV2 and DreamerV2 w/ TD in the Constrained Control Suite.
Crafter Benchmark
Crafter, an open-ended environment with a hierarchy of achievements to unlock, was used to test Curious Replay's sustained adaptability. Agents trained with Curious Replay achieved a new state-of-the-art Crafter score, surpassing prior Dreamer-based agents.
Figure 4: Agent ability spectrum for Crafter, ordered left to right by number of prerequisites for an achievement.
Discussion
Curious Replay's prioritization is well suited to environments with non-stationary dynamics, where it promotes more effective world-model updates and, in turn, better policy learning. This adaptability parallels the strategic exploration behaviors observed in animals, offering potential insights into biological learning systems.
Implications and Future Work
The Curious Replay framework provides a robust way to enhance the adaptive capabilities of RL agents, with potential relevance to robotics and autonomous systems where environmental change is frequent. Future work could extend the framework to other model architectures and combine it with techniques for learning that is robust to distracting stimuli.
Conclusion
Curious Replay elevates model-based RL by incorporating curiosity-driven prioritization into experience replay. Its success across varied, challenging environments points to broader applications and further exploration in adaptation-centric AI research.