- The paper presents Rarity of Events (RoE), a novel automated curriculum learning method for reinforcement learning that rewards agents based on the temporal infrequency of predefined events.
- RoE encourages continuous exploration and self-directed progression through complex behaviors by making rewards proportional to event rarity, effectively replacing manual reward shaping and environment difficulty adjustments.
- Evaluated in VizDoom, RoE-trained agents outperformed those using traditional extrinsic rewards, demonstrating superior adaptability to environmental changes and acquiring more versatile behaviors.
Automated Curriculum Learning by Rewarding Temporally Rare Events: An Expert Overview
The paper "Automated Curriculum Learning by Rewarding Temporally Rare Events" by Niels Justesen and Sebastian Risi from the IT University of Copenhagen presents a novel approach to reinforcement learning (RL) that addresses the challenges posed by environments with sparse and delayed rewards. This approach, termed Rarity of Events (RoE), reshapes the reward landscape by rewarding RL agents based on the infrequency of certain predefined events. This mechanism inherently encourages continuous exploration and adaptation, effectively automating curriculum learning without manual intervention.
The authors identify a significant challenge in traditional deep reinforcement learning and deep neuroevolution methods: the difficulty in learning from environments with infrequent or delayed feedback. Standard reward shaping techniques require extensive domain knowledge and are labor-intensive, while existing curriculum learning methods demand adjustments in environment difficulty, complicating their application to complex tasks. The RoE approach sidesteps these limitations by autonomously modifying the reward structure as the agent learns, relying solely on a predefined set of positive events.
The RoE method sets the reward for each predefined event in inverse proportion to how frequently that event has occurred in recent episodes (its temporal rarity), encouraging agents to seek out less frequent occurrences and thereby expanding their exploration of the environment. This reward strategy becomes a surrogate for the traditional curriculum learning process, allowing agents to self-direct their progression through increasingly complex behaviors. Notably, the RoE approach dispenses with the extrinsic rewards typically supplied by the environment, opting instead for intrinsic motivation that fosters curiosity-driven learning.
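The mechanism above can be sketched in a few lines: track a running mean of how often each event occurs per episode, and pay out the inverse of that (floored) mean each time an event fires, so rarer events yield larger intrinsic rewards. This is a minimal illustrative sketch, not the paper's exact implementation; the class name, the update rate `alpha`, and the rarity floor `tau` are all assumptions for illustration.

```python
class RarityOfEvents:
    """Sketch of a rarity-based intrinsic reward (illustrative, not the
    paper's exact formulation). Tracks an exponential moving average of
    per-episode event counts and rewards events inversely to frequency."""

    def __init__(self, num_events, alpha=0.01, tau=0.01):
        self.mu = [1.0] * num_events  # running mean of event counts per episode
        self.alpha = alpha            # update rate of the temporal mean (assumed value)
        self.tau = tau                # floor so near-absent events don't yield unbounded reward

    def update(self, episode_counts):
        """Fold one finished episode's event counts into the temporal mean."""
        self.mu = [(1 - self.alpha) * m + self.alpha * c
                   for m, c in zip(self.mu, episode_counts)]

    def reward(self, event_deltas):
        """Intrinsic reward for one step: each event occurrence is worth the
        inverse of its (floored) mean frequency, so rarer events pay more."""
        return sum(d / max(m, self.tau)
                   for d, m in zip(event_deltas, self.mu))
```

With this shaping, an agent that keeps triggering the same common event sees its payoff shrink as that event's mean rises, while an event it has rarely produced stays highly rewarded, which is what drives the automatic curriculum effect.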
The framework was tested in the VizDoom platform, chosen for its diverse scenarios that exemplify environments with sparse and delayed rewards. The results demonstrated that agents trained under the RoE paradigm outperformed their counterparts trained using traditional extrinsic rewards in the majority of evaluated scenarios. This performance is attributed to the acquisition of more versatile behaviors that are better suited to adapt to environmental changes. Specifically, RoE-trained agents exhibited superior adaptability in scenarios with modifications to available resources, which were unseen during the training phase.
A key theoretical implication of this work is the potential for RoE to reduce the dependency on manual reward shaping, making it easier to apply RL techniques to a broader array of complex environments. By aligning agent exploration with event rarity, RoE offers a scalable and adaptable framework for solving RL problems that were previously infeasible due to reward sparsity. Moreover, the approach holds promise for video game AI, where diverse and human-like behaviors are desirable, not only for gameplay but also for automated testing and development purposes.
Future work could explore the generalizability of RoE to domains beyond gaming, such as robotics or other interactive AI systems. In addition, refining the framework to handle larger sets of event types and integrating it with other intrinsic motivation theories could broaden its applicability.
In conclusion, the Rarity of Events approach proposed in this paper marks a significant development in automated curriculum learning for reinforcement learning environments. By leveraging the intrinsic value of rare events, this method provides a viable solution for overcoming the challenges of sparse and delayed rewards, while advancing the potential for RL applications in complex, dynamic domains.