Intrinsic Motivation and Automatic Curricula via Asymmetric Self-Play
The paper presents a reinforcement learning method driven by intrinsic motivation through asymmetric self-play. The technique aims to mitigate the well-documented sample inefficiency of model-free reinforcement learning by letting an agent explore and learn about its environment in the absence of extrinsic rewards.
Overview
In this method, a single agent is trained under two personas, Alice and Bob. During self-play, Alice performs a sequence of actions and thereby proposes a task; Bob must then complete it, which automatically produces a curriculum of increasingly challenging tasks. Crucially, task difficulty is self-regulating: the feedback loop between Alice's proposals and Bob's attempts keeps tasks just beyond Bob's current ability while remaining achievable. The process relies solely on intrinsic rewards and applies to two specific kinds of environments: those that are reversible and those that can be reset to their initial state. In the reversible case, Bob must return the environment to its starting state; in the resettable case, the environment is reset and Bob must reproduce the state in which Alice stopped. Restricting to these two settings means that Alice's final state itself communicates the task, and the number of steps she took provides a natural measure of its difficulty. A minimal sketch of a single self-play episode appears below.
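The following sketch illustrates one self-play episode in the two modes. The environment interface (`reset`, `step`, `state`), the agents' `act` methods, and the STOP-action handling are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def self_play_episode(env, alice, bob, reversible=True, max_steps=50):
    """One self-play episode: Alice proposes a task by acting, Bob completes it.

    In the reversible mode Bob must return the environment to its initial
    state; in the resettable mode the environment is reset and Bob must
    reach the state Alice stopped in. All interface names are illustrative.
    """
    s0 = env.reset()                       # initial state
    # --- Alice's turn: act until she emits the STOP action ---
    state, t_a = s0, 0
    while t_a < max_steps:
        action = alice.act(state, s0)      # Alice conditions on (current, initial)
        if action == alice.STOP:
            break
        state = env.step(action)           # assumed to return the next state
        t_a += 1
    s_star = state                         # Alice's final state defines the task

    # --- Bob's turn ---
    if reversible:
        target = s0                        # undo Alice's actions from her final state
    else:
        state = env.reset()                # resettable: start over from s0
        target = s_star                    # reproduce Alice's final state
    t_b = 0
    while t_b < max_steps - t_a and not np.array_equal(state, target):
        action = bob.act(state, target)    # Bob conditions on (current, target)
        state = env.step(action)
        t_b += 1

    return t_a, t_b                        # step counts used for the intrinsic rewards
```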
Approach and Implementation
The asymmetric self-play mechanism balances task proposal against task completion. Alice is rewarded for proposing tasks that take Bob longer to complete than they took her to set up, but because she must perform every proposed task herself, the tasks remain feasible. Concretely, with t_A and t_B the numbers of steps taken by Alice and Bob, Bob's intrinsic reward is R_B = -γ t_B and Alice's is R_A = γ max(0, t_B - t_A), where γ rescales the intrinsic reward relative to the target-task reward. This produces a step-wise escalation in exploration, with the proposed tasks tracking the frontier of Bob's ability. The experience Bob accumulates about state transitions then speeds up learning of the target task without requiring direct supervision.
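This reward structure can be written down directly. The sketch below is a minimal rendering of the formulas above; the function name and signature are illustrative.

```python
def self_play_rewards(t_a, t_b, gamma=0.01):
    """Intrinsic rewards for one self-play episode.

    Bob is penalized for every step he takes, so he prefers tasks he can
    finish quickly. Alice is rewarded only when Bob needs more steps than
    she did, pushing her toward tasks just beyond Bob's current ability
    yet still achievable (she performed them herself). gamma rescales the
    intrinsic reward against the target-task reward.
    """
    r_bob = -gamma * t_b
    r_alice = gamma * max(0.0, t_b - t_a)
    return r_alice, r_bob
```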
Both Alice and Bob are parameterized as neural network policies conditioned on a pair of states: Alice's policy takes the current state together with the episode's initial state, while Bob's takes the current state together with the target state he must reach. In this sense the policies are goal-conditioned, and the authors argue that in finite, deterministic, Markovian environments the self-play objective drives Bob toward a policy that moves between states in close to the minimal number of steps.
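As a rough illustration of this two-state conditioning, the sketch below feeds the concatenation of the two state vectors through a small feed-forward network. The architecture, layer sizes, and use of PyTorch are assumptions for illustration, not the paper's exact setup, which varies by environment.

```python
import torch
import torch.nn as nn

class TwoStatePolicy(nn.Module):
    """Policy conditioned on a pair of states (a hypothetical sketch).

    Alice uses (current state, initial state); Bob uses
    (current state, target state).
    """
    def __init__(self, state_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * state_dim, hidden),
            nn.Tanh(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state, reference_state):
        # Concatenate the current state with the initial/target state
        x = torch.cat([state, reference_state], dim=-1)
        return torch.distributions.Categorical(logits=self.net(x))

# Usage: sample an action for Bob toward a target state
# policy = TwoStatePolicy(state_dim=10, n_actions=5)
# dist = policy(current_state, target_state)
# action = dist.sample()
```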
Experimental Results and Evaluation
Experiments across a diverse set of environments demonstrate the method's robustness and flexibility, from maze-based tasks, to continuous control tasks in RLLab, to a high-dimensional StarCraft scenario. Notably, the approach is competitive with state-of-the-art exploration strategies such as VIME and SimHash, particularly in how quickly the target task is learned.
The results show that when the self-play tasks are well aligned with the target task, learning on the target task is considerably faster and more efficient. This underscores the importance of self-play episode design in leveraging intrinsic motivation for policy learning.
Implications and Future Work
The implications of this research are multifaceted. Practically, more sample-efficient reinforcement learning is valuable in settings where interacting with the environment is expensive or limited. Theoretically, the work motivates further study of self-generated curricula and of intrinsic motivation frameworks that operate autonomously in more varied and less restrictive contexts.
The paper suggests several avenues for future work, such as using multiple coordinated Alices to propose more diverse tasks and refining the reward structure so that Alice's proposals stay at a balanced level of difficulty. Another direction is extending the method to less structured environments where resetting or reversibility is limited, which would likely require richer ways of communicating tasks than the state reached by Alice's actions alone.
In summary, the paper presents a thoroughly researched and well-articulated approach to reinforcement learning that harnesses intrinsic motivation. It makes a significant contribution to automatic curriculum learning and provides a compelling case for continued work in this area of artificial intelligence.