Transfer Learning in Deep Reinforcement Learning: A Survey (2009.07888v7)
Abstract: Reinforcement learning is a learning paradigm for solving sequential decision-making problems. Recent years have witnessed remarkable progress in reinforcement learning, driven by the rapid development of deep neural networks. Along with the promising prospects of reinforcement learning in numerous domains such as robotics and game-playing, transfer learning has arisen to tackle the challenges faced by reinforcement learning by transferring knowledge from external expertise to improve the efficiency and effectiveness of the learning process. In this survey, we systematically investigate the recent progress of transfer learning approaches in the context of deep reinforcement learning. Specifically, we provide a framework for categorizing state-of-the-art transfer learning approaches, under which we analyze their goals, methodologies, compatible reinforcement learning backbones, and practical applications. We also draw connections between transfer learning and other relevant topics from the reinforcement learning perspective and discuss open challenges that await future research progress.
- K. Arulkumaran, M. P. Deisenroth, M. Brundage, and A. A. Bharath, “A brief survey of deep reinforcement learning,” arXiv preprint arXiv:1708.05866, 2017.
- S. Levine, C. Finn, T. Darrell, and P. Abbeel, “End-to-end training of deep visuomotor policies,” The Journal of Machine Learning Research, 2016.
- S. Levine, P. Pastor, A. Krizhevsky, J. Ibarz, and D. Quillen, “Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection,” The International Journal of Robotics Research, 2018.
- M. G. Bellemare, Y. Naddaf, J. Veness, and M. Bowling, “The arcade learning environment: An evaluation platform for general agents,” Journal of Artificial Intelligence Research, 2013.
- M. Glavic, R. Fonteneau, and D. Ernst, “Reinforcement learning for electric power system decision and control: Past considerations and perspectives,” IFAC-PapersOnLine, 2017.
- S. El-Tantawy, B. Abdulhai, and H. Abdelgawad, “Multiagent reinforcement learning for integrated network of adaptive traffic signal controllers (MARLIN-ATSC): Methodology and large-scale application on downtown Toronto,” IEEE Transactions on Intelligent Transportation Systems, 2013.
- H. Wei, G. Zheng, H. Yao, and Z. Li, “IntelliLight: A reinforcement learning approach for intelligent traffic light control,” ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2018.
- S. J. Pan and Q. Yang, “A survey on transfer learning,” IEEE Transactions on Knowledge and Data Engineering, 2009.
- M. E. Taylor and P. Stone, “Transfer learning for reinforcement learning domains: A survey,” Journal of Machine Learning Research, 2009.
- A. Lazaric, “Transfer in reinforcement learning: A framework and a survey,” Springer, 2012.
- R. Bellman, “A Markovian decision process,” Journal of Mathematics and Mechanics, 1957.
- M. G. Bellemare, W. Dabney, and R. Munos, “A distributional perspective on reinforcement learning,” in International conference on machine learning. PMLR, 2017, pp. 449–458.
- M. Liu, M. Zhu, and W. Zhang, “Goal-conditioned reinforcement learning: Problems and solutions,” arXiv preprint arXiv:2201.08299, 2022.
- C. Florensa, D. Held, X. Geng, and P. Abbeel, “Automatic goal generation for reinforcement learning agents,” in International conference on machine learning. PMLR, 2018, pp. 1515–1528.
- Z. Xu and A. Tewari, “Reinforcement learning in factored mdps: Oracle-efficient algorithms and tighter regret bounds for the non-episodic setting,” NeurIPS, vol. 33, pp. 18226–18236, 2020.
- C. Yu, A. Velu, E. Vinitsky, J. Gao, Y. Wang, A. Bayen, and Y. Wu, “The surprising effectiveness of ppo in cooperative multi-agent games,” NeurIPS, vol. 35, pp. 24611–24624, 2022.
- I. Kostrikov, K. K. Agrawal, D. Dwibedi, S. Levine, and J. Tompson, “Discriminator-actor-critic: Addressing sample inefficiency and reward bias in adversarial imitation learning,” arXiv preprint arXiv:1809.02925, 2018.
- H. Van Seijen, H. Van Hasselt, S. Whiteson, and M. Wiering, “A theoretical and empirical analysis of expected sarsa,” IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, 2009.
- V. Konda and J. Tsitsiklis, “Actor-critic algorithms,” NeurIPS, 2000.
- V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. Lillicrap, T. Harley, D. Silver, and K. Kavukcuoglu, “Asynchronous methods for deep reinforcement learning,” ICML, 2016.
- T. Haarnoja, A. Zhou, P. Abbeel, and S. Levine, “Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor,” International Conference on Machine Learning, 2018.
- C. J. Watkins and P. Dayan, “Q-learning,” Machine learning, 1992.
- V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski et al., “Human-level control through deep reinforcement learning,” Nature, 2015.
- M. Hessel, J. Modayil, H. Van Hasselt, T. Schaul, G. Ostrovski, W. Dabney, D. Horgan, B. Piot, M. Azar, and D. Silver, “Rainbow: Combining improvements in deep reinforcement learning,” AAAI, 2018.
- R. J. Williams, “Simple statistical gradient-following algorithms for connectionist reinforcement learning,” Machine learning, 1992.
- J. Schulman, S. Levine, P. Abbeel, M. Jordan, and P. Moritz, “Trust region policy optimization,” ICML, 2015.
- J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal policy optimization algorithms,” arXiv preprint arXiv:1707.06347, 2017.
- D. Silver, G. Lever, N. Heess, T. Degris, D. Wierstra, and M. Riedmiller, “Deterministic policy gradient algorithms,” ICML, 2014.
- T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra, “Continuous control with deep reinforcement learning,” arXiv preprint arXiv:1509.02971, 2015.
- S. Fujimoto, H. Van Hoof, and D. Meger, “Addressing function approximation error in actor-critic methods,” arXiv preprint arXiv:1802.09477, 2018.
- A. Nagabandi, G. Kahn, R. S. Fearing, and S. Levine, “Neural network dynamics for model-based deep reinforcement learning with model-free fine-tuning,” in 2018 IEEE international conference on robotics and automation (ICRA). IEEE, 2018, pp. 7559–7566.
- Z. I. Botev, D. P. Kroese, R. Y. Rubinstein, and P. L’Ecuyer, “The cross-entropy method for optimization,” in Handbook of statistics. Elsevier, 2013, vol. 31, pp. 35–59.
- K. Chua, R. Calandra, R. McAllister, and S. Levine, “Deep reinforcement learning in a handful of trials using probabilistic dynamics models,” NeurIPS, vol. 31, 2018.
- R. S. Sutton, “Integrated architectures for learning, planning, and reacting based on approximating dynamic programming,” in Machine learning proceedings 1990. Elsevier, 1990, pp. 216–224.
- V. Feinberg, A. Wan, I. Stoica, M. I. Jordan, J. E. Gonzalez, and S. Levine, “Model-based value estimation for efficient model-free reinforcement learning,” arXiv preprint arXiv:1803.00101, 2018.
- S. Levine and V. Koltun, “Guided policy search,” in International conference on machine learning. PMLR, 2013, pp. 1–9.
- H. Bharadhwaj, K. Xie, and F. Shkurti, “Model-predictive control via cross-entropy and gradient-based optimization,” in Learning for Dynamics and Control. PMLR, 2020, pp. 277–286.
- M. Deisenroth and C. E. Rasmussen, “Pilco: A model-based and data-efficient approach to policy search,” in Proceedings of the 28th International Conference on machine learning (ICML-11), 2011, pp. 465–472.
- Y. Gal, R. McAllister, and C. E. Rasmussen, “Improving pilco with bayesian neural network dynamics models,” in Data-efficient machine learning workshop, ICML, vol. 4, no. 34, 2016, p. 25.
- C. H. Lampert, H. Nickisch, and S. Harmeling, “Learning to detect unseen object classes by between-class attribute transfer,” IEEE Conference on Computer Vision and Pattern Recognition, 2009.
- P. Dayan and G. E. Hinton, “Feudal reinforcement learning,” NeurIPS, 1993.
- R. S. Sutton, D. Precup, and S. Singh, “Between mdps and semi-mdps: A framework for temporal abstraction in reinforcement learning,” Artificial intelligence, 1999.
- R. Parr and S. J. Russell, “Reinforcement learning with hierarchies of machines,” NeurIPS, 1998.
- T. G. Dietterich, “Hierarchical reinforcement learning with the maxq value function decomposition,” Journal of artificial intelligence research, 2000.
- A. Lazaric and M. Ghavamzadeh, “Bayesian multi-task reinforcement learning,” in Proceedings of the 27th International Conference on Machine Learning (ICML). Omnipress, 2010, pp. 599–606.
- Y. Zhang and Q. Yang, “A survey on multi-task learning,” IEEE Transactions on Knowledge and Data Engineering, vol. 34, no. 12, pp. 5586–5609, 2021.
- Y. Teh, V. Bapst, W. M. Czarnecki, J. Quan, J. Kirkpatrick, R. Hadsell, N. Heess, and R. Pascanu, “Distral: Robust multitask reinforcement learning,” NeurIPS, 2017.
- E. Parisotto, J. L. Ba, and R. Salakhutdinov, “Actor-mimic: Deep multitask and transfer reinforcement learning,” ICLR, 2016.
- C. Devin, A. Gupta, T. Darrell, P. Abbeel, and S. Levine, “Learning modular neural network policies for multi-task and multi-robot transfer,” 2017 IEEE International Conference on Robotics and Automation (ICRA), 2017.
- J. Andreas, D. Klein, and S. Levine, “Modular multitask reinforcement learning with policy sketches,” ICML, 2017.
- R. Yang, H. Xu, Y. Wu, and X. Wang, “Multi-task reinforcement learning with soft modularization,” NeurIPS, vol. 33, pp. 4767–4777, 2020.
- T. Hospedales, A. Antoniou, P. Micaelli, and A. Storkey, “Meta-learning in neural networks: A survey,” IEEE transactions on pattern analysis and machine intelligence, vol. 44, no. 9, pp. 5149–5169, 2021.
- Z. Jia, X. Li, Z. Ling, S. Liu, Y. Wu, and H. Su, “Improving policy optimization with generalist-specialist learning,” in International Conference on Machine Learning. PMLR, 2022, pp. 10104–10119.
- W. Ding, H. Lin, B. Li, and D. Zhao, “Generalizing goal-conditioned reinforcement learning with variational causal reasoning,” arXiv preprint arXiv:2207.09081, 2022.
- R. Kirk, A. Zhang, E. Grefenstette, and T. Rocktäschel, “A survey of zero-shot generalisation in deep reinforcement learning,” Journal of Artificial Intelligence Research, vol. 76, pp. 201–264, 2023.
- B. Kim, A.-m. Farahmand, J. Pineau, and D. Precup, “Learning from limited demonstrations,” NeurIPS, 2013.
- W. Czarnecki, R. Pascanu, S. Osindero, S. Jayakumar, G. Swirszcz, and M. Jaderberg, “Distilling policy distillation,” The 22nd International Conference on Artificial Intelligence and Statistics, 2019.
- A. Y. Ng, D. Harada, and S. Russell, “Policy invariance under reward transformations: Theory and application to reward shaping,” ICML, 1999.
- I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” NeurIPS, pp. 2672–2680, 2014.
- Z. Zhu, K. Lin, B. Dai, and J. Zhou, “Learning sparse rewarded tasks from sub-optimal demonstrations,” arXiv preprint arXiv:2004.00530, 2020.
- T. Schaul, D. Horgan, K. Gregor, and D. Silver, “Universal value function approximators,” ICML, 2015.
- C. Finn and S. Levine, “Meta-learning: from few-shot learning to rapid reinforcement learning,” ICML, 2019.
- M. E. Taylor, P. Stone, and Y. Liu, “Transfer learning via inter-task mappings for temporal difference learning,” Journal of Machine Learning Research, 2007.
- A. Barreto, D. Borsa, J. Quan, T. Schaul, D. Silver, M. Hessel, D. Mankowitz, A. Žídek, and R. Munos, “Transfer in deep reinforcement learning using successor features and generalised policy improvement,” ICML, 2018.
- Z. Zhu, K. Lin, B. Dai, and J. Zhou, “Off-policy imitation learning from observations,” NeurIPS, 2020.
- J. Ho and S. Ermon, “Generative adversarial imitation learning,” NeurIPS, 2016.
- W. Zhao, J. P. Queralta, and T. Westerlund, “Sim-to-real transfer in deep reinforcement learning for robotics: a survey,” in 2020 IEEE symposium series on computational intelligence (SSCI). IEEE, 2020, pp. 737–744.
- M. Muller-Brockhausen, M. Preuss, and A. Plaat, “Procedural content generation: Better benchmarks for transfer reinforcement learning,” in 2021 IEEE Conference on games (CoG). IEEE, 2021, pp. 01–08.
- N. Vithayathil Varghese and Q. H. Mahmoud, “A survey of multi-task deep reinforcement learning,” Electronics, vol. 9, no. 9, p. 1363, 2020.
- R. J. Williams and L. C. Baird, “Tight performance bounds on greedy policies based on imperfect value functions,” Tech. Rep., 1993.
- E. Wiewiora, G. W. Cottrell, and C. Elkan, “Principled methods for advising reinforcement learning agents,” ICML, 2003.
- S. M. Devlin and D. Kudenko, “Dynamic potential-based reward shaping,” AAMAS, 2012.
- A. Harutyunyan, S. Devlin, P. Vrancx, and A. Nowé, “Expressing arbitrary reward functions as potential-based advice,” AAAI, 2015.
- T. Brys, A. Harutyunyan, M. E. Taylor, and A. Nowé, “Policy transfer using reward shaping,” AAMAS, 2015.
- M. Večerík, T. Hester, J. Scholz, F. Wang, O. Pietquin, B. Piot, N. Heess, T. Rothörl, T. Lampe, and M. Riedmiller, “Leveraging demonstrations for deep reinforcement learning on robotics problems with sparse rewards,” arXiv preprint arXiv:1707.08817, 2017.
- A. C. Tenorio-Gonzalez, E. F. Morales, and L. Villaseñor-Pineda, “Dynamic reward shaping: Training a robot by voice,” Advances in Artificial Intelligence – IBERAMIA, 2010.
- P.-H. Su, D. Vandyke, M. Gasic, N. Mrksic, T.-H. Wen, and S. Young, “Reward shaping with recurrent neural networks for speeding up on-line policy learning in spoken dialogue systems,” arXiv preprint arXiv:1508.03391, 2015.
- X. V. Lin, R. Socher, and C. Xiong, “Multi-hop knowledge graph reasoning with reward shaping,” arXiv preprint arXiv:1808.10568, 2018.
- S. Devlin, L. Yliniemi, D. Kudenko, and K. Tumer, “Potential-based difference rewards for multiagent reinforcement learning,” AAMAS, 2014.
- M. Grzes and D. Kudenko, “Learning shaping rewards in model-based reinforcement learning,” Proc. AAMAS Workshop on Adaptive Learning Agents, 2009.
- O. Marom and B. Rosman, “Belief reward shaping in reinforcement learning,” AAAI, 2018.
- F. Liu, Z. Ling, T. Mu, and H. Su, “State alignment-based imitation learning,” arXiv preprint arXiv:1911.10947, 2019.
- K. Kim, Y. Gu, J. Song, S. Zhao, and S. Ermon, “Domain adaptive imitation learning,” ICML, 2020.
- Y. Ma, Y.-X. Wang, and B. Narayanaswamy, “Imitation-regularized offline learning,” International Conference on Artificial Intelligence and Statistics, 2019.
- M. Yang and O. Nachum, “Representation matters: Offline pretraining for sequential decision making,” arXiv preprint arXiv:2102.05815, 2021.
- X. Zhang and H. Ma, “Pretraining deep actor-critic reinforcement learning algorithms with expert demonstrations,” arXiv preprint arXiv:1801.10459, 2018.
- D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot et al., “Mastering the game of Go with deep neural networks and tree search,” Nature, 2016.
- S. Schaal, “Learning from demonstration,” NeurIPS, 1997.
- T. Hester, M. Vecerik, O. Pietquin, M. Lanctot, T. Schaul, B. Piot, D. Horgan, J. Quan, A. Sendonaris, I. Osband et al., “Deep q-learning from demonstrations,” AAAI, 2018.
- A. Nair, B. McGrew, M. Andrychowicz, W. Zaremba, and P. Abbeel, “Overcoming exploration in reinforcement learning with demonstrations,” IEEE International Conference on Robotics and Automation (ICRA), 2018.
- J. Chemali and A. Lazaric, “Direct policy iteration with demonstrations,” International Joint Conference on Artificial Intelligence, 2015.
- B. Piot, M. Geist, and O. Pietquin, “Boosted Bellman residual minimization handling expert demonstrations,” Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 2014.
- T. Brys, A. Harutyunyan, H. B. Suay, S. Chernova, M. E. Taylor, and A. Nowé, “Reinforcement learning from demonstration through shaping,” International Joint Conference on Artificial Intelligence, 2015.
- B. Kang, Z. Jie, and J. Feng, “Policy optimization with demonstrations,” ICML, 2018.
- D. P. Bertsekas, “Approximate policy iteration: A survey and some new methods,” Journal of Control Theory and Applications, 2011.
- T. Schaul, J. Quan, I. Antonoglou, and D. Silver, “Prioritized experience replay,” ICLR, 2016.
- S. Ross, G. Gordon, and D. Bagnell, “A reduction of imitation learning and structured prediction to no-regret online learning,” AISTATS, 2011.
- Y. Gao, J. Lin, F. Yu, S. Levine, T. Darrell et al., “Reinforcement learning from imperfect demonstrations,” arXiv preprint arXiv:1802.05313, 2018.
- M. Jing, X. Ma, W. Huang, F. Sun, C. Yang, B. Fang, and H. Liu, “Reinforcement learning from imperfect demonstrations under soft expert guidance.” AAAI, 2020.
- K. Brantley, W. Sun, and M. Henaff, “Disagreement-regularized imitation learning,” ICLR, 2019.
- G. Hinton, O. Vinyals, and J. Dean, “Distilling the knowledge in a neural network,” Deep Learning and Representation Learning Workshop, NeurIPS, 2014.
- A. A. Rusu, S. G. Colmenarejo, C. Gulcehre, G. Desjardins, J. Kirkpatrick, R. Pascanu, V. Mnih, K. Kavukcuoglu, and R. Hadsell, “Policy distillation,” arXiv preprint arXiv:1511.06295, 2015.
- H. Yin and S. J. Pan, “Knowledge transfer for deep reinforcement learning with hierarchical experience replay,” AAAI, 2017.
- S. Schmitt, J. J. Hudson, A. Zidek, S. Osindero, C. Doersch, W. M. Czarnecki, J. Z. Leibo, H. Kuttler, A. Zisserman, K. Simonyan et al., “Kickstarting deep reinforcement learning,” arXiv preprint arXiv:1803.03835, 2018.
- J. Schulman, X. Chen, and P. Abbeel, “Equivalence between policy gradients and soft q-learning,” arXiv preprint arXiv:1704.06440, 2017.
- F. Fernández and M. Veloso, “Probabilistic policy reuse in a reinforcement learning agent,” AAMAS, 2006.
- A. Barreto, W. Dabney, R. Munos, J. J. Hunt, T. Schaul, H. P. van Hasselt, and D. Silver, “Successor features for transfer in reinforcement learning,” NeurIPS, 2017.
- R. Bellman, “Dynamic programming,” Science, 1966.
- L. Torrey, T. Walker, J. Shavlik, and R. Maclin, “Using advice to transfer knowledge acquired in one reinforcement learning task to another,” European Conference on Machine Learning, 2005.
- A. Gupta, C. Devin, Y. Liu, P. Abbeel, and S. Levine, “Learning invariant feature spaces to transfer skills with reinforcement learning,” ICLR, 2017.
- G. Konidaris and A. Barto, “Autonomous shaping: Knowledge transfer in reinforcement learning,” ICML, 2006.
- H. B. Ammar and M. E. Taylor, “Reinforcement learning transfer via common subspaces,” Proceedings of the 11th International Conference on Adaptive and Learning Agents, 2012.
- V. Badrinarayanan, A. Kendall, and R. Cipolla, “Segnet: A deep convolutional encoder-decoder architecture for image segmentation,” IEEE transactions on pattern analysis and machine intelligence, 2017.
- C. Wang and S. Mahadevan, “Manifold alignment without correspondence,” International Joint Conference on Artificial Intelligence, 2009.
- B. Bocsi, L. Csató, and J. Peters, “Alignment-based transfer learning for robot models,” The 2013 International Joint Conference on Neural Networks (IJCNN), 2013.
- H. B. Ammar, E. Eaton, P. Ruvolo, and M. E. Taylor, “Unsupervised cross-domain transfer in policy gradient reinforcement learning via manifold alignment,” AAAI, 2015.
- H. B. Ammar, K. Tuyls, M. E. Taylor, K. Driessens, and G. Weiss, “Reinforcement learning transfer via sparse coding,” AAMAS, 2012.
- A. Lazaric, M. Restelli, and A. Bonarini, “Transfer of samples in batch reinforcement learning,” ICML, 2008.
- A. A. Rusu, N. C. Rabinowitz, G. Desjardins, H. Soyer, J. Kirkpatrick, K. Kavukcuoglu, R. Pascanu, and R. Hadsell, “Progressive neural networks,” arXiv preprint arXiv:1606.04671, 2016.
- C. Fernando, D. Banarse, C. Blundell, Y. Zwols, D. Ha, A. A. Rusu, A. Pritzel, and D. Wierstra, “Pathnet: Evolution channels gradient descent in super neural networks,” arXiv preprint arXiv:1701.08734, 2017.
- I. Harvey, “The microbial genetic algorithm,” European Conference on Artificial Life, 2009.
- A. Zhang, H. Satija, and J. Pineau, “Decoupling dynamics and reward for transfer learning,” arXiv preprint arXiv:1804.10689, 2018.
- P. Dayan, “Improving generalization for temporal difference learning: The successor representation,” Neural Computation, 1993.
- T. D. Kulkarni, A. Saeedi, S. Gautam, and S. J. Gershman, “Deep successor reinforcement learning,” arXiv preprint arXiv:1606.02396, 2016.
- J. Zhang, J. T. Springenberg, J. Boedecker, and W. Burgard, “Deep reinforcement learning with successor features for navigation across similar environments,” IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2017.
- N. Mehta, S. Natarajan, P. Tadepalli, and A. Fern, “Transfer in variable-reward hierarchical reinforcement learning,” Machine Learning, 2008.
- D. Borsa, A. Barreto, J. Quan, D. Mankowitz, R. Munos, H. van Hasselt, D. Silver, and T. Schaul, “Universal successor features approximators,” ICLR, 2019.
- L. Lehnert, S. Tellex, and M. L. Littman, “Advantages and limitations of using successor features for transfer in reinforcement learning,” arXiv preprint arXiv:1708.00102, 2017.
- J. C. Petangoda, S. Pascual-Diaz, V. Adam, P. Vrancx, and J. Grau-Moya, “Disentangled skill embeddings for reinforcement learning,” arXiv preprint arXiv:1906.09223, 2019.
- C. Finn, P. Abbeel, and S. Levine, “Model-agnostic meta-learning for fast adaptation of deep networks,” ICML, 2017.
- B. Zadrozny, “Learning and evaluating classifiers under sample selection bias,” ICML, 2004.
- B. D. Argall, S. Chernova, M. Veloso, and B. Browning, “A survey of robot learning from demonstration,” Robotics and autonomous systems, 2009.
- B. Kehoe, S. Patil, P. Abbeel, and K. Goldberg, “A survey of research on cloud robotics and automation,” IEEE Transactions on automation science and engineering, 2015.
- S. Gu, E. Holly, T. Lillicrap, and S. Levine, “Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates,” IEEE international conference on robotics and automation (ICRA), 2017.
- W. Yu, J. Tan, C. K. Liu, and G. Turk, “Preparing for the unknown: Learning a universal policy with online system identification,” arXiv preprint arXiv:1702.02453, 2017.
- F. Sadeghi and S. Levine, “Cad2rl: Real single-image flight without a single real image,” arXiv preprint arXiv:1611.04201, 2016.
- K. Bousmalis, A. Irpan, P. Wohlhart, Y. Bai, M. Kelcey, M. Kalakrishnan, L. Downs, J. Ibarz, P. Pastor, K. Konolige et al., “Using simulation and domain adaptation to improve efficiency of deep robotic grasping,” IEEE International Conference on Robotics and Automation (ICRA), 2018.
- H. Bharadhwaj, Z. Wang, Y. Bengio, and L. Paull, “A data-efficient framework for training and sim-to-real transfer of navigation policies,” International Conference on Robotics and Automation (ICRA), 2019.
- I. Higgins, A. Pal, A. Rusu, L. Matthey, C. Burgess, A. Pritzel, M. Botvinick, C. Blundell, and A. Lerchner, “Darla: Improving zero-shot transfer in reinforcement learning,” ICML, 2017.
- J. Kober, J. A. Bagnell, and J. Peters, “Reinforcement learning in robotics: A survey,” The International Journal of Robotics Research, 2013.
- D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, T. Hubert, L. Baker, M. Lai, A. Bolton et al., “Mastering the game of Go without human knowledge,” Nature, 2017.
- OpenAI. (2019) Dota 2 blog. [Online]. Available: https://openai.com/blog/openai-five/
- J. Oh, V. Chockalingam, S. Singh, and H. Lee, “Control of memory, active perception, and action in Minecraft,” arXiv preprint arXiv:1605.09128, 2016.
- N. Justesen, P. Bontrager, J. Togelius, and S. Risi, “Deep learning for video game playing,” IEEE Transactions on Games, 2019.
- V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller, “Playing Atari with deep reinforcement learning,” arXiv preprint arXiv:1312.5602, 2013.
- H. Chen, X. Liu, D. Yin, and J. Tang, “A survey on dialogue systems: Recent advances and new frontiers,” ACM SIGKDD Explorations Newsletter, 2017.
- S. P. Singh, M. J. Kearns, D. J. Litman, and M. A. Walker, “Reinforcement learning for spoken dialogue systems,” NeurIPS, 2000.
- B. Zoph and Q. V. Le, “Neural architecture search with reinforcement learning,” arXiv preprint arXiv:1611.01578, 2016.
- R. Hu, J. Andreas, M. Rohrbach, T. Darrell, and K. Saenko, “Learning to reason: End-to-end module networks for visual question answering,” IEEE International Conference on Computer Vision, 2017.
- Z. Ren, X. Wang, N. Zhang, X. Lv, and L.-J. Li, “Deep reinforcement learning-based image captioning with embedding reward,” IEEE Conference on Computer Vision and Pattern Recognition, 2017.
- J. Andreas, M. Rohrbach, T. Darrell, and D. Klein, “Learning to compose neural networks for question answering,” arXiv preprint arXiv:1601.01705, 2016.
- D. Bahdanau, P. Brakel, K. Xu, A. Goyal, R. Lowe, J. Pineau, A. Courville, and Y. Bengio, “An actor-critic algorithm for sequence prediction,” arXiv preprint arXiv:1607.07086, 2016.
- F. Godin, A. Kumar, and A. Mittal, “Learning when not to answer: a ternary reward structure for reinforcement learning based question answering,” Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019.
- K.-W. Chang, A. Krishnamurthy, A. Agarwal, J. Langford, and H. Daumé III, “Learning to search better than your teacher,” ICML, 2015.
- J. Lu, A. Kannan, J. Yang, D. Parikh, and D. Batra, “Best of both worlds: Transferring knowledge from discriminative learning to a generative visual dialog model,” NeurIPS, 2017.
- OpenAI, “GPT-4 technical report,” arXiv, 2023.
- A. Glaese, N. McAleese, M. Trebacz, J. Aslanides, V. Firoiu, T. Ewalds, M. Rauh, L. Weidinger, M. Chadwick, P. Thacker et al., “Improving alignment of dialogue agents via targeted human judgements,” arXiv preprint arXiv:2209.14375, 2022.
- A. Chowdhery, S. Narang, J. Devlin, M. Bosma, G. Mishra, A. Roberts, P. Barham, H. W. Chung, C. Sutton, S. Gehrmann et al., “Palm: Scaling language modeling with pathways,” arXiv preprint arXiv:2204.02311, 2022.
- R. Thoppilan, D. De Freitas, J. Hall, N. Shazeer, A. Kulshreshtha, H.-T. Cheng, A. Jin, T. Bos, L. Baker, Y. Du et al., “Lamda: Language models for dialog applications,” arXiv preprint arXiv:2201.08239, 2022.
- C. Yu, J. Liu, and S. Nemati, “Reinforcement learning in healthcare: A survey,” arXiv preprint arXiv:1908.08796, 2019.
- A. Alansary, O. Oktay, Y. Li, L. Le Folgoc, B. Hou, G. Vaillant, K. Kamnitsas, A. Vlontzos, B. Glocker, B. Kainz et al., “Evaluating reinforcement learning agents for anatomical landmark detection,” Medical Image Analysis, 2019.
- K. Ma, J. Wang, V. Singh, B. Tamersoy, Y.-J. Chang, A. Wimmer, and T. Chen, “Multimodal image registration with deep context reinforcement learning,” International Conference on Medical Image Computing and Computer-Assisted Intervention, 2017.
- T. S. M. T. Gomes, “Reinforcement learning for primary care appointment scheduling,” 2017.
- A. Serrano, B. Imbernón, H. Pérez-Sánchez, J. M. Cecilia, A. Bueno-Crespo, and J. L. Abellán, “Accelerating drugs discovery with deep reinforcement learning: An early approach,” International Conference on Parallel Processing Companion, 2018.
- M. Popova, O. Isayev, and A. Tropsha, “Deep reinforcement learning for de novo drug design,” Science advances, 2018.
- A. E. Gaweda, M. K. Muezzinoglu, G. R. Aronoff, A. A. Jacobs, J. M. Zurada, and M. E. Brier, “Incorporating prior knowledge into q-learning for drug delivery individualization,” Fourth International Conference on Machine Learning and Applications, 2005.
- T. W. Killian, S. Daulton, G. Konidaris, and F. Doshi-Velez, “Robust and efficient transfer learning with hidden parameter markov decision processes,” NeurIPS, 2017.
- A. Holzinger, “Interactive machine learning for health informatics: when do we need the human-in-the-loop?” Brain Informatics, 2016.
- L. Li, Y. Lv, and F.-Y. Wang, “Traffic signal timing via deep reinforcement learning,” IEEE/CAA Journal of Automatica Sinica, 2016.
- K. Lin, R. Zhao, Z. Xu, and J. Zhou, “Efficient large-scale fleet management via multi-agent deep reinforcement learning,” ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018.
- K.-L. A. Yau, J. Qadir, H. L. Khoo, M. H. Ling, and P. Komisarczuk, “A survey on reinforcement learning models and algorithms for traffic signal control,” ACM Computing Surveys (CSUR), 2017.
- J. Moody, L. Wu, Y. Liao, and M. Saffell, “Performance functions and reinforcement learning for trading systems and portfolios,” Journal of Forecasting, 1998.
- Z. Jiang and J. Liang, “Cryptocurrency portfolio management with deep reinforcement learning,” IEEE Intelligent Systems Conference (IntelliSys), 2017.
- R. Neuneier, “Enhancing q-learning for optimal asset allocation,” NeurIPS, 1998.
- Y. Deng, F. Bao, Y. Kong, Z. Ren, and Q. Dai, “Deep direct reinforcement learning for financial signal representation and trading,” IEEE transactions on neural networks and learning systems, 2016.
- G. Dalal, E. Gilboa, and S. Mannor, “Hierarchical decision making in electricity grid management,” International Conference on Machine Learning, 2016.
- F. Ruelens, B. J. Claessens, S. Vandael, B. De Schutter, R. Babuška, and R. Belmans, “Residential demand response of thermostatically controlled loads using batch reinforcement learning,” IEEE Transactions on Smart Grid, 2016.
- Z. Wen, D. O’Neill, and H. Maei, “Optimal demand response using device-based reinforcement learning,” IEEE Transactions on Smart Grid, 2015.
- Y. Li, J. Song, and S. Ermon, “Infogail: Interpretable imitation learning from visual demonstrations,” NeurIPS, 2017.
- R. Ramakrishnan and J. Shah, “Towards interpretable explanations for transfer learning in sequential tasks,” AAAI Spring Symposium Series, 2016.
- E. Choi, M. T. Bahadori, J. Sun, J. Kulas, A. Schuetz, and W. Stewart, “Retain: An interpretable predictive model for healthcare using reverse time attention mechanism,” NeurIPS, vol. 29, 2016.