Decoupled Exploration and Exploitation Policies for Sample-Efficient Reinforcement Learning (2101.09458v2)

Published 23 Jan 2021 in cs.LG

Abstract: Despite the close connection between exploration and sample efficiency, most state-of-the-art reinforcement learning algorithms include no considerations for exploration beyond maximizing the entropy of the policy. In this work we address this seeming missed opportunity. We observe that the most common formulation of directed exploration in deep RL, known as bonus-based exploration (BBE), suffers from bias and slow coverage in the few-sample regime. This causes BBE to be actively detrimental to policy learning in many control tasks. We show that by decoupling the task policy from the exploration policy, directed exploration can be highly effective for sample-efficient continuous control. Our method, Decoupled Exploration and Exploitation Policies (DEEP), can be combined with any off-policy RL algorithm without modification. When used in conjunction with soft actor-critic, DEEP incurs no performance penalty in densely rewarding environments. On sparse environments, DEEP gives a several-fold improvement in data efficiency due to better exploration.
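
To make the decoupling idea concrete, the following is a minimal, hypothetical sketch of the structure the abstract describes: an exploration policy that collects data while optimizing extrinsic reward plus an exploration bonus, and a task policy that learns off-policy from the same replay buffer using extrinsic reward only. The class and method names (`ReplayBuffer`, `count_bonus`, `.act`, `.update`), and the simple count-based bonus standing in for a BBE-style bonus, are assumptions for illustration, not the authors' implementation.

```python
import random
from collections import deque

class ReplayBuffer:
    """Shared off-policy replay buffer; both policies learn from the same data."""
    def __init__(self, capacity=100_000):
        self.data = deque(maxlen=capacity)

    def add(self, transition):
        self.data.append(transition)

    def sample(self, batch_size):
        return random.sample(self.data, min(batch_size, len(self.data)))

def count_bonus(state_key, counts):
    """Toy count-based novelty bonus, a placeholder for any BBE-style bonus."""
    counts[state_key] = counts.get(state_key, 0) + 1
    return counts[state_key] ** -0.5

def train_decoupled(env, task_policy, explore_policy, steps=10_000, batch_size=256):
    """Collect data with the exploration policy; train both policies off-policy.

    task_policy    -- optimizes extrinsic reward only (e.g. SAC).
    explore_policy -- optimizes extrinsic reward plus the exploration bonus.
    Both are assumed to expose .act(state) and .update(batch, rewards).
    """
    buffer, counts = ReplayBuffer(), {}
    state = env.reset()
    for _ in range(steps):
        action = explore_policy.act(state)           # behavior comes from the exploration policy
        next_state, reward, done, _ = env.step(action)
        bonus = count_bonus(tuple(next_state), counts)
        buffer.add((state, action, reward, bonus, next_state, done))
        state = env.reset() if done else next_state

        batch = buffer.sample(batch_size)
        extrinsic = [t[2] for t in batch]            # reward only
        with_bonus = [t[2] + t[3] for t in batch]    # reward + exploration bonus
        task_policy.update(batch, rewards=extrinsic)       # unbiased task objective
        explore_policy.update(batch, rewards=with_bonus)   # directed exploration
    return task_policy
```

Because only the exploration policy is trained on the bonus, the task policy's objective stays unbiased, which is the property the abstract credits for avoiding BBE's few-sample pathologies.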

Authors (6)
  1. William F. Whitney (15 papers)
  2. Michael Bloesch (24 papers)
  3. Jost Tobias Springenberg (48 papers)
  4. Abbas Abdolmaleki (38 papers)
  5. Kyunghyun Cho (292 papers)
  6. Martin Riedmiller (64 papers)
Citations (13)
