
BeBold: Exploration Beyond the Boundary of Explored Regions (2012.08621v1)

Published 15 Dec 2020 in cs.LG, cs.AI, and stat.ML

Abstract: Efficient exploration under sparse rewards remains a key challenge in deep reinforcement learning. To guide exploration, previous work makes extensive use of intrinsic reward (IR). There are many heuristics for IR, including visitation counts, curiosity, and state-difference. In this paper, we analyze the pros and cons of each method and propose the regulated difference of inverse visitation counts as a simple but effective criterion for IR. The criterion helps the agent explore Beyond the Boundary of explored regions and mitigates common issues in count-based methods, such as short-sightedness and detachment. The resulting method, BeBold, solves the 12 most challenging procedurally-generated tasks in MiniGrid with just 120M environment steps, without any curriculum learning. In comparison, the previous SoTA only solves 50% of the tasks. BeBold also achieves SoTA on multiple tasks in NetHack, a popular rogue-like game that contains more challenging procedurally-generated environments.
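The criterion named in the abstract, a regulated difference of inverse visitation counts, can be illustrated with a small tabular sketch. Everything here is an assumption made for illustration and is not taken from the listing above: the class name `InverseCountBonus`, the use of hashable states with an exact count table, and the reading of "regulated" as clipping the difference at zero. The full method may include further terms (for example, episodic restrictions) that this sketch omits.

```python
from collections import defaultdict


class InverseCountBonus:
    """Clipped difference of inverse visitation counts (illustrative sketch)."""

    def __init__(self):
        # Lifelong visitation counts N(s); states must be hashable (assumption).
        self.counts = defaultdict(int)

    def observe(self, state):
        """Record one visit to a state."""
        self.counts[state] += 1

    def bonus(self, state, next_state):
        """Count the arrival at next_state, then return
        max(1/N(next_state) - 1/N(state), 0)."""
        self.observe(next_state)
        inv_next = 1.0 / self.counts[next_state]
        # Guard against an uncounted source state (treated as visited once).
        inv_curr = 1.0 / max(self.counts[state], 1)
        return max(inv_next - inv_curr, 0.0)


bonus_fn = InverseCountBonus()
for _ in range(4):
    bonus_fn.observe("A")            # "A" is a well-explored state

outward = bonus_fn.bonus("A", "B")   # moving to a novel state is rewarded
inward = bonus_fn.bonus("B", "A")    # moving back is clipped to zero
print(outward, inward)
```

Under these assumptions, a step from a frequently visited state into a rarely visited one earns a positive bonus, while a step back toward familiar territory earns nothing, which is the sense in which the reward concentrates on the boundary of the explored region.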

Authors (7)
  1. Tianjun Zhang (38 papers)
  2. Huazhe Xu (93 papers)
  3. Xiaolong Wang (243 papers)
  4. Yi Wu (171 papers)
  5. Kurt Keutzer (200 papers)
  6. Joseph E. Gonzalez (167 papers)
  7. Yuandong Tian (128 papers)
Citations (38)
