On Hard Exploration for Reinforcement Learning: a Case Study in Pommerman (1907.11788v1)

Published 26 Jul 2019 in cs.LG, cs.AI, and stat.ML

Abstract: How to best explore in domains with sparse, delayed, and deceptive rewards is an important open problem for reinforcement learning (RL). This paper considers one such domain, the recently-proposed multi-agent benchmark of Pommerman. This domain is very challenging for RL: past work has shown that model-free RL algorithms fail to achieve significant learning without artificially reducing the environment's complexity. In this paper, we illuminate reasons behind this failure by providing a thorough analysis of the hardness of random exploration in Pommerman. While model-free random exploration is typically futile, we develop a model-based automatic reasoning module that can be used for safer exploration by pruning actions that will surely lead the agent to death. We empirically demonstrate that this module can significantly improve learning.

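The abstract describes the core idea as pruning actions that are certain to kill the agent before random exploration picks among them. The sketch below is an illustrative, hypothetical rendering of that idea, not the authors' code: the names `forward_model`, `is_fatal`, and `prune_fatal_actions` are assumptions, and the one-step lookahead stands in for whatever reasoning horizon the paper actually uses.

```python
import random


def prune_fatal_actions(state, actions, forward_model, is_fatal):
    """Return the subset of actions whose predicted outcome is not certain death.

    forward_model(state, action) -> next_state is a user-supplied environment model;
    is_fatal(next_state) -> bool flags states where the agent has surely died
    (e.g. standing inside an unavoidable blast radius in Pommerman).
    """
    safe = []
    for action in actions:
        next_state = forward_model(state, action)  # model-based lookahead
        if not is_fatal(next_state):
            safe.append(action)
    # If every action is predicted fatal, fall back to the full action set.
    return safe if safe else list(actions)


def safe_random_action(state, actions, forward_model, is_fatal):
    """Random exploration restricted to actions that survive the pruning step."""
    return random.choice(prune_fatal_actions(state, actions, forward_model, is_fatal))
```
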
Authors (4)
  1. Chao Gao (122 papers)
  2. Bilal Kartal (12 papers)
  3. Pablo Hernandez-Leal (13 papers)
  4. Matthew E. Taylor (69 papers)
Citations (10)
