Near-optimal Policy Identification in Active Reinforcement Learning (2212.09510v1)

Published 19 Dec 2022 in stat.ML, cs.AI, and cs.LG

Abstract: Many real-world reinforcement learning tasks require control of complex dynamical systems that involve both costly data acquisition processes and large state spaces. In cases where the transition dynamics can be readily evaluated at specified states (e.g., via a simulator), agents can operate in what is often referred to as planning with a generative model. We propose the AE-LSVI algorithm for best-policy identification, a novel variant of the kernelized least-squares value iteration (LSVI) algorithm that combines optimism with pessimism for active exploration (AE). AE-LSVI provably identifies a near-optimal policy uniformly over an entire state space and achieves polynomial sample complexity guarantees that are independent of the number of states. When specialized to the recently introduced offline contextual Bayesian optimization setting, our algorithm achieves improved sample complexity bounds. Experimentally, we demonstrate that AE-LSVI outperforms other RL algorithms in a variety of environments when robustness to the initial state is required.
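
The abstract gives no pseudocode, but the core mechanism it describes — maintaining optimistic (UCB) and pessimistic (LCB) value estimates from kernel regression and spending costly generative-model queries where the gap between them is largest — can be sketched. The following is a minimal illustrative sketch in Python, not the authors' implementation: it collapses the multi-step kernelized LSVI backups to a one-step setting (closer in spirit to the offline contextual Bayesian optimization specialization mentioned above), and `query_generative_model`, `KernelConfidenceModel`, the RBF kernel, and the gap-based query rule are all assumptions made for illustration.

```python
# Illustrative sketch only (assumed, not from the paper): kernel ridge
# regression with GP-style confidence widths stands in for kernelized LSVI,
# and the setting is one-step rather than multi-step value iteration.
import numpy as np

def rbf_kernel(X, Y, ls=1.0):
    """Gaussian (RBF) kernel matrix between row-stacked feature arrays."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

class KernelConfidenceModel:
    """Kernel ridge regression plus a confidence width for UCB/LCB bounds."""
    def __init__(self, ls=1.0, reg=1e-2, beta=2.0):
        self.ls, self.reg, self.beta = ls, reg, beta
        self.X = None  # no data yet

    def fit(self, X, y):
        self.X, self.y = X, y
        K = rbf_kernel(X, X, self.ls)
        self.K_inv = np.linalg.inv(K + self.reg * np.eye(len(X)))

    def bounds(self, Xq):
        """Pessimistic and optimistic estimates (LCB, UCB) at query points."""
        if self.X is None:                      # vacuous bounds before any data
            w = self.beta * np.ones(len(Xq))
            return -w, w
        k = rbf_kernel(Xq, self.X, self.ls)
        mu = k @ (self.K_inv @ self.y)          # kernel ridge mean
        var = 1.0 - np.einsum("ij,jk,ik->i", k, self.K_inv, k)
        w = self.beta * np.sqrt(np.clip(var, 0.0, None))
        return mu - w, mu + w

def ae_round(model, states, actions, query_generative_model):
    """One active-exploration round: query where the optimism/pessimism
    value gap is largest, uniformly over the whole state grid."""
    SA = np.array([(s, a) for s in states for a in actions], dtype=float)
    lcb, ucb = model.bounds(SA)
    lcb = lcb.reshape(len(states), len(actions))
    ucb = ucb.reshape(len(states), len(actions))
    gap = ucb.max(axis=1) - lcb.max(axis=1)    # per-state value uncertainty
    s_idx = int(gap.argmax())                  # most uncertain state
    a_idx = int(ucb[s_idx].argmax())           # optimistic action there
    s, a = states[s_idx], actions[a_idx]
    return np.array([[s, a]]), np.array([query_generative_model(s, a)])

if __name__ == "__main__":
    # Toy usage: a noisy quadratic "simulator" over scalar states/actions.
    rng = np.random.default_rng(0)
    oracle = lambda s, a: -(a - 0.5 * s) ** 2 + 0.01 * rng.normal()
    states = np.linspace(-1.0, 1.0, 25)
    actions = np.linspace(-1.0, 1.0, 25)

    model = KernelConfidenceModel()
    X, y = np.empty((0, 2)), np.empty(0)
    for _ in range(30):
        Xq, yq = ae_round(model, states, actions, oracle)
        X, y = np.vstack([X, Xq]), np.concatenate([y, yq])
        model.fit(X, y)
```

Note that the per-state gap max_a UCB(s, a) − max_a LCB(s, a) shrinks only where the model has become confident, which is one way to read the paper's stated goal of identifying a near-optimal policy uniformly over the entire state space rather than only along trajectories from a fixed initial state.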

Authors (8)
  1. Xiang Li (1003 papers)
  2. Viraj Mehta (12 papers)
  3. Johannes Kirschner (17 papers)
  4. Ian Char (10 papers)
  5. Willie Neiswanger (68 papers)
  6. Jeff Schneider (99 papers)
  7. Andreas Krause (269 papers)
  8. Ilija Bogunovic (44 papers)
Citations (5)
