Causality and Batch Reinforcement Learning: Complementary Approaches To Planning In Unknown Domains (2006.02579v1)

Published 3 Jun 2020 in cs.LG and cs.AI

Abstract: Reinforcement learning algorithms have had tremendous successes in online learning settings. However, these successes have relied on low-stakes interactions between the algorithmic agent and its environment. In many settings where RL could be of use, such as health care and autonomous driving, the mistakes made by most online RL algorithms during early training come with unacceptable costs. These settings require developing reinforcement learning algorithms that can operate in the so-called batch setting, where the algorithms must learn from a set of data that is fixed, finite, and generated from some (possibly unknown) policy. Evaluating policies different from the one that collected the data is called off-policy evaluation, and naturally poses counterfactual questions. In this project we show how off-policy evaluation and the estimation of treatment effects in causal inference are two approaches to the same problem, and compare recent progress in these two areas.
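
To make the link between off-policy evaluation and treatment-effect estimation concrete, here is a minimal sketch for a one-step (bandit-style) setting. It is not taken from the paper. It uses the standard importance-sampling (inverse propensity) estimator, and the function name `ips_value`, the logged data, and the uniform behavior policy are all hypothetical choices made for illustration.

```python
def ips_value(logged, target_policy):
    """Importance-sampling estimate of a target policy's value.

    logged: list of (action, reward, behavior_prob) tuples, where
            behavior_prob is the probability the data-collecting policy
            assigned to the logged action (assumed known here).
    target_policy: dict mapping each action to its probability under
                   the policy being evaluated.
    """
    total = 0.0
    for action, reward, behavior_prob in logged:
        # Reweight each logged reward by how much more (or less) likely
        # the target policy is to take the logged action.
        weight = target_policy[action] / behavior_prob
        total += weight * reward
    return total / len(logged)


# Hypothetical batch: action 0 = control, action 1 = treatment,
# collected by a behavior policy that picks each action with prob. 0.5.
logged = [(0, 1.0, 0.5), (1, 3.0, 0.5), (0, 1.0, 0.5), (1, 3.0, 0.5)]

always_treat = {0: 0.0, 1: 1.0}
always_control = {0: 1.0, 1: 0.0}

v_treat = ips_value(logged, always_treat)
v_control = ips_value(logged, always_control)

# The difference in estimated policy values is an inverse-propensity-weighted
# estimate of the average treatment effect.
ate_estimate = v_treat - v_control
print(v_treat, v_control, ate_estimate)
```

In this toy setting, evaluating the "always treat" and "always control" policies off-policy and taking their difference is the same computation as an inverse-propensity-weighted treatment-effect estimate, which is one simple way to see the correspondence the abstract describes.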

Authors (4)
  1. James Bannon (1 paper)
  2. Brad Windsor (4 papers)
  3. Wenbo Song (1 paper)
  4. Tao Li (441 papers)
Citations (17)