
Sample-efficient Deep Reinforcement Learning for Dialog Control (1612.06000v1)

Published 18 Dec 2016 in cs.AI, cs.LG, and stat.ML

Abstract: Representing a dialog policy as a recurrent neural network (RNN) is attractive because it handles partial observability, infers a latent representation of state, and can be optimized with supervised learning (SL) or reinforcement learning (RL). For RL, a policy gradient approach is natural, but is sample inefficient. In this paper, we present 3 methods for reducing the number of dialogs required to optimize an RNN-based dialog policy with RL. The key idea is to maintain a second RNN which predicts the value of the current policy, and to apply experience replay to both networks. On two tasks, these methods reduce the number of dialogs/episodes required by about a third, vs. standard policy gradient methods.
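
To make the abstract's key idea concrete, below is a minimal PyTorch sketch of the general recipe it describes: a recurrent policy network paired with a second recurrent network that predicts the value of the current policy, with both networks updated by replaying stored dialogs. This is an illustrative sketch, not the authors' implementation; all dimensions, hyperparameters, the GRU choice, and the discounting scheme are assumptions.

```python
# Sketch of the paper's key idea (illustrative, not the authors' code):
# a recurrent policy network plus a second recurrent value network (critic),
# both trained from a replay buffer of stored dialogs.
import random
import torch
import torch.nn as nn

OBS_DIM, N_ACTIONS, HIDDEN = 32, 10, 64  # hypothetical sizes


class RecurrentNet(nn.Module):
    """GRU over the dialog history; only the output head differs
    between the policy network and the value network."""
    def __init__(self, out_dim):
        super().__init__()
        self.gru = nn.GRU(OBS_DIM, HIDDEN, batch_first=True)
        self.head = nn.Linear(HIDDEN, out_dim)

    def forward(self, obs_seq):              # obs_seq: (batch, T, OBS_DIM)
        h, _ = self.gru(obs_seq)              # latent dialog state per turn
        return self.head(h)                   # (batch, T, out_dim)


policy = RecurrentNet(N_ACTIONS)              # outputs action logits
critic = RecurrentNet(1)                      # predicts value of current policy
opt_pi = torch.optim.Adam(policy.parameters(), lr=1e-3)
opt_v = torch.optim.Adam(critic.parameters(), lr=1e-3)
replay = []  # each entry: one whole dialog as (obs_seq, actions, rewards)


def discounted_returns(rewards, gamma=0.95):
    """Return-to-go at each turn; rewards: (1, T)."""
    out = torch.zeros_like(rewards)
    running = 0.0
    for t in reversed(range(rewards.shape[1])):
        running = rewards[0, t] + gamma * running
        out[0, t] = running
    return out


def update_from_replay(batch_size=8):
    """One experience-replay update applied to both networks."""
    for obs_seq, actions, rewards in random.sample(
            replay, min(batch_size, len(replay))):
        returns = discounted_returns(rewards)           # (1, T)

        # Critic regresses toward the observed discounted returns.
        values = critic(obs_seq).squeeze(-1)            # (1, T)
        v_loss = ((values - returns) ** 2).mean()
        opt_v.zero_grad(); v_loss.backward(); opt_v.step()

        # Policy gradient with the critic's prediction as a baseline
        # (advantage = return - value) to reduce gradient variance.
        logits = policy(obs_seq)                        # (1, T, N_ACTIONS)
        logp = torch.log_softmax(logits, dim=-1)
        chosen = logp.gather(-1, actions.unsqueeze(-1)).squeeze(-1)
        advantage = (returns - values).detach()
        pi_loss = -(chosen * advantage).mean()
        opt_pi.zero_grad(); pi_loss.backward(); opt_pi.step()
```

In this sketch, each replay entry is a complete dialog: observations of shape (1, T, OBS_DIM), a (1, T) tensor of chosen action indices, and a (1, T) tensor of rewards. Replaying past dialogs to update both networks, rather than discarding each episode after a single on-policy gradient step, is what the paper credits for the reduction in dialogs needed.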

Authors (2)
  1. Kavosh Asadi (23 papers)
  2. Jason D. Williams (12 papers)
Citations (21)
