
Budgeted Policy Learning for Task-Oriented Dialogue Systems (1906.00499v1)

Published 2 Jun 2019 in cs.CL, cs.AI, cs.LG, and cs.NE

Abstract: This paper presents a new approach that extends Deep Dyna-Q (DDQ) by incorporating Budget-Conscious Scheduling (BCS) to make the best use of a small, fixed number of user interactions (the budget) when learning task-oriented dialogue agents. BCS consists of (1) a Poisson-based global scheduler that allocates the budget over different stages of training; (2) a controller that decides at each training step whether the agent is trained on real or simulated experiences; and (3) a user goal sampling module that generates the experiences most effective for policy learning. Experiments on a movie-ticket booking task with simulated and real users show that our approach leads to significant improvements in success rate over state-of-the-art baselines under the fixed budget.
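
To make the idea of a Poisson-based global scheduler concrete, here is a minimal sketch of one way a fixed interaction budget could be split across training stages using Poisson probabilities as weights. It is an illustration under stated assumptions, not the paper's actual scheduler: the function names `poisson_pmf` and `allocate_budget`, the rate parameter `lam`, and the rule "budget share proportional to the Poisson probability of the stage index" are all hypothetical choices, since the abstract does not specify them.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of observing k events under a Poisson distribution with rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def allocate_budget(total_budget: int, num_stages: int, lam: float) -> list[int]:
    """Split an integer budget over stages 0..num_stages-1.

    Assumption (not from the paper): each stage's share is proportional to
    the Poisson probability of its index, renormalized over the stages.
    """
    weights = [poisson_pmf(k, lam) for k in range(num_stages)]
    total_weight = sum(weights)
    raw = [total_budget * w / total_weight for w in weights]

    # Round down first, then hand the leftover interactions to the stages
    # with the largest fractional parts so the total is spent exactly.
    alloc = [int(r) for r in raw]
    remainder = total_budget - sum(alloc)
    by_fraction = sorted(range(num_stages), key=lambda i: raw[i] - alloc[i], reverse=True)
    for i in by_fraction[:remainder]:
        alloc[i] += 1
    return alloc


if __name__ == "__main__":
    # Hypothetical numbers: 300 real-user interactions over 10 training stages.
    print(allocate_budget(total_budget=300, num_stages=10, lam=3.0))
```

With a small `lam`, this sketch concentrates the budget in the early stages and tapers it off later; a larger `lam` shifts the peak toward later stages. Whether the paper uses this particular shape or parameterization is not stated in the abstract.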

Authors (4)
  1. Zhirui Zhang (46 papers)
  2. Xiujun Li (37 papers)
  3. Jianfeng Gao (344 papers)
  4. Enhong Chen (242 papers)
Citations (34)