Adversarial Advantage Actor-Critic Model for Task-Completion Dialogue Policy Learning (1710.11277v2)

Published 31 Oct 2017 in cs.CL, cs.AI, and cs.LG

Abstract: This paper presents a new method, adversarial advantage actor-critic (Adversarial A2C), which significantly improves the efficiency of dialogue policy learning in task-completion dialogue systems. Inspired by generative adversarial networks (GAN), we train a discriminator to differentiate responses/actions generated by dialogue agents from responses/actions by experts. Then, we incorporate the discriminator as another critic into the advantage actor-critic (A2C) framework, to encourage the dialogue agent to explore the state-action space within the regions where the agent takes actions similar to those of the experts. Experimental results in a movie-ticket booking domain show that the proposed Adversarial A2C accelerates policy exploration.
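
The idea of using the discriminator as a second critic can be illustrated with a small sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: the function names, the use of log D(s, a) as the discriminator signal, and the mixing weight `lam` are hypothetical and may differ from what the paper actually does. The sketch only shows how a one-step advantage estimate could be shifted toward actions the discriminator judges to be expert-like.

```python
import math


def discriminator_bonus(d_prob, eps=1e-8):
    """Signal from the discriminator (assumed form: log D(s, a)).

    d_prob is the discriminator's probability that the (state, action)
    pair came from an expert. The bonus is 0 when d_prob == 1 and becomes
    more negative as the action looks less expert-like.
    """
    return math.log(max(d_prob, eps))


def combined_advantage(reward, value_s, value_next, d_prob,
                       gamma=0.99, lam=0.5):
    """One-step advantage plus a weighted discriminator term.

    reward, value_s, value_next: quantities from the usual A2C critic.
    lam: hypothetical weight on the discriminator signal.
    """
    # Standard one-step temporal-difference advantage estimate.
    td_advantage = reward + gamma * value_next - value_s
    # Second "critic": penalize actions that look unlike expert actions.
    return td_advantage + lam * discriminator_bonus(d_prob)


# Same environment feedback, different discriminator scores.
expert_like = combined_advantage(1.0, 0.5, 0.0, d_prob=0.9)
not_expert_like = combined_advantage(1.0, 0.5, 0.0, d_prob=0.1)
```

With identical reward and value estimates, the action scored as more expert-like receives the larger advantage, so the policy gradient would favor it. This is one plausible reading of how such a signal could steer exploration toward expert-like regions of the state-action space.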

Authors (6)
  1. Baolin Peng (72 papers)
  2. Xiujun Li (37 papers)
  3. Jianfeng Gao (344 papers)
  4. Jingjing Liu (139 papers)
  5. Yun-Nung Chen (104 papers)
  6. Kam-Fai Wong (92 papers)
Citations (61)