
Building Markovian Generative Architectures over Pretrained LM Backbones for Efficient Task-Oriented Dialog Systems (2204.06452v2)

Published 13 Apr 2022 in cs.CL and cs.HC

Abstract: Recently, Transformer-based pretrained language models (PLMs), such as GPT2 and T5, have been leveraged to build generative task-oriented dialog (TOD) systems. A drawback of existing PLM-based models is their non-Markov architectures across turns, i.e., the whole history is used as the conditioning input at each turn. First, this brings inefficiencies in memory and computation. Furthermore, using the whole history increases model complexity and may hurt training efficiency, especially when facing small amounts of labeled training data (the low-resource setting). In this paper, motivated by the observation that dialog states can be viewed as Markov states, we propose to build Markovian Generative Architectures (MGA) over PLM backbones for efficient TOD systems. Experiments on MultiWOZ2.1 show that in the rich-resource setting, the proposed Markov models reduce memory and time costs without performance degradation; in the low-resource setting, the training-efficiency gains of the Markov models are even more significant.
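The core idea, conditioning the PLM backbone on the previous dialog (belief) state rather than on the entire dialog history, can be illustrated with a minimal sketch. The function names, prompt layout, and example belief-state strings below are illustrative assumptions for exposition, not the authors' implementation.

```python
# Minimal sketch (not the paper's code) contrasting the two ways of building
# the per-turn conditioning input for a generative TOD model.

def build_input_non_markov(history, user_utt):
    """Non-Markov conditioning: concatenate the entire dialog history.

    `history` is a list of (user_utterance, belief_state, system_response)
    tuples from all previous turns, so the input grows with dialog length.
    """
    parts = []
    for u, b, r in history:
        parts += [u, b, r]
    parts.append(user_utt)
    return " ".join(parts)


def build_input_markov(prev_belief_state, prev_system_response, user_utt):
    """Markovian (MGA-style) conditioning: the previous belief state is
    treated as a summary of the history, so only the last state, the last
    system response, and the current user utterance are fed to the PLM
    backbone. The input length stays roughly constant across turns.
    """
    return " ".join([prev_belief_state, prev_system_response, user_utt])


if __name__ == "__main__":
    # Hypothetical MultiWOZ-style turns, used only to show input growth.
    history = [
        ("i need a cheap hotel", "[hotel] price=cheap",
         "what area do you prefer?"),
        ("in the north please", "[hotel] price=cheap area=north",
         "i found 2 options."),
    ]
    user_utt = "book the first one for 2 nights"

    print(build_input_non_markov(history, user_utt))       # grows every turn
    print(build_input_markov(*history[-1][1:], user_utt))   # bounded per turn
```

Because the Markovian input is bounded per turn, the attention cost and memory footprint of the backbone no longer scale with dialog length, which is the efficiency argument made in the abstract.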

Authors (5)
  1. Hong Liu (394 papers)
  2. Yucheng Cai (11 papers)
  3. Zhijian Ou (58 papers)
  4. Yi Huang (161 papers)
  5. Junlan Feng (63 papers)
Citations (4)