Adapting Text-based Dialogue State Tracker for Spoken Dialogues (2308.15053v3)

Published 29 Aug 2023 in cs.CL and cs.AI

Abstract: Although dialogue systems have advanced remarkably through the dialogue systems technology competition (DSTC), building a robust task-oriented dialogue system with a speech interface remains a key challenge. Most progress has been made on text-based dialogue systems, since datasets of written corpora are abundant while those with spoken dialogues are very scarce. However, as voice assistants such as Siri and Alexa show, it is of practical importance to transfer this success to spoken dialogues. In this paper, we describe our engineering effort in building a highly successful model that participated in the speech-aware dialogue systems technology challenge track in DSTC11. Our model consists of three major modules: (1) automatic speech recognition (ASR) error correction to bridge the gap between spoken and text utterances, (2) a text-based dialogue system (D3ST) for estimating slots and values using slot descriptions, and (3) post-processing for recovering from errors in the estimated slot values. Our experiments show that an explicit ASR error correction module, post-processing, and data augmentation are all important for adapting a text-based dialogue state tracker to spoken dialogue corpora.
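
The post-processing step, module (3), could look roughly like the sketch below. The abstract does not say how slot values are recovered, so this is only an illustration under stated assumptions: it uses fuzzy string matching from Python's standard `difflib` to snap a predicted value onto the closest entry in a list of known values. The `KNOWN_VALUES` table, the function names, and the `cutoff` threshold are all hypothetical and do not come from the paper.

```python
from difflib import get_close_matches

# Hypothetical ontology (an assumption for illustration):
# slot name -> canonical values the tracker is allowed to output.
KNOWN_VALUES = {
    "hotel-name": [
        "acorn guest house",
        "alexander bed and breakfast",
        "city centre north b and b",
    ],
    "train-departure": ["cambridge", "london kings cross", "stansted airport"],
}


def recover_slot_value(slot, predicted, cutoff=0.7):
    """Map a possibly misrecognized value to its closest known value.

    Falls back to the normalized prediction when the slot has no known
    values or when nothing is similar enough (similarity below `cutoff`).
    """
    normalized = predicted.strip().lower()
    candidates = KNOWN_VALUES.get(slot, [])
    matches = get_close_matches(normalized, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else normalized


def post_process(state, cutoff=0.7):
    """Apply value recovery to every slot in a predicted dialogue state."""
    return {
        slot: recover_slot_value(slot, value, cutoff)
        for slot, value in state.items()
    }


if __name__ == "__main__":
    predicted_state = {
        "hotel-name": "acorn guess house",  # ASR-style misrecognition
        "train-departure": "cambrige",      # misspelled value
    }
    print(post_process(predicted_state))
```

A matcher of this kind only helps for slots with a closed set of values. Open-valued slots such as times or booking references would need a different strategy, and the threshold trades missed corrections against false ones.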

Authors (5)
  1. Jaeseok Yoon
  2. Seunghyun Hwang
  3. Ran Han
  4. Jeonguk Bang
  5. Kee-Eung Kim