DuetSim: Building User Simulator with Dual Large Language Models for Task-Oriented Dialogues (2405.13028v1)

Published 16 May 2024 in cs.CL and cs.AI

Abstract: User Simulators play a pivotal role in training and evaluating task-oriented dialogue systems. Traditional user simulators typically rely on human-engineered agendas, resulting in generated responses that often lack diversity and spontaneity. Although LLMs exhibit a remarkable capacity for generating coherent and contextually appropriate utterances, they may fall short when tasked with generating responses that effectively guide users towards their goals, particularly in dialogues with intricate constraints and requirements. This paper introduces DuetSim, a novel framework designed to address the intricate demands of task-oriented dialogues by leveraging LLMs. DuetSim stands apart from conventional approaches by employing two LLMs in tandem: one dedicated to response generation and the other focused on verification. This dual LLM approach empowers DuetSim to produce responses that not only exhibit diversity but also demonstrate accuracy and are preferred by human users. We validate the efficacy of our method through extensive experiments conducted on the MultiWOZ dataset, highlighting improvements in response quality and correctness, largely attributed to the incorporation of the second LLM. Our code is accessible at: https://github.com/suntea233/DuetSim.
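
The core mechanism the abstract describes, one LLM drafting the simulated user's next utterance and a second LLM verifying it against the user goal and dialogue constraints before it is emitted, can be summarized in a short sketch. The function names, prompt wording, verdict format, and retry budget below are illustrative assumptions rather than DuetSim's actual implementation; the repository linked in the abstract holds the authoritative code.

```python
# Minimal sketch of a dual-LLM user simulator loop (illustrative only).
# `call_llm` stands in for any chat-completion backend; the prompts and
# the YES/NO verdict protocol are assumptions, not DuetSim's actual code.

def call_llm(prompt: str) -> str:
    """Placeholder for a chat-completion call to an LLM backend."""
    raise NotImplementedError("wire up your LLM client here")

def generate_user_turn(goal: str, history: list[str]) -> str:
    """First LLM: propose the simulated user's next utterance."""
    prompt = (
        f"You are simulating a user with this goal: {goal}\n"
        "Dialogue so far:\n" + "\n".join(history) +
        "\nWrite the user's next utterance."
    )
    return call_llm(prompt)

def verify_user_turn(goal: str, history: list[str], utterance: str) -> bool:
    """Second LLM: check the candidate against the goal and constraints."""
    prompt = (
        f"Goal: {goal}\nDialogue so far:\n" + "\n".join(history) +
        f"\nCandidate user utterance: {utterance}\n"
        "Is this utterance consistent with the goal and the dialogue "
        "constraints? Answer YES or NO."
    )
    return call_llm(prompt).strip().upper().startswith("YES")

def simulate_user_turn(goal: str, history: list[str], max_retries: int = 3) -> str:
    """Generate with one LLM, gate with the other; regenerate on rejection."""
    utterance = ""
    for _ in range(max_retries):
        utterance = generate_user_turn(goal, history)
        if verify_user_turn(goal, history, utterance):
            return utterance
    return utterance  # fall back to the last candidate if all were rejected
```

The design point in this sketch is that the verifier acts as a gate: candidates that drift from the goal or violate a constraint are rejected and regenerated, trading extra LLM calls for goal consistency. This matches the abstract's claim that the improvements in response quality and correctness are largely attributable to the second LLM.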
