Many Hands Make Light Work: Task-Oriented Dialogue System with Module-Based Mixture-of-Experts (2405.09744v1)

Published 16 May 2024 in cs.CL and cs.AI

Abstract: Task-oriented dialogue systems are broadly used in virtual assistants and other automated services, providing interfaces between users and machines to facilitate specific tasks. Nowadays, task-oriented dialogue systems have greatly benefited from pre-trained language models (PLMs). However, their task-solving performance is constrained by the inherent capacities of PLMs, and scaling these models is expensive and complex as the model size becomes larger. To address these challenges, we propose the Soft Mixture-of-Expert Task-Oriented Dialogue system (SMETOD), which leverages an ensemble of Mixture-of-Experts (MoEs) to excel at subproblems and generate specialized outputs for task-oriented dialogues. SMETOD also scales up a task-oriented dialogue system with simplicity and flexibility while maintaining inference efficiency. We extensively evaluate our model on three benchmark functionalities: intent prediction, dialogue state tracking, and dialogue response generation. Experimental results demonstrate that SMETOD achieves state-of-the-art performance on most evaluated metrics. Moreover, comparisons against existing strong baselines show that SMETOD offers clear advantages in inference cost and problem-solving correctness.
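The abstract builds on a soft mixture-of-experts formulation, where every token is softly routed to expert "slots" rather than hard-assigned to a single expert. Below is a minimal PyTorch sketch of one such Soft MoE layer of the kind a system like this could insert into a PLM backbone; the class name SoftMoELayer, the feed-forward expert definition, and the slot sizes are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn


class SoftMoELayer(nn.Module):
    """Minimal Soft Mixture-of-Experts layer (illustrative sketch).

    Tokens are softly dispatched into per-expert slots, each expert
    processes its slots, and slot outputs are softly combined back
    into per-token outputs.
    """

    def __init__(self, d_model: int, num_experts: int = 4, slots_per_expert: int = 1):
        super().__init__()
        self.num_experts = num_experts
        self.slots_per_expert = slots_per_expert
        # One small feed-forward network per expert (hypothetical expert design).
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(num_experts)
        ])
        # Learnable slot embeddings used to score token-slot affinity.
        self.slot_embed = nn.Parameter(
            torch.randn(num_experts * slots_per_expert, d_model) * d_model ** -0.5
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        b, _, d = x.shape
        logits = torch.einsum("bnd,sd->bns", x, self.slot_embed)  # token-slot scores
        dispatch = logits.softmax(dim=1)  # over tokens: how each slot reads the sequence
        combine = logits.softmax(dim=2)   # over slots: how each token reads slot outputs
        slots = torch.einsum("bns,bnd->bsd", dispatch, x)         # (batch, num_slots, d_model)
        # Run each expert on its own group of slots.
        slots = slots.reshape(b, self.num_experts, self.slots_per_expert, d)
        outs = torch.stack(
            [expert(slots[:, i]) for i, expert in enumerate(self.experts)], dim=1
        ).reshape(b, -1, d)                                       # (batch, num_slots, d_model)
        return torch.einsum("bns,bsd->bnd", combine, outs)        # per-token output
```

Because the dispatch and combine weights are dense softmaxes rather than top-k routing, every expert receives gradient and inference cost stays close to a single dense feed-forward block, which is consistent with the efficiency claims in the abstract.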

Authors (2)
  1. Ruolin Su
  2. Biing-Hwang Juang
