
Searching for Snippets of Open-Domain Dialogue in Task-Oriented Dialogue Datasets (2311.14076v1)

Published 23 Nov 2023 in cs.CL

Abstract: Most existing dialogue corpora and models have been designed to fit into two predominant categories: task-oriented dialogues pursue functional goals, such as making a restaurant reservation or booking a plane ticket, while chit-chat (open-domain) dialogues focus on holding a socially engaging conversation with a user. However, humans tend to switch seamlessly between these modes and even use chit-chat to enhance task-oriented conversations. To bridge this gap, new datasets have recently been created that blend both communication modes within conversation examples. These approaches tend to rely on adding chit-chat snippets to pre-existing, human-generated task-oriented datasets. Given the tendencies observed in humans, however, we ask whether these task-oriented datasets already contain chit-chat sequences. Using topic modeling and searching for the topics most similar to a set of keywords related to social talk, we explore the training sets of Schema-Guided Dialogues and MultiWOZ. Our study shows that sequences related to social talk are indeed naturally present, motivating further research on how chit-chat is combined with task-oriented dialogue.
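The keyword-based topic search mentioned in the abstract can be illustrated with a small sketch. This is a hypothetical toy, not the authors' pipeline: the topic word lists, the social-talk keywords, and the helper names `cosine` and `rank_topics` are all invented for illustration, and plain bag-of-words cosine similarity stands in for whatever topic and keyword representations the study actually uses.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_topics(topics: dict[int, list[str]], keywords: list[str], top_n: int = 3):
    """Return the top_n (topic_id, score) pairs most similar to the keyword set."""
    query = Counter(keywords)
    scores = [(tid, cosine(Counter(words), query)) for tid, words in topics.items()]
    # Stable sort: topics with equal scores keep their original order.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


# Hypothetical topics, each represented by its top words (as a topic model might output).
topics = {
    0: ["restaurant", "table", "book", "reservation", "people"],
    1: ["train", "ticket", "depart", "arrive", "leave"],
    2: ["thanks", "great", "day", "nice", "weekend"],
    3: ["hotel", "parking", "wifi", "star", "stay"],
}
# Hypothetical keywords associated with social talk.
keywords = ["weekend", "nice", "family", "fun", "day"]

if __name__ == "__main__":
    print(rank_topics(topics, keywords))
```

In this toy setup only topic 2 shares words with the social-talk keywords, so it ranks first; with real data, the highest-ranked topics would be the candidates to inspect manually for chit-chat sequences.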

Authors
  1. Armand Stricker
  2. Patrick Paroubek