Dialog2Flow: Pre-training Soft-Contrastive Action-Driven Sentence Embeddings for Automatic Dialog Flow Extraction (2410.18481v2)

Published 24 Oct 2024 in cs.CL, cs.AI, and cs.LG

Abstract: Efficiently deriving structured workflows from unannotated dialogs remains an underexplored and formidable challenge in computational linguistics. Automating this process could significantly accelerate the manual design of workflows in new domains and enable the grounding of LLMs in domain-specific flowcharts, enhancing transparency and controllability. In this paper, we introduce Dialog2Flow (D2F) embeddings, which differ from conventional sentence embeddings by mapping utterances to a latent space where they are grouped according to their communicative and informative functions (i.e., the actions they represent). D2F allows for modeling dialogs as continuous trajectories in a latent space with distinct action-related regions. By clustering D2F embeddings, the latent space is quantized, and dialogs can be converted into sequences of region/action IDs, facilitating the extraction of the underlying workflow. To pre-train D2F, we build a comprehensive dataset by unifying twenty task-oriented dialog datasets with normalized per-turn action annotations. We also introduce a novel soft contrastive loss that leverages the semantic information of these actions to guide the representation learning process, showing superior performance compared to standard supervised contrastive loss. Evaluation against various sentence embeddings, including dialog-specific ones, demonstrates that D2F yields superior qualitative and quantitative results across diverse domains.
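The quantization step the abstract describes — cluster the utterance embeddings, replace each utterance with its cluster/action ID, and read the workflow off the observed transitions — can be sketched in plain Python. This is a minimal illustration, not the paper's implementation: it assumes Euclidean k-means as the clustering method, and the 2-D points are toy stand-ins for what the pre-trained D2F encoder would produce.

```python
from collections import defaultdict

def kmeans(points, k, iters=20):
    """Tiny deterministic k-means: the first k points seed the centroids."""
    centroids = [list(points[i]) for i in range(k)]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        for i, p in enumerate(points):
            assign[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [points[i] for i in range(len(points)) if assign[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

# Toy "D2F-like" utterance embeddings, grouped by dialog ID.
# In practice these vectors would come from the pre-trained D2F encoder.
dialogs = {
    "d1": [(0.0, 0.1), (1.0, 1.1), (2.0, 2.1)],
    "d2": [(0.1, 0.0), (1.1, 1.0), (2.1, 2.0)],
}
points = [p for utts in dialogs.values() for p in utts]
assign = kmeans(points, k=3)

# Quantize: each dialog becomes a sequence of cluster/action IDs.
sequences, i = {}, 0
for did, utts in dialogs.items():
    sequences[did] = assign[i:i + len(utts)]
    i += len(utts)

# The extracted workflow is the set of observed transitions between
# action IDs, weighted by how often each transition occurs.
edges = defaultdict(int)
for seq in sequences.values():
    for a, b in zip(seq, seq[1:]):
        edges[(a, b)] += 1
```

Here both toy dialogs follow the same underlying flow, so they quantize to the same action-ID sequence and each transition edge is observed twice — which is exactly the signal a workflow-extraction step would aggregate over many real dialogs.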
