Multimodal Contextual Dialogue Breakdown Detection for Conversational AI Models (2404.08156v1)
Abstract: Detecting dialogue breakdown in real time is critical for conversational AI systems because it enables corrective action so that a task can be completed successfully. In spoken dialogue systems, breakdown can be caused by a variety of unexpected situations, including high levels of background noise that cause speech-to-text (STT) mistranscriptions, or unexpected user flows. Industry settings such as healthcare in particular require high precision and high flexibility to navigate differently based on the conversation history and dialogue states, which makes accurate breakdown detection both more challenging and more critical. We found that accurate detection requires processing audio inputs together with downstream NLP model inferences on the transcribed text in real time. In this paper, we introduce a Multimodal Contextual Dialogue Breakdown (MultConDB) model. This model significantly outperforms other known best models, achieving an F1 of 69.27.