Dialogue Quality and Emotion Annotations for Customer Support Conversations (2311.13910v1)
Abstract: Task-oriented conversational datasets often lack topic variability and linguistic diversity. With the advent of LLMs pretrained on extensive, multilingual, and diverse text data, these limitations appear to have been overcome. Nevertheless, the generalisability of these models to different languages and domains in dialogue applications remains uncertain without benchmarking datasets. This paper presents a holistic annotation approach for emotion and conversational quality in the context of bilingual customer support conversations. By annotating with the complete conversation in view, one can form a broader perspective of the dialogue as a whole. Furthermore, the resulting annotations provide a unique and valuable resource for the development of text classification models. To this end, we present benchmarks for Emotion Recognition and Dialogue Quality Estimation and show that further research is needed to leverage these models in a production setting.
- John Mendonça
- Patrícia Pereira
- Miguel Menezes
- Vera Cabarrão
- Ana C. Farinha
- Helena Moniz
- João Paulo Carvalho
- Alon Lavie
- Isabel Trancoso