RSVP: Customer Intent Detection via Agent Response Contrastive and Generative Pre-Training (2310.09773v1)
Abstract: Dialogue systems for customer service rely on neural models to detect customer intents from their utterances, providing precise answers and round-the-clock support in task-oriented conversations. Existing intent detection approaches depend heavily on adaptively pre-training language models with large-scale datasets, yet the high cost of data collection may undercut their advantage. They also neglect the information in agents' conversational responses, which are cheaper to collect and highly indicative of customer intent, since agents must tailor their replies to what each customer wants. In this paper, we propose RSVP, a self-supervised framework dedicated to task-oriented dialogues that leverages agent responses for pre-training in a two-stage manner. Specifically, we introduce two pre-training tasks that capture the relations within utterance-response pairs: 1) Response Retrieval, which selects the correct response from a batch of candidates, and 2) Response Generation, which mimics agents by generating the response to a given utterance. Benchmark results on two real-world customer service datasets show that RSVP significantly outperforms state-of-the-art baselines by 4.95% in accuracy, 3.4% in MRR@3, and 2.75% in MRR@5 on average. Extensive case studies demonstrate the validity of incorporating agent responses into the pre-training stage.
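The two pre-training tasks pair naturally with standard building blocks: in-batch contrastive retrieval over utterance-response pairs, and teacher-forced response generation. Below is a minimal PyTorch sketch of how such losses could look; the encoder/decoder choices, temperature, and function names are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn.functional as F

def response_retrieval_loss(utt_emb, resp_emb, temperature=0.05):
    """Stage-1 sketch: in-batch contrastive (InfoNCE-style) loss where each
    customer utterance must retrieve its own agent response from the batch
    of candidates. `temperature` is an assumed hyperparameter.

    utt_emb, resp_emb: (batch, dim) sentence embeddings from any encoder.
    """
    utt = F.normalize(utt_emb, dim=-1)
    resp = F.normalize(resp_emb, dim=-1)
    logits = utt @ resp.T / temperature                 # (batch, batch) similarities
    labels = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, labels)              # diagonal = matched pairs

def response_generation_loss(seq2seq, utt_ids, utt_mask, resp_ids):
    """Stage-2 sketch: mimic the agent by generating the response conditioned
    on the utterance. `seq2seq` is assumed to be a Hugging Face-style
    encoder-decoder (e.g., BART/T5) that returns a teacher-forced
    cross-entropy loss when `labels` are provided."""
    out = seq2seq(input_ids=utt_ids, attention_mask=utt_mask, labels=resp_ids)
    return out.loss
```

The contrastive term rewards embeddings in which an utterance sits closest to its true agent response, while the generative term pushes the model to internalize how agents phrase replies to each intent, the two signals the abstract attributes to RSVP's two-stage pre-training.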
Authors: Yu-Chien Tang, Wei-Yao Wang, An-Zi Yen, Wen-Chih Peng