Beyond the Known: Investigating LLMs Performance on Out-of-Domain Intent Detection (2402.17256v2)

Published 27 Feb 2024 in cs.CL

Abstract: Out-of-domain (OOD) intent detection aims to examine whether a user's query falls outside the predefined domain of the system, which is crucial for the proper functioning of task-oriented dialogue (TOD) systems. Previous methods address it by fine-tuning discriminative models. Recently, some studies have explored applying LLMs, represented by ChatGPT, to various downstream tasks, but their ability on the OOD detection task remains unclear. This paper conducts a comprehensive evaluation of LLMs under various experimental settings and then outlines their strengths and weaknesses. We find that LLMs exhibit strong zero-shot and few-shot capabilities but are still at a disadvantage compared to models fine-tuned with full resources. Through a series of additional analysis experiments, we discuss and summarize the challenges faced by LLMs and provide guidance for future work, including injecting domain knowledge, strengthening knowledge transfer from in-domain (IND) to OOD, and understanding long instructions.
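To make the task concrete, below is a minimal sketch of the zero-shot prompting setup the paper evaluates: the LLM is shown the set of in-domain intent labels and asked to label a query, and any answer outside that label set is treated as OOD. The intent labels, prompt wording, and model name are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of zero-shot OOD intent detection by prompting a chat LLM.
# The intent labels, prompt wording, and model name are illustrative
# assumptions, not the exact setup used in the paper.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical in-domain (IND) intent labels for a TOD system.
IND_INTENTS = ["book_flight", "check_weather", "play_music"]

def detect_intent(query: str) -> str:
    """Ask the model to pick an in-domain intent or answer 'oos' (out of scope)."""
    prompt = (
        "You are an intent classifier for a task-oriented dialogue system.\n"
        f"Known intents: {', '.join(IND_INTENTS)}.\n"
        "If the query does not match any known intent, answer exactly 'oos'.\n"
        f"Query: {query}\n"
        "Answer with one intent label only."
    )
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic labeling
    )
    label = resp.choices[0].message.content.strip().lower()
    # Anything outside the predefined label set is treated as OOD.
    return label if label in IND_INTENTS else "oos"

print(detect_intent("What's the capital of France?"))  # expected: oos
```

A few-shot variant would simply prepend labeled IND examples to the prompt; per the paper's findings, this improves LLM performance but still trails discriminative models fine-tuned with full resources.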

Authors (9)
  1. Pei Wang (240 papers)
  2. Keqing He (47 papers)
  3. Yejie Wang (15 papers)
  4. Xiaoshuai Song (16 papers)
  5. Yutao Mou (16 papers)
  6. Jingang Wang (71 papers)
  7. Yunsen Xian (17 papers)
  8. Xunliang Cai (63 papers)
  9. Weiran Xu (58 papers)
Citations (3)