A Survey on Out-of-Distribution Detection in NLP (2305.03236v2)

Published 5 May 2023 in cs.CL, cs.AI, and cs.LG

Abstract: Out-of-distribution (OOD) detection is essential for the reliable and safe deployment of machine learning systems in the real world. Great progress has been made over the past years. This paper presents the first review of recent advances in OOD detection with a particular focus on natural language processing approaches. First, we provide a formal definition of OOD detection and discuss several related fields. We then categorize recent algorithms into three classes according to the data they used: (1) OOD data available, (2) OOD data unavailable + in-distribution (ID) label available, and (3) OOD data unavailable + ID label unavailable. Third, we introduce datasets, applications, and metrics. Finally, we summarize existing work and present potential future research topics.

Authors (6)
  1. Hao Lang
  2. Yinhe Zheng
  3. Yixuan Li
  4. Jian Sun
  5. Fei Huang
  6. Yongbin Li
Citations (18)

Summary

A Comprehensive Survey on Out-of-Distribution Detection in Natural Language Processing

Introduction to OOD Detection in NLP

Out-of-distribution (OOD) detection has emerged as a pivotal requirement for robust and reliable machine learning models, especially in NLP applications. OOD detection identifies inputs that diverge significantly from a model's training distribution and can therefore cause unpredictable behavior when AI systems are deployed in the real world; it is closely related to, but distinct from, fields such as anomaly detection. This paper presents a systematic review of advances in OOD detection tailored to NLP, proposing a classification of detection methods based on data availability and discussing datasets, applications, evaluation metrics, and future directions in the context of AI safety.
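The survey opens with a formal definition of the task; a common way to state it is as thresholded scoring (the notation below is a standard convention, not a verbatim reproduction of the paper's):

```latex
% OOD detection as thresholded scoring (illustrative notation).
% S(x) is an ID-ness score (e.g., maximum softmax probability) and
% \tau is a threshold, typically tuned on ID validation data.
G_{\tau}(x) =
\begin{cases}
  \text{ID},  & S(x) \ge \tau \\
  \text{OOD}, & S(x) < \tau
\end{cases}
```

Each family of methods surveyed below can be read as a particular choice of the score S, of the representation it is computed on, or of the data used to train both.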

Methodological Classifications

OOD Data Available

Methods that assume access to both in-distribution (ID) and OOD data during model training fall into two subcategories:

  • Detection with Extensive OOD Data: Techniques in this category leverage labeled OOD data alongside ID data to refine model learning, catering to scenarios where extensive OOD samples are available for training; a representative training objective is sketched after this list.
  • Detection with Few OOD Data: Acknowledging the impracticality of acquiring large-scale labeled OOD datasets, these methods generate pseudo-OOD samples from a limited set of real OOD instances.
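As a concrete instance of training with OOD data, below is a minimal PyTorch-style sketch of an outlier-exposure-style objective, one representative technique from this family (the function name, the lam weight, and the overall setup are illustrative assumptions, not the survey's reference implementation):

```python
import torch.nn.functional as F

def outlier_exposure_loss(logits_id, labels_id, logits_ood, lam=0.5):
    """Illustrative outlier-exposure-style objective (sketch).

    ID inputs receive standard cross-entropy; OOD inputs are pushed
    toward a uniform predictive distribution, discouraging confident
    predictions on out-of-distribution text.
    """
    # Standard classification loss on in-distribution examples.
    loss_id = F.cross_entropy(logits_id, labels_id)

    # Cross-entropy to the uniform distribution on OOD examples:
    # averaging -log-softmax over classes matches it up to a constant.
    loss_ood = -F.log_softmax(logits_ood, dim=-1).mean()

    return loss_id + lam * loss_ood
```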

OOD Data Unavailable + ID Label Available

In the absence of OOD data, several strategies have been developed to exploit labeled ID data exclusively:

  • Learn Representations Then Detect: These approaches extract discriminative features that separate ID from OOD samples, then apply a scoring function over those features for detection; a distance-based sketch follows this list.
  • Generate Pseudo OOD Samples: This strategy simulates OOD samples via data augmentation and generation techniques, effectively circumventing the absence of real OOD instances.
  • Other Approaches: Innovative techniques that do not fit neatly into the two categories above.
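To make "learn representations, then detect" concrete, here is a minimal NumPy sketch of Mahalanobis-distance scoring over sentence embeddings, a detector commonly paired with Transformer encoders in this setting (the function names, tied-covariance choice, and array shapes are illustrative assumptions):

```python
import numpy as np

def fit_mahalanobis(features, labels):
    """Fit per-class means and a shared (tied) covariance on ID features.

    features: (N, D) sentence embeddings (e.g., [CLS] vectors);
    labels:   (N,) ID class indices.
    """
    classes = np.unique(labels)
    means = {c: features[labels == c].mean(axis=0) for c in classes}
    # Center each class on its own mean, then pool for a tied covariance.
    centered = np.concatenate([features[labels == c] - means[c] for c in classes])
    precision = np.linalg.pinv(np.cov(centered, rowvar=False))
    return means, precision

def mahalanobis_score(x, means, precision):
    """Negative minimum Mahalanobis distance: higher means more ID-like."""
    dists = [(x - mu) @ precision @ (x - mu) for mu in means.values()]
    return -min(dists)
```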

OOD Data Unavailable + ID Label Unavailable

This setting mirrors unsupervised anomaly detection: neither OOD data nor ID labels are available. Techniques here primarily aim to learn robust representations, often from pre-trained encoders, that inherently separate ID from OOD data; a non-parametric sketch follows.
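One simple realization of this idea is a k-nearest-neighbor score over embeddings from a pre-trained encoder, which needs neither OOD data nor ID labels (the normalization choice and parameter names below are assumptions for the example):

```python
import numpy as np

def knn_ood_score(x, id_features, k=10):
    """Distance to the k-th nearest (unlabeled) ID embedding.

    Larger values suggest the input lies farther from the ID data,
    i.e., is more likely OOD. No ID labels are required.
    """
    # L2-normalize so Euclidean distance tracks angular similarity.
    x = x / np.linalg.norm(x)
    feats = id_features / np.linalg.norm(id_features, axis=1, keepdims=True)
    dists = np.linalg.norm(feats - x, axis=1)
    return np.sort(dists)[k - 1]
```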

Datasets and Applications

The paper categorizes OOD detection datasets according to how their OOD instances are constructed and surveys prevalent applications across NLP tasks, underscoring the utility of OOD detection in making language-based models safer and more reliable.

Evaluation Metrics

The paper reviews the standard metrics used to assess OOD detectors, such as AUROC, AUPR, and FPR@N (the false positive rate on OOD inputs at the threshold where N% of ID inputs are correctly accepted), emphasizing their role in comprehensively evaluating detector efficacy. A sketch of how these metrics are typically computed follows.
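As a practical illustration, the sketch below computes these three metrics from arrays of detector scores with scikit-learn (the sign convention, helper name, and 95% recall level are assumptions for the example):

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

def ood_metrics(scores_id, scores_ood, recall=0.95):
    """AUROC, AUPR, and FPR@N from ID/OOD score arrays.

    Convention: higher score = more ID-like; ID is the positive class.
    """
    y = np.concatenate([np.ones_like(scores_id), np.zeros_like(scores_ood)])
    s = np.concatenate([scores_id, scores_ood])
    auroc = roc_auc_score(y, s)
    aupr = average_precision_score(y, s)
    # FPR@N: OOD acceptance rate at the threshold that accepts N% of ID.
    threshold = np.quantile(scores_id, 1.0 - recall)
    fpr_at_n = float((scores_ood >= threshold).mean())
    return {"AUROC": auroc, "AUPR": aupr, f"FPR@{int(recall * 100)}": fpr_at_n}
```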

Future Directions

The paper highlights promising research directions, including integrating OOD detection with domain generalization, leveraging extra information sources, and combining OOD detection with lifelong learning frameworks. It also points out the need for stronger theoretical foundations for OOD detection.

Concluding Remarks

By presenting a structured analysis of OOD detection methodologies tailored to NLP, this paper illuminates the complexities of making AI systems robust to OOD inputs. In delineating current strategies, datasets, applications, and future directions, it provides a foundational framework for ongoing and future research aimed at fortifying AI against the challenges posed by OOD instances.