Your Finetuned Large Language Model is Already a Powerful Out-of-distribution Detector (2404.08679v1)
Abstract: We revisit the likelihood ratio between a pretrained LLM and its finetuned variant as a criterion for out-of-distribution (OOD) detection. The intuition behind this criterion is that the pretrained LLM has prior knowledge of OOD data owing to its vast training corpus, while the finetuned LLM, having been adapted to the in-distribution data, has sufficient knowledge to tell the two apart. Leveraging the power of LLMs, we show for the first time that the likelihood ratio can serve as an effective OOD detector. Moreover, we apply the proposed LLM-based likelihood ratio to detect OOD questions in question-answering (QA) systems, which can be used to improve the performance of specialized LLMs on general questions. Since the likelihood can be obtained directly from the loss functions of contemporary neural network frameworks, the approach is straightforward to implement in practice. And because both pretrained LLMs and their various finetuned variants are readily available, our criterion can be incorporated for OOD detection without any further training. We conduct comprehensive evaluations across multiple settings, including far OOD, near OOD, spam detection, and QA scenarios, to demonstrate the effectiveness of the method.
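The score is simple to compute with standard tooling. Below is a minimal sketch, assuming a HuggingFace-style causal LM API; the model names, the per-token averaging, and the sign convention (higher score suggests OOD) are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch of the likelihood-ratio OOD score: compare the log-likelihood a
# pretrained base model assigns to an input against that of its finetuned
# variant. Model paths below are placeholders, not the paper's setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

PRETRAINED = "meta-llama/Llama-2-7b-hf"      # base model (assumed)
FINETUNED = "path/to/your-finetuned-model"   # its finetuned variant (assumed)

tokenizer = AutoTokenizer.from_pretrained(PRETRAINED)
base = AutoModelForCausalLM.from_pretrained(PRETRAINED).eval()
tuned = AutoModelForCausalLM.from_pretrained(FINETUNED).eval()

@torch.no_grad()
def avg_log_likelihood(model, text: str) -> float:
    """Average per-token log-likelihood, recovered from the LM loss
    (HuggingFace's causal-LM loss is the mean negative log-likelihood)."""
    inputs = tokenizer(text, return_tensors="pt")
    out = model(**inputs, labels=inputs["input_ids"])
    return -out.loss.item()

def ood_score(text: str) -> float:
    """Log-space likelihood ratio: log p_pretrained(x) - log p_finetuned(x).
    Larger values suggest the input is OOD w.r.t. the finetuning data,
    since the finetuned model gains likelihood mainly on in-distribution text."""
    return avg_log_likelihood(base, text) - avg_log_likelihood(tuned, text)

# Usage: threshold the score to flag inputs (e.g. questions a specialized
# QA model was not finetuned to handle) for fallback to a general model.
print(ood_score("What is the integral of x^2 from 0 to 1?"))
```

In practice one would calibrate the decision threshold on held-out in-distribution data, exactly as with any other scalar OOD score.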
Authors: Andi Zhang, Tim Z. Xiao, Weiyang Liu, Robert Bamler, Damon Wischik