
The Open-World Lottery Ticket Hypothesis for OOD Intent Classification (2210.07071v3)

Published 13 Oct 2022 in cs.CL

Abstract: Most existing methods for Out-of-Domain (OOD) intent classification rely on extensive auxiliary OOD corpora or specific training paradigms. However, they neglect the underlying principle that a model should assign differentiated confidence to in-domain and out-of-domain intents. In this work, we shed light on the fundamental cause of model overconfidence on OOD inputs and demonstrate that calibrated subnetworks can be uncovered by pruning the overparameterized model. The calibrated confidence provided by such a subnetwork better distinguishes in-domain from out-of-domain inputs, benefiting almost all post hoc detection methods. Beyond these fundamental insights, we also extend the Lottery Ticket Hypothesis to open-world scenarios. Extensive experiments on four real-world datasets demonstrate that our approach achieves consistent improvements over a suite of competitive baselines.
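To make the pipeline the abstract describes concrete, below is a minimal sketch, not the paper's method or code: magnitude pruning exposes a sparse subnetwork of an overparameterized classifier, and a post hoc score (here maximum softmax probability, one common post hoc detector; Hendrycks & Gimpel, 2017) then separates in-domain from out-of-domain inputs. The model shape, intent count, 50% pruning ratio, and decision threshold are all illustrative assumptions.

```python
# Hypothetical sketch of "prune, then score post hoc" for OOD intent detection.
# None of the specifics below come from the paper; they are placeholders.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

class IntentClassifier(nn.Module):
    """Stand-in for a fine-tuned encoder plus intent head."""
    def __init__(self, dim=768, n_intents=77):  # assumed sizes
        super().__init__()
        self.body = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())
        self.head = nn.Linear(dim, n_intents)

    def forward(self, x):
        return self.head(self.body(x))

model = IntentClassifier()

# Global L1 (magnitude) pruning: zero the smallest 50% of weights across all
# linear layers, leaving a sparse subnetwork. The 0.5 ratio is an assumption.
to_prune = [(m, "weight") for m in model.modules() if isinstance(m, nn.Linear)]
prune.global_unstructured(to_prune, pruning_method=prune.L1Unstructured, amount=0.5)

@torch.no_grad()
def msp_score(x):
    """Post hoc OOD score: maximum softmax probability of the subnetwork.
    Lower confidence suggests an out-of-domain input."""
    return model(x).softmax(dim=-1).max(dim=-1).values

# Usage: flag low-confidence inputs as OOD. In practice the threshold would be
# tuned on validation data; 0.5 here is arbitrary.
x = torch.randn(4, 768)      # placeholder features
is_ood = msp_score(x) < 0.5
```

Because the score is computed purely from the trained model's outputs, any post hoc detector (energy scores, Mahalanobis distance, etc.) could be substituted for MSP, which is why better-calibrated subnetwork confidence can benefit the whole family of such methods.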

Citations (1)
