Probing Out-of-Distribution Robustness of Language Models with Parameter-Efficient Transfer Learning (2301.11660v4)

Published 27 Jan 2023 in cs.CL

Abstract: As pre-trained language models (PLMs) continue to grow in size, numerous parameter-efficient transfer learning (PETL) methods have been proposed to offset the tremendous cost of full fine-tuning. Despite the impressive results that large PLMs and various PETL methods achieve on many benchmarks, it remains unclear whether they can handle distributionally shifted inputs effectively. In this study, we systematically explore how the ability to detect out-of-distribution (OOD) inputs changes as the size of the PLM grows or the transfer method is altered. Specifically, we evaluate various PETL techniques, including fine-tuning, Adapter, LoRA, and prefix-tuning, on three intent classification tasks, each using language models of different scales.
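To make the kind of probe the abstract describes concrete, here is a minimal sketch, not the authors' code: attach a PETL module to a backbone classifier, then score held-out inputs with the maximum-softmax-probability OOD baseline. The use of LoRA via the Hugging Face `peft` library, the `roberta-base` backbone, and the 150-way CLINC150-style label space are all illustrative assumptions rather than details taken from the paper.

```python
# Minimal sketch, assuming the Hugging Face `transformers` and `peft`
# libraries; the model and label-space choices below are illustrative.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "roberta-base"  # stand-in backbone; the paper sweeps model scales
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=150,  # e.g. a CLINC150-style intent label space (assumption)
)

# Wrap the backbone with LoRA adapters: only the low-rank update matrices
# (and the classification head) are trained; the PLM weights stay frozen.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["query", "value"],  # RoBERTa attention projections
    task_type="SEQ_CLS",
)
model = get_peft_model(model, lora_config)
model.eval()

@torch.no_grad()
def msp_confidence(texts):
    """Maximum softmax probability per input; low values suggest OOD."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    probs = model(**batch).logits.softmax(dim=-1)
    return probs.max(dim=-1).values

# After PETL training on in-distribution intents, thresholding these
# confidences yields an OOD detector in the Hendrycks & Gimpel (2017) style.
print(msp_confidence(["book a table for two", "asdf qwerty zxcv"]))
```

In a setup like the paper's, such scores would be computed on both in-scope and out-of-scope utterances and summarized with a threshold-free metric such as AUROC, repeating the procedure across PETL methods and backbone scales.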

