Building Efficient Universal Classifiers with Natural Language Inference (2312.17543v2)
Abstract: Generative LLMs have become the mainstream choice for few-shot and zero-shot learning thanks to the universality of text generation. Many users, however, do not need the broad capabilities of generative LLMs when all they want is to automate a classification task. Smaller BERT-like models can also learn universal tasks, which allows them to perform any text classification task without task-specific fine-tuning (zero-shot classification) or to learn new tasks with only a few examples (few-shot learning), while being significantly more efficient than generative LLMs. This paper (1) explains how Natural Language Inference (NLI) can be used as a universal classification task following principles similar to the instruction fine-tuning of generative LLMs, (2) provides a step-by-step guide with reusable Jupyter notebooks for building a universal classifier, and (3) shares the resulting universal classifier, trained on 33 datasets with 389 diverse classes. Parts of the code we share have been used to train our older zero-shot classifiers, which have been downloaded more than 55 million times via the Hugging Face Hub as of December 2023. Our new classifier improves zero-shot performance by 9.4%. A minimal usage sketch of the NLI-based zero-shot approach is shown below.
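The sketch below illustrates how an NLI-based universal classifier is typically queried via the Hugging Face zero-shot-classification pipeline: each candidate label is inserted into a hypothesis template (e.g. "This text is about politics.") and the model scores whether the input text entails that hypothesis. The checkpoint name and hypothesis template here are illustrative assumptions, not a prescription from the paper.

```python
# Minimal sketch of NLI-based zero-shot classification with the
# Hugging Face `transformers` pipeline. The checkpoint below is an
# assumption for illustration; any NLI/zero-shot checkpoint can be used.
from transformers import pipeline

classifier = pipeline(
    "zero-shot-classification",
    model="MoritzLaurer/deberta-v3-large-zeroshot-v1.1-all-33",  # assumed checkpoint
)

text = "The central bank raised interest rates to curb inflation."
candidate_labels = ["economy", "sports", "politics", "culture"]

# Each label is slotted into the hypothesis template and scored as an
# entailment decision against the input text.
result = classifier(
    text,
    candidate_labels=candidate_labels,
    hypothesis_template="This text is about {}.",
    multi_label=False,  # labels compete; set True for independent binary decisions
)

print(result["labels"][0], result["scores"][0])
```

Because every class is expressed as a natural-language hypothesis, new label sets can be swapped in at inference time without retraining, which is what makes the classifier "universal" in the sense of the paper.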
- Moritz Laurer
- Wouter van Atteveldt
- Andreu Casas
- Kasper Welbers