
Building Efficient Universal Classifiers with Natural Language Inference (2312.17543v2)

Published 29 Dec 2023 in cs.CL and cs.AI

Abstract: Generative LLMs have become the mainstream choice for fewshot and zeroshot learning thanks to the universality of text generation. Many users, however, do not need the broad capabilities of generative LLMs when they only want to automate a classification task. Smaller BERT-like models can also learn universal tasks, which allow them to do any text classification task without requiring fine-tuning (zeroshot classification) or to learn new tasks with only a few examples (fewshot), while being significantly more efficient than generative LLMs. This paper (1) explains how Natural Language Inference (NLI) can be used as a universal classification task that follows similar principles as instruction fine-tuning of generative LLMs, (2) provides a step-by-step guide with reusable Jupyter notebooks for building a universal classifier, and (3) shares the resulting universal classifier that is trained on 33 datasets with 389 diverse classes. Parts of the code we share have been used to train our older zeroshot classifiers that have been downloaded more than 55 million times via the Hugging Face Hub as of December 2023. Our new classifier improves zeroshot performance by 9.4%.

Authors (4)
  1. Moritz Laurer (1 paper)
  2. Wouter van Atteveldt (2 papers)
  3. Andreu Casas (1 paper)
  4. Kasper Welbers (4 papers)
Citations (7)

Summary

Building Efficient Universal Classifiers with Natural Language Inference

Introduction to Universal Classifiers and NLI

The rise of generative LLMs has introduced new methodologies for task automation, with an emphasis on versatility. Given the substantial resources required to operate such models, there is growing interest in alternatives that balance universality against resource economy. This motivates Natural Language Inference (NLI) as a foundational task for universal classification: NLI models are far less resource-intensive than generative models, yet they promise competitive performance on text classification tasks. The paper explains how NLI can be used for universal classification, provides a practitioner's guide for building such classifiers, and shares an open-source universal classifier trained on a broad ensemble of datasets.

A Closer Look at NLI for Classification

NLI's premise is simple yet powerful—determining whether a 'hypothesis' is true (entailed) or false (not entailed) based on a given 'premise.' This binary judgment forms the crux of universal classification, allowing almost any classification task to be reframed as an entailment challenge. By verbalizing class labels into hypotheses, a single NLI model can handle a wide range of classification tasks without task-specific fine-tuning. The approach is computationally efficient overall, but it carries a deliberate trade-off: the model must make one entailment prediction per candidate class, which becomes a drawback for tasks with many classes.
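The reduction described above can be sketched in a few lines. This is a minimal illustration, not the paper's code: the premise, the verbalized hypotheses, and the per-hypothesis entailment logits are all hypothetical values standing in for what an actual NLI model would produce (one forward pass per hypothesis).

```python
import math

def classify_via_nli(premise, hypotheses, entailment_logits):
    """Pick the class whose verbalized hypothesis is most entailed.

    entailment_logits: one raw entailment score per hypothesis, as an
    NLI model would output for (premise, hypothesis) pairs. Here the
    scores are supplied by hand purely for illustration.
    """
    # Softmax over the per-hypothesis entailment scores to get a
    # probability distribution over candidate classes.
    exps = [math.exp(z) for z in entailment_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(hypotheses)), key=lambda i: probs[i])
    return hypotheses[best], probs

premise = "The new phone's battery lasts two full days."
hypotheses = [
    "This text is about technology.",
    "This text is about politics.",
    "This text is about sports.",
]
# Hypothetical logits; a real NLI model runs once per hypothesis.
label, probs = classify_via_nli(premise, hypotheses, [2.1, -1.3, -0.8])
print(label)  # "This text is about technology."
```

Note how the number of forward passes grows linearly with the number of candidate classes, which is exactly the trade-off mentioned above.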

Methodology for Building Efficient Classifiers

The creation of an efficient universal classifier using NLI spans several phases, from dataset selection and harmonization, incorporating both NLI and various non-NLI datasets, to model training and evaluation. A notable innovation shared in the paper is the highly efficient approach to hypothesis formulation, effectively converting non-NLI datasets into the NLI format. This transformation is pivotal, ensuring that classification tasks, regardless of their original format, can be approached from an NLI perspective. Subsequently, detailed processes for data cleaning and preprocessing underscore the importance of dataset quality and diversity in training robust models.
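The dataset-harmonization step can be sketched as follows. This is an illustrative reconstruction, not the paper's notebooks: the template wording, the `not_entailment` label name, and the negative-sampling scheme are assumptions chosen to show the general idea of converting a classification example into NLI training pairs.

```python
import random

def to_nli_format(text, true_label, templates, n_negatives=2, seed=0):
    """Turn one classification example into NLI training pairs.

    templates: dict mapping each class name to a verbalized hypothesis.
    Produces one 'entailment' pair for the true class plus sampled
    'not_entailment' pairs for other classes, mirroring the
    label-verbalization step described in the paper.
    """
    rng = random.Random(seed)
    pairs = [{"premise": text,
              "hypothesis": templates[true_label],
              "label": "entailment"}]
    # Sample a few wrong classes as non-entailed hypotheses.
    negatives = [c for c in templates if c != true_label]
    for c in rng.sample(negatives, min(n_negatives, len(negatives))):
        pairs.append({"premise": text,
                      "hypothesis": templates[c],
                      "label": "not_entailment"})
    return pairs

templates = {
    "positive": "This review is positive.",
    "negative": "This review is negative.",
    "neutral": "This review is neutral.",
}
pairs = to_nli_format("Great battery, terrible screen... still love it.",
                      "positive", templates)
```

Applied across many datasets, this transformation yields a single unified NLI training corpus regardless of each task's original label schema.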

Performance Insights and Implications

Empirical evaluations reveal a significant enhancement in zero-shot performance stemming from the inclusion of a wide range of datasets, marking a 9.4% improvement over models trained on NLI data alone. Furthermore, the methodology demonstrates not just an ability to excel in seen classification tasks but also a noteworthy generalizability to previously unseen tasks. This holistic improvement underscores the potential of NLI-driven universal classifiers not only as a resource-efficient alternative to generative models but also as a robust solution to a broad spectrum of classification tasks.

Practical Applications and Future Prospects

The utility of the described universal classifier is manifold, extending from direct application via Hugging Face’s ZeroShotClassificationPipeline to serving as a base model for further fine-tuning on specific tasks. Importantly, the guide provides a pathway for researchers and practitioners to tailor universal classifiers to their domain-specific needs by integrating additional datasets. Looking forward, the paper prompts a reconsideration of pre-training objectives for classification tasks, suggesting a possible shift towards more self-supervised, universal targets that could enhance both efficiency and generalization of future models.
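The pipeline mechanics mentioned above can be illustrated without loading a model. The helper below replicates how Hugging Face's zero-shot classification pipeline turns candidate labels into NLI hypotheses via its `hypothesis_template` parameter; the commented pipeline call shows the real usage, with the checkpoint name left as a placeholder rather than guessed.

```python
def build_hypotheses(candidate_labels,
                     hypothesis_template="This example is {}."):
    """Format candidate labels into NLI hypotheses, mirroring the
    `hypothesis_template` mechanism of the zero-shot pipeline."""
    return [hypothesis_template.format(label) for label in candidate_labels]

hyps = build_hypotheses(["politics", "economy", "sports"])
# ['This example is politics.', 'This example is economy.',
#  'This example is sports.']

# With transformers installed, the full pipeline looks like this
# (checkpoint name is a placeholder for an NLI-based zero-shot model):
# from transformers import pipeline
# clf = pipeline("zero-shot-classification", model="<nli-checkpoint>")
# clf("The ECB raised interest rates.",
#     candidate_labels=["politics", "economy", "sports"])
```

Changing the template (e.g. "This text expresses {} sentiment.") is often the cheapest way to adapt the classifier to a new domain before resorting to fine-tuning.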

Final Thoughts

In conclusion, this paper not only presents a pragmatic approach to leveraging NLI for building universal classifiers but also sets the stage for future advancements in efficient, model-based classification. By sharing comprehensive guides, code, and pre-trained models, it empowers the AI research community to explore, extend, and enhance the capabilities of NLI-based classifiers. As we progress, the aspiration for more refined, efficient, and universally applicable models remains a North Star, guiding ongoing pursuits within the field of AI research.
