
Building Efficient Universal Classifiers with Natural Language Inference (2312.17543v2)

Published 29 Dec 2023 in cs.CL and cs.AI

Abstract: Generative LLMs have become the mainstream choice for fewshot and zeroshot learning thanks to the universality of text generation. Many users, however, do not need the broad capabilities of generative LLMs when they only want to automate a classification task. Smaller BERT-like models can also learn universal tasks, which allow them to do any text classification task without requiring fine-tuning (zeroshot classification) or to learn new tasks with only a few examples (fewshot), while being significantly more efficient than generative LLMs. This paper (1) explains how Natural Language Inference (NLI) can be used as a universal classification task that follows similar principles as instruction fine-tuning of generative LLMs, (2) provides a step-by-step guide with reusable Jupyter notebooks for building a universal classifier, and (3) shares the resulting universal classifier that is trained on 33 datasets with 389 diverse classes. Parts of the code we share have been used to train our older zeroshot classifiers that have been downloaded more than 55 million times via the Hugging Face Hub as of December 2023. Our new classifier improves zeroshot performance by 9.4%.

Authors (4)
  1. Moritz Laurer (1 paper)
  2. Wouter van Atteveldt (2 papers)
  3. Andreu Casas (1 paper)
  4. Kasper Welbers (4 papers)
Citations (7)

Summary

Building Efficient Universal Classifiers with Natural Language Inference

Introduction to Universal Classifiers and NLI

The rise of generative LLMs has introduced new methodologies for task automation, with an emphasis on versatility. Given the substantial resources required to operate such models, there is growing interest in alternatives that balance universality against resource economy. This motivates Natural Language Inference (NLI) as a foundational task for universal classification: NLI models are far less resource-intensive than generative models, yet they promise competitive performance on text classification tasks. The paper explains how NLI can be used for universal classification, provides a practitioner's guide for building such classifiers, and shares an open-source universal classifier trained on a broad ensemble of datasets.

A Closer Look at NLI for Classification

NLI's premise is simple yet powerful—determining whether a 'hypothesis' is true (entailed) or false (not entailed) based on a given 'premise.' This binary judgment forms the crux of universal classification, allowing almost any classification task to be reframed as an entailment challenge. By verbalizing class labels into hypotheses, a single NLI model can handle a wide range of classification tasks without task-specific fine-tuning. The approach is computationally efficient overall, but it carries a deliberate trade-off: the model must make one entailment prediction per candidate class, which becomes a drawback for tasks with many classes.
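The reduction described above can be sketched in a few lines. This is a minimal illustration, not the paper's code: the premise, the verbalized hypotheses, and the per-hypothesis entailment logits are all hypothetical values standing in for what an actual NLI model would produce (one forward pass per hypothesis).

```python
import math

def classify_via_nli(premise, hypotheses, entailment_logits):
    """Pick the class whose verbalized hypothesis is most entailed.

    entailment_logits: one raw entailment score per hypothesis, as an
    NLI model would output for (premise, hypothesis) pairs. Here the
    scores are supplied by hand purely for illustration.
    """
    # Softmax over the per-hypothesis entailment scores to get a
    # probability distribution over candidate classes.
    exps = [math.exp(z) for z in entailment_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(hypotheses)), key=lambda i: probs[i])
    return hypotheses[best], probs

premise = "The new phone's battery lasts two full days."
hypotheses = [
    "This text is about technology.",
    "This text is about politics.",
    "This text is about sports.",
]
# Hypothetical logits; a real NLI model runs once per hypothesis.
label, probs = classify_via_nli(premise, hypotheses, [2.1, -1.3, -0.8])
print(label)  # "This text is about technology."
```

Note how the number of forward passes grows linearly with the number of candidate classes, which is exactly the trade-off mentioned above.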

Methodology for Building Efficient Classifiers

The creation of an efficient universal classifier using NLI spans several phases, from dataset selection and harmonization, incorporating both NLI and various non-NLI datasets, to model training and evaluation. A notable innovation shared in the paper is the highly efficient approach to hypothesis formulation, effectively converting non-NLI datasets into the NLI format. This transformation is pivotal, ensuring that classification tasks, regardless of their original format, can be approached from an NLI perspective. Subsequently, detailed processes for data cleaning and preprocessing underscore the importance of dataset quality and diversity in training robust models.
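The dataset-harmonization step can be sketched as follows. This is an illustrative reconstruction, not the paper's notebooks: the template wording, the `not_entailment` label name, and the negative-sampling scheme are assumptions chosen to show the general idea of converting a classification example into NLI training pairs.

```python
import random

def to_nli_format(text, true_label, templates, n_negatives=2, seed=0):
    """Turn one classification example into NLI training pairs.

    templates: dict mapping each class name to a verbalized hypothesis.
    Produces one 'entailment' pair for the true class plus sampled
    'not_entailment' pairs for other classes, mirroring the
    label-verbalization step described in the paper.
    """
    rng = random.Random(seed)
    pairs = [{"premise": text,
              "hypothesis": templates[true_label],
              "label": "entailment"}]
    # Sample a few wrong classes as non-entailed hypotheses.
    negatives = [c for c in templates if c != true_label]
    for c in rng.sample(negatives, min(n_negatives, len(negatives))):
        pairs.append({"premise": text,
                      "hypothesis": templates[c],
                      "label": "not_entailment"})
    return pairs

templates = {
    "positive": "This review is positive.",
    "negative": "This review is negative.",
    "neutral": "This review is neutral.",
}
pairs = to_nli_format("Great battery, terrible screen... still love it.",
                      "positive", templates)
```

Applied across many datasets, this transformation yields a single unified NLI training corpus regardless of each task's original label schema.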

Performance Insights and Implications

Empirical evaluations reveal a significant enhancement in zero-shot performance stemming from the inclusion of a wide range of datasets, marking a 9.4% improvement over models trained on NLI data alone. Furthermore, the methodology demonstrates not just an ability to excel in seen classification tasks but also a noteworthy generalizability to previously unseen tasks. This holistic improvement underscores the potential of NLI-driven universal classifiers not only as a resource-efficient alternative to generative models but also as a robust solution to a broad spectrum of classification tasks.

Practical Applications and Future Prospects

The utility of the described universal classifier is manifold, extending from direct application via Hugging Face’s ZeroShotClassificationPipeline to serving as a base model for further fine-tuning on specific tasks. Importantly, the guide provides a pathway for researchers and practitioners to tailor universal classifiers to their domain-specific needs by integrating additional datasets. Looking forward, the paper prompts a reconsideration of pre-training objectives for classification tasks, suggesting a possible shift towards more self-supervised, universal targets that could enhance both efficiency and generalization of future models.
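The pipeline mechanics mentioned above can be illustrated without loading a model. The helper below replicates how Hugging Face's zero-shot classification pipeline turns candidate labels into NLI hypotheses via its `hypothesis_template` parameter; the commented pipeline call shows the real usage, with the checkpoint name left as a placeholder rather than guessed.

```python
def build_hypotheses(candidate_labels,
                     hypothesis_template="This example is {}."):
    """Format candidate labels into NLI hypotheses, mirroring the
    `hypothesis_template` mechanism of the zero-shot pipeline."""
    return [hypothesis_template.format(label) for label in candidate_labels]

hyps = build_hypotheses(["politics", "economy", "sports"])
# ['This example is politics.', 'This example is economy.',
#  'This example is sports.']

# With transformers installed, the full pipeline looks like this
# (checkpoint name is a placeholder for an NLI-based zero-shot model):
# from transformers import pipeline
# clf = pipeline("zero-shot-classification", model="<nli-checkpoint>")
# clf("The ECB raised interest rates.",
#     candidate_labels=["politics", "economy", "sports"])
```

Changing the template (e.g. "This text expresses {} sentiment.") is often the cheapest way to adapt the classifier to a new domain before resorting to fine-tuning.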

Final Thoughts

In conclusion, this paper not only presents a pragmatic approach to leveraging NLI for building universal classifiers but also sets the stage for future advancements in efficient, model-based classification. By sharing comprehensive guides, code, and pre-trained models, it empowers the AI research community to explore, extend, and enhance the capabilities of NLI-based classifiers. As we progress, the aspiration for more refined, efficient, and universally applicable models remains a North Star, guiding ongoing pursuits within the field of AI research.
