Small Language Models are Good Too: An Empirical Study of Zero-Shot Classification (2404.11122v1)

Published 17 Apr 2024 in cs.AI

Abstract: This study contributes to the debate on the efficiency of large versus small LLMs for text classification by prompting. We assess the performance of small LLMs in zero-shot text classification, challenging the prevailing dominance of large models. Across 15 datasets, our investigation benchmarks LLMs from 77M to 40B parameters using different architectures and scoring functions. Our findings reveal that small models can effectively classify texts, performing on par with or surpassing their larger counterparts. We developed and shared a comprehensive open-source repository that encapsulates our methodologies. This research underscores the notion that bigger isn't always better, suggesting that resource-efficient small models may offer viable solutions for specific data classification challenges.
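The zero-shot setup benchmarked here classifies a text by scoring each candidate label with the LLM itself. A minimal sketch of one common scoring function, total log-probability of a label-filled prompt, is below; the prompt template and the toy stand-in model are illustrative assumptions, not the paper's exact method (in practice the scorer would wrap a causal LM such as one from Hugging Face):

```python
def label_score(logprob_fn, text, label,
                template="{text} This text is about {label}."):
    """Score one candidate label: total log-probability the model assigns
    to the prompt with that label filled in. `logprob_fn` maps a string
    to a list of per-token log-probabilities."""
    prompt = template.format(text=text, label=label)
    return sum(logprob_fn(prompt))

def zero_shot_classify(logprob_fn, text, labels):
    """Return the label whose filled-in prompt the model finds most likely.
    (Real scorers often length-normalize to counter surface-form effects.)"""
    scores = {lab: label_score(logprob_fn, text, lab) for lab in labels}
    return max(scores, key=scores.get)

# Toy stand-in for an LLM, used only so this sketch runs offline:
# it assigns a slightly higher log-prob per word when the prompt pairs
# a sports-like text with the "sports" label.
def toy_logprob_fn(prompt):
    bonus = 0.5 if ("match" in prompt and "sports" in prompt) else 0.0
    return [-1.0 + bonus for _ in prompt.split()]

print(zero_shot_classify(toy_logprob_fn,
                         "The match went to penalties.",
                         ["sports", "politics", "finance"]))
```

Swapping `toy_logprob_fn` for a real model's token log-probabilities turns this into the kind of scoring-function comparison the study runs across model sizes.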
