Generation-driven Contrastive Self-training for Zero-shot Text Classification with Instruction-following LLM (2304.11872v2)

Published 24 Apr 2023 in cs.CL and cs.AI

Abstract: The remarkable performance of LLMs in zero-shot language understanding has garnered significant attention. However, employing LLMs for large-scale inference or domain-specific fine-tuning requires immense computational resources due to their substantial model size. To overcome these limitations, we introduce a novel method, namely GenCo, which leverages the strong generative power of LLMs to assist in training a smaller, more adaptable language model. In our method, an LLM plays an important role in the self-training loop of the smaller model in two ways. Firstly, the LLM is used to augment each input instance with a variety of possible continuations, enriching its semantic context for better understanding. Secondly, it helps craft additional high-quality training pairs by rewriting input texts conditioned on predicted labels. This ensures the generated texts are highly relevant to the predicted labels, alleviating prediction errors during pseudo-labeling while reducing the dependency on large volumes of unlabeled text. In our experiments, GenCo outperforms previous state-of-the-art methods when only limited ($<5\%$ of original) in-domain text data is available. Notably, our approach surpasses the performance of Alpaca-7B with human prompts, highlighting the potential of leveraging LLMs for self-training.
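The self-training loop described in the abstract can be summarized in a few lines of code. The sketch below is an illustrative interpretation of that description, not the authors' released implementation: the helper names (`generate_continuations`, `rewrite_with_label`) and the `classifier` interface are hypothetical placeholders standing in for the instruction-following LLM calls and the smaller model being trained.

```python
# Minimal sketch of a GenCo-style self-training loop (assumed interfaces, not the paper's code).
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Example:
    text: str
    pseudo_label: int


def genco_self_training(
    unlabeled_texts: List[str],
    label_names: List[str],
    classifier,  # small model exposing .predict(text) -> int and .train(examples); hypothetical interface
    generate_continuations: Callable[[str], List[str]],  # LLM use 1: enrich the input's semantic context
    rewrite_with_label: Callable[[str, str], str],  # LLM use 2: rewrite text conditioned on a label name
    rounds: int = 3,
) -> None:
    """Pseudo-label, augment with an instruction-following LLM, and retrain the small model."""
    for _ in range(rounds):
        train_set: List[Example] = []
        for text in unlabeled_texts:
            # (1) Augment the input with generated continuations so the small
            # classifier predicts on a semantically richer version of the text.
            enriched = text + " " + " ".join(generate_continuations(text))
            label = classifier.predict(enriched)
            train_set.append(Example(enriched, label))

            # (2) Rewrite the input conditioned on the predicted label name,
            # producing an extra training pair consistent with that label.
            rewritten = rewrite_with_label(text, label_names[label])
            train_set.append(Example(rewritten, label))

        # Retrain the small model (e.g. with a contrastive objective) on the
        # pseudo-labeled and LLM-generated pairs.
        classifier.train(train_set)
```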

References (36)
  1. Language models are few-shot learners. arXiv preprint arXiv:2005.14165.
  2. Olivier Chapelle and Alexander Zien. 2005. Semi-supervised classification by low density separation. In International workshop on artificial intelligence and statistics, pages 57–64. PMLR.
  3. Vicuna: An open-source chatbot impressing GPT-4 with 90%* ChatGPT quality.
  4. Vicuna: An open-source chatbot impressing GPT-4 with 90%* ChatGPT quality. See https://vicuna.lmsys.org (accessed 14 April 2023).
  5. CELDA: Leveraging black-box language model as enhanced classifier without labels. arXiv preprint arXiv:2306.02693.
  6. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
  7. Beyond prompting: Making pre-trained language models better zero-shot learners by clustering representations. arXiv preprint arXiv:2210.16637.
  8. ZeroGen+: Self-guided high-quality data generation in efficient zero-shot learning. arXiv preprint arXiv:2205.12679.
  9. SimCSE: Simple contrastive learning of sentence embeddings. arXiv preprint arXiv:2104.08821.
  10. Zero-shot text classification with self-training. arXiv preprint arXiv:2210.17541.
  11. Yves Grandvalet and Yoshua Bengio. 2004. Semi-supervised learning by entropy minimization. Advances in neural information processing systems, 17.
  12. Yves Grandvalet and Yoshua Bengio. 2006. Entropy regularization.
  13. TeSS: Zero-shot classification via textual similarity comparison with prompting using sentence encoder. arXiv preprint arXiv:2212.10391.
  14. Unnatural instructions: Tuning language models with (almost) no human labor. arXiv preprint arXiv:2212.09689.
  15. Supervised contrastive learning. In Advances in Neural Information Processing Systems, pages 18661–18673. Curran Associates, Inc.
  16. Dong-Hyun Lee et al. 2013. Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks. In Workshop on challenges in representation learning, ICML, volume 3, page 896.
  17. Generating training data with language models: Towards zero-shot language understanding. arXiv preprint arXiv:2202.04538.
  18. Text classification using label names only: A language model self-training approach. arXiv preprint arXiv:2010.07245.
  19. When does label smoothing help? Advances in neural information processing systems, 32.
  20. Training language models to follow instructions with human feedback. arXiv preprint arXiv:2203.02155.
  21. Data augmentation for intent classification with off-the-shelf large language models. arXiv preprint arXiv:2204.01959.
  22. Timo Schick and Hinrich Schütze. 2020. It’s not just size that matters: Small language models are also few-shot learners. arXiv preprint arXiv:2009.07118.
  23. Self-diagnosis and self-debiasing: A proposal for reducing corpus-based bias in NLP. Transactions of the Association for Computational Linguistics, 9:1408–1424.
  24. Nearest neighbor zero-shot inference. arXiv preprint arXiv:2205.13792.
  25. Distilling reasoning capabilities into smaller language models. In Findings of the Association for Computational Linguistics: ACL 2023, pages 7059–7073.
  26. Contrastive distillation on intermediate representations for language model compression. arXiv preprint arXiv:2009.14167.
  27. Stanford Alpaca: An instruction-following LLaMA model. https://github.com/tatsu-lab/stanford_alpaca.
  28. LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971.
  29. Jesper E Van Engelen and Holger H Hoos. 2020. A survey on semi-supervised learning. Machine learning, 109(2):373–440.
  30. PESCO: Prompt-enhanced self contrastive learning for zero-shot text classification.
  31. Few-shot text classification with triplet networks, data augmentation, and curriculum learning. arXiv preprint arXiv:2103.07552.
  32. Unsupervised deep embedding for clustering analysis. In International conference on machine learning, pages 478–487. PMLR.
  33. ZeroGen: Efficient zero-shot learning via dataset generation. arXiv preprint arXiv:2202.07922.
  34. GPT3Mix: Leveraging large-scale language models for text augmentation. arXiv preprint arXiv:2104.08826.
  35. Weakly-supervised text classification based on keyword graph.
  36. Long-tailed extreme multi-label text classification by the retrieval of generated pseudo label descriptions. In Findings of the Association for Computational Linguistics: EACL 2023, pages 1092–1106, Dubrovnik, Croatia. Association for Computational Linguistics.
Authors (3)
  1. Ruohong Zhang (11 papers)
  2. Yau-Shian Wang (13 papers)
  3. Yiming Yang (151 papers)
Citations (8)