Generation-driven Contrastive Self-training for Zero-shot Text Classification with Instruction-following LLM (2304.11872v2)
Abstract: The remarkable performance of LLMs in zero-shot language understanding has garnered significant attention. However, employing LLMs for large-scale inference or domain-specific fine-tuning requires immense computational resources due to their substantial model size. To overcome these limitations, we introduce GenCo, a novel method that leverages the strong generative power of LLMs to assist in training a smaller, more adaptable language model. In our method, the LLM plays two important roles in the self-training loop of the smaller model. First, it augments each input instance with a variety of possible continuations, enriching its semantic context for better understanding. Second, it helps craft additional high-quality training pairs by rewriting input texts conditioned on predicted labels. This ensures that the generated texts are highly relevant to the predicted labels, alleviating prediction errors during pseudo-labeling while reducing the dependency on large volumes of unlabeled text. In our experiments, GenCo outperforms previous state-of-the-art methods when only limited ($<5\%$ of the original) in-domain text data is available. Notably, our approach surpasses the performance of Alpaca-7B with human prompts, highlighting the potential of leveraging LLMs for self-training.
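To make the two LLM roles described above concrete, below is a minimal sketch of a GenCo-style self-training loop. It is illustrative only: `small_model`, `llm_generate`, the prompt templates, and parameters such as `rounds` and `top_k` are hypothetical stand-ins, not the authors' actual implementation (which also uses a contrastive training objective).

```python
# Minimal sketch of a GenCo-style self-training loop (illustrative only).
# `small_model` and `llm_generate` are hypothetical interfaces:
#   small_model.predict(text, labels) -> (label, confidence)
#   small_model.fit(list_of_(text, label)_pairs)
#   llm_generate(prompt) -> generated string
import random

def genco_self_train(small_model, llm_generate, unlabeled_texts, labels,
                     rounds=3, top_k=100):
    for _ in range(rounds):
        # Role 1: augment each input with an LLM continuation to enrich
        # its semantic context before classification.
        augmented = []
        for text in unlabeled_texts:
            continuation = llm_generate(f"Continue the following text:\n{text}")
            augmented.append(text + " " + continuation)

        # Pseudo-label with the current small model and keep only the
        # most confident predictions.
        scored = []
        for text in augmented:
            label, confidence = small_model.predict(text, labels)
            scored.append((confidence, text, label))
        confident = sorted(scored, key=lambda x: x[0], reverse=True)[:top_k]

        # Role 2: ask the LLM to rewrite each text conditioned on its
        # predicted label, producing extra (text, label) training pairs
        # that are consistent with that label.
        train_pairs = []
        for _, text, label in confident:
            rewrite = llm_generate(
                f"Rewrite the following text so that it clearly discusses "
                f"'{label}':\n{text}"
            )
            train_pairs.append((text, label))
            train_pairs.append((rewrite, label))

        # Update the small model on the pseudo-labeled pairs.
        random.shuffle(train_pairs)
        small_model.fit(train_pairs)

    return small_model
```

Because the rewritten texts are generated conditioned on the predicted label, they remain label-consistent even when the original pseudo-label was noisy, which is how the method reduces error accumulation during self-training.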
Authors: Ruohong Zhang, Yau-Shian Wang, Yiming Yang