
Generating Training Data with Language Models: Towards Zero-Shot Language Understanding (2202.04538v2)

Published 9 Feb 2022 in cs.CL and cs.LG

Abstract: Pretrained language models (PLMs) have demonstrated remarkable performance in various natural language processing tasks: Unidirectional PLMs (e.g., GPT) are well known for their superior text generation capabilities; bidirectional PLMs (e.g., BERT) have been the prominent choice for natural language understanding (NLU) tasks. While both types of models have achieved promising few-shot learning performance, their potential for zero-shot learning has been underexplored. In this paper, we present a simple approach that uses both types of PLMs for fully zero-shot learning of NLU tasks without requiring any task-specific data: A unidirectional PLM generates class-conditioned texts guided by prompts, which are used as the training data for fine-tuning a bidirectional PLM. With quality training data selected based on the generation probability and regularization techniques (label smoothing and temporal ensembling) applied to the fine-tuning stage for better generalization and stability, our approach demonstrates strong performance across seven classification tasks of the GLUE benchmark (e.g., 72.3/73.8 on MNLI-m/mm and 92.8 on SST-2), significantly outperforming zero-shot prompting methods and achieving even comparable results to strong few-shot approaches using 32 training samples per class.

Generating Training Data with Language Models: Towards Zero-Shot Language Understanding

The paper proposes "SuperGen", a novel approach to zero-shot learning for natural language understanding (NLU) tasks that leverages pretrained language models (PLMs) as generators of task-specific training data. Traditional PLM-based techniques often rely on few-shot learning paradigms, requiring some amount of task-specific annotated data to fine-tune models. In contrast, SuperGen eliminates this dependency, generating sufficient and relevant synthetic training examples using only the label set and descriptive prompts of each task.

Methodology and Approach

SuperGen utilizes both unidirectional and bidirectional PLMs in a two-stage process designed to maximize the efficacy of zero-shot learning:

  1. Training Data Generation: A unidirectional PLM acts as a generator, producing synthetic class-conditioned texts from label-descriptive prompts. The generator is used as-is, without fine-tuning on any task-specific or cross-task data. The paper describes how to craft these prompts so that they match the linguistic and semantic domain of the task at hand, covering generation strategies for both single-sequence and sequence-pair classification tasks (see the generation sketch after this list).
  2. Classifier Fine-Tuning: Once data is generated, a bidirectional PLM is fine-tuned on it as a classifier for the target NLU task. The fine-tuning stage incorporates several enhancements to cope with label noise and domain mismatch in the synthetic data:
    • Quality Training Data Selection: The generator deliberately produces more data than needed, and only the samples with the highest log generation probability scores are retained for fine-tuning (see the selection sketch after this list).
    • Regularization Techniques: To improve the classifier's generalization and robustness, label smoothing and temporal ensembling are integrated into the training regime. Label smoothing softens the one-hot training targets so the model is not pushed toward overconfident predictions, while temporal ensembling maintains a moving average of the model's predictions over time, reducing sensitivity to noisy synthetic labels (see the regularization sketch after this list).
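
A minimal sketch of the generation step is shown below, using GPT-2 through the Hugging Face transformers library. The model choice, prompt wording, and sampling settings are illustrative assumptions, not necessarily those used in the paper.

```python
# Sketch: class-conditioned training-data generation with a unidirectional PLM.
# GPT-2 and the prompts below are illustrative, not the paper's exact setup.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2-large")
model = GPT2LMHeadModel.from_pretrained("gpt2-large")
model.eval()

# Hypothetical label-descriptive prompts for a sentiment task (SST-2-like).
label_prompts = {
    "positive": "The movie review in positive sentiment is:",
    "negative": "The movie review in negative sentiment is:",
}

def generate_samples(label, num_samples=5, max_new_tokens=40):
    """Generate class-conditioned texts for one label via top-p sampling."""
    prompt_ids = tokenizer(label_prompts[label], return_tensors="pt").input_ids
    with torch.no_grad():
        outputs = model.generate(
            prompt_ids,
            do_sample=True,
            top_p=0.9,
            max_new_tokens=max_new_tokens,
            num_return_sequences=num_samples,
            pad_token_id=tokenizer.eos_token_id,
        )
    # Keep only the generated continuation, not the prompt itself.
    prompt_len = prompt_ids.shape[1]
    return [tokenizer.decode(o[prompt_len:], skip_special_tokens=True).strip()
            for o in outputs]

synthetic = {label: generate_samples(label) for label in label_prompts}
```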
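
The selection step can be sketched as scoring each generated text by its average per-token log-probability under the generator and keeping the top-scoring samples per label; the per-token averaging and the cutoff k below are assumptions for illustration.

```python
# Sketch: quality-based selection of generated texts by average log-likelihood
# under the generator (the `model`/`tokenizer` from the previous sketch, or any
# causal LM). The exact scoring and cutoff are illustrative assumptions.
import torch
import torch.nn.functional as F

def avg_log_prob(model, tokenizer, text):
    """Average per-token log-probability of `text` under the generator."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # Score token t using the distribution predicted from tokens < t.
    log_probs = F.log_softmax(logits[:, :-1, :], dim=-1)
    token_scores = log_probs.gather(-1, ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    return token_scores.mean().item()

def select_top_k(model, tokenizer, texts, k=100):
    """Keep the k generated texts with the highest average log-probability."""
    ranked = sorted(texts, key=lambda t: avg_log_prob(model, tokenizer, t),
                    reverse=True)
    return ranked[:k]
```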
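
The two regularizers can be sketched as follows: label smoothing softens the targets used in the cross-entropy loss, and temporal ensembling keeps an exponential moving average of past predictions, used here to down-weight examples whose ensembled prediction no longer supports the synthetic label. All hyperparameters and the exact way the ensembled predictions enter the loss are illustrative assumptions, not the paper's precise formulation.

```python
# Sketch: label smoothing + temporal ensembling during classifier fine-tuning.
# Assumes a Hugging Face sequence-classification model and that all tensors
# live on the same device; hyperparameters are illustrative.
import torch
import torch.nn.functional as F

def smoothed_targets(labels, num_classes, smoothing=0.1):
    """One-hot targets softened by label smoothing."""
    targets = torch.full((labels.size(0), num_classes),
                         smoothing / (num_classes - 1))
    targets.scatter_(1, labels.unsqueeze(1), 1.0 - smoothing)
    return targets

class TemporalEnsemble:
    """Exponential moving average of softmax predictions per training example."""
    def __init__(self, num_examples, num_classes, momentum=0.9):
        self.ema = torch.zeros(num_examples, num_classes)
        self.momentum = momentum

    def update(self, indices, probs):
        self.ema[indices] = (self.momentum * self.ema[indices]
                             + (1 - self.momentum) * probs)

    def agrees(self, indices, labels, threshold=0.8):
        """Examples whose ensembled prediction still supports the synthetic label."""
        return self.ema[indices, labels] > threshold

def training_step(model, batch, ensemble, num_classes):
    indices, input_ids, attention_mask, labels = batch
    logits = model(input_ids=input_ids, attention_mask=attention_mask).logits
    ensemble.update(indices, F.softmax(logits, dim=-1).detach())

    targets = smoothed_targets(labels, num_classes)
    per_example_loss = -(targets * F.log_softmax(logits, dim=-1)).sum(dim=-1)
    # Down-weight examples the temporal ensemble no longer agrees with.
    mask = ensemble.agrees(indices, labels).float()
    return (per_example_loss * mask).sum() / mask.sum().clamp(min=1.0)
```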

Experimental Results

SuperGen's performance was evaluated on seven GLUE benchmark tasks, showing significant improvements over existing zero-shot prompting methods. Notably, SuperGen achieved performance comparable to state-of-the-art few-shot methods despite operating under zero-shot constraints. Among its key results:

  • On tasks such as SST-2 and MNLI, SuperGen achieved results that closely match or surpass few-shot setups using 32 labeled samples per class.
  • SuperGen consistently demonstrated smaller variance in performance across different random seeds, marking a substantial gain in stability often lacking in few-shot paradigms.

The paper further ablates the key components of SuperGen, showing the critical roles of data selection and the regularization techniques in zero-shot performance. Moreover, comparative studies with different PLM architectures (e.g., GPT-2 and RoBERTa variants) confirmed the approach's adaptability and revealed how the generator's size and pretraining corpus affect the quality of the generated training data.

Implications and Future Direction

SuperGen paves the way for NLU systems that can handle diverse tasks without extensive dataset-specific annotation, bringing them closer to human-like task adaptability. Its framework could support a wide range of applications that must adapt quickly to novel tasks without large-scale data curation.

While SuperGen establishes a robust foundation for zero-shot NLU, challenges remain in standardizing prompt patterns across tasks and in further mitigating the domain gap between synthetic and real-world data. Potential avenues for advancement include stronger quality-control methods during data selection and the use of larger, more general language models as generators.

Overall, SuperGen represents a meaningful step toward circumventing the limitations of traditional data-heavy model training, showing how models can be equipped with scalable, task-general language understanding in data-scarce settings.

Authors (4)
  1. Yu Meng (92 papers)
  2. Jiaxin Huang (48 papers)
  3. Yu Zhang (1399 papers)
  4. Jiawei Han (263 papers)
Citations (196)