
ProGen: Progressive Zero-shot Dataset Generation via In-context Feedback (2210.12329v1)

Published 22 Oct 2022 in cs.CL and cs.AI

Abstract: Recently, dataset-generation-based zero-shot learning has shown promising results by training a task-specific model on a dataset synthesized from large pre-trained language models (PLMs). The final task-specific model often achieves comparable or even better performance than the PLMs under the zero-shot setting, with orders of magnitude fewer parameters. However, synthetic datasets have their drawbacks: they have long suffered from low-quality issues (e.g., low informativeness and redundancy). This explains why massive synthetic data does not lead to better performance, as we would expect with human-labeled data. To improve the quality of dataset synthesis, we propose a progressive zero-shot dataset generation framework, ProGen, which leverages feedback from the task-specific model to guide the generation of new training data via in-context examples. Extensive experiments on five text classification datasets demonstrate the effectiveness of the proposed approach. We also show that ProGen achieves on-par or superior performance with only 1% of the synthetic dataset size compared to baseline methods without in-context feedback.
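The feedback loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: `generate` is a hypothetical stand-in for prompting a PLM, `train` builds a trivial word-count classifier, and `feedback` is a placeholder quality score chosen only for illustration; the abstract does not specify the actual feedback signal. All names, the vocabulary, and the parameters (`rounds`, `per_round`, `k`) are assumptions.

```python
import random
from collections import Counter

# Hypothetical toy vocabulary standing in for what a PLM might produce per label.
VOCAB = {
    "positive": ["great", "enjoyable", "moving"],
    "negative": ["dull", "clumsy", "tedious"],
}

def generate(label, in_context, rng):
    """Hypothetical stand-in for prompting a PLM: returns one synthetic text."""
    words = [rng.choice(VOCAB[label]) for _ in range(3)]
    # Mimic in-context conditioning by borrowing a word from a same-label demonstration.
    demos = [text for text, demo_label in in_context if demo_label == label]
    if demos:
        words.append(rng.choice(rng.choice(demos).split()))
    return " ".join(words)

def train(dataset):
    """Toy task-specific model: per-label word counts."""
    model = {label: Counter() for label in VOCAB}
    for text, label in dataset:
        model[label].update(text.split())
    return model

def feedback(model, example):
    """Placeholder quality score (an assumption, not the paper's signal):
    how strongly the model's counts favor the example's own label."""
    text, label = example
    words = text.split()
    own = sum(model[label][w] for w in words)
    other = sum(model[l][w] for l in VOCAB if l != label for w in words)
    return own - other

def progressive_generation(rounds=3, per_round=8, k=2, seed=0):
    rng = random.Random(seed)
    dataset, in_context = [], []
    for _ in range(rounds):
        labels = [rng.choice(sorted(VOCAB)) for _ in range(per_round)]
        dataset += [(generate(label, in_context, rng), label) for label in labels]
        model = train(dataset)
        # Feed the highest-scoring examples back as demonstrations for the next round.
        in_context = sorted(dataset, key=lambda ex: feedback(model, ex), reverse=True)[:k]
    return dataset, in_context
```

Calling `progressive_generation()` returns the accumulated synthetic dataset and the demonstrations selected after the final round. In a real setting, `generate` would call a PLM with the demonstrations placed in the prompt, and `train` would fit the small task-specific model.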

Authors (6)
  1. Jiacheng Ye (21 papers)
  2. Jiahui Gao (25 papers)
  3. Jiangtao Feng (24 papers)
  4. Zhiyong Wu (171 papers)
  5. Tao Yu (282 papers)
  6. Lingpeng Kong (134 papers)
Citations (56)