
Data Boost: Text Data Augmentation Through Reinforcement Learning Guided Conditional Generation (2012.02952v1)

Published 5 Dec 2020 in cs.CL and cs.LG

Abstract: Data augmentation has been shown to be effective in many NLU tasks, especially those suffering from data scarcity. In this paper, we present a powerful and easy-to-deploy text augmentation framework, Data Boost, which augments data through reinforcement learning guided conditional generation. We evaluate Data Boost on three diverse text classification tasks under five different classifier architectures. The results show that Data Boost improves classifier performance, especially in low-resource data scenarios. For instance, Data Boost improves F1 for the three tasks by 8.7% on average when given only 10% of the whole data for training. We also compare Data Boost with six prior text augmentation methods. Through human evaluations (N=178), we confirm that Data Boost augmentation has quality comparable to the original data with respect to readability and class consistency.

Authors (6)
  1. Ruibo Liu (42 papers)
  2. Guangxuan Xu (13 papers)
  3. Chenyan Jia (11 papers)
  4. Weicheng Ma (22 papers)
  5. Lili Wang (133 papers)
  6. Soroush Vosoughi (90 papers)
Citations (98)