KEST: Kernel Distance Based Efficient Self-Training for Improving Controllable Text Generation (2306.10414v1)

Published 17 Jun 2023 in cs.CL

Abstract: Self-training (ST) has proven effective in language understanding tasks by producing pseudo labels, which alleviates the labeling bottleneck of LLM fine-tuning. Nevertheless, in facilitating semi-supervised controllable language generation, ST faces two key challenges. First, when augmented with self-generated pseudo text, generation models tend to over-exploit the previously learned text distribution, suffering from mode collapse and poor generation diversity. Second, generating pseudo text in each iteration is time-consuming, severely decelerating the training process. In this work, we propose KEST, a novel and efficient self-training framework to handle these problems. KEST utilizes a kernel-based loss, rather than standard cross entropy, to learn from the soft pseudo text produced by a shared non-autoregressive generator. We demonstrate both theoretically and empirically that KEST can benefit from more diverse pseudo text in an efficient manner, which allows not only refining and exploiting the previously fitted distribution but also enhancing exploration towards a larger potential text space, providing a guarantee of improved performance. Experiments on three controllable generation tasks demonstrate that KEST significantly improves control accuracy while maintaining comparable text fluency and generation diversity against several strong baselines.

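The kernel-based loss described in the abstract is, at a high level, a distribution-matching objective between features of real text and of soft pseudo text, in the spirit of a kernel two-sample (MMD-style) test. The sketch below is a minimal illustration of such a Gaussian-kernel MMD loss, not the paper's exact objective: the feature dimension, bandwidth `sigma`, and function names are assumptions made for illustration only.

```python
# Minimal sketch (not the authors' released code): a Gaussian-kernel MMD loss
# between features of real text and features of soft pseudo text.
# Shapes, bandwidth, and names below are illustrative assumptions.
import torch


def gaussian_kernel(x: torch.Tensor, y: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """Pairwise RBF kernel between two batches of feature vectors: (n, d), (m, d) -> (n, m)."""
    # Squared Euclidean distances computed without sqrt to keep gradients stable at zero distance.
    sq_dist = (x.unsqueeze(1) - y.unsqueeze(0)).pow(2).sum(-1)
    return torch.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd_loss(real_feats: torch.Tensor, pseudo_feats: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """Biased MMD^2 estimate: squared RKHS distance between the two feature distributions."""
    k_rr = gaussian_kernel(real_feats, real_feats, sigma).mean()
    k_pp = gaussian_kernel(pseudo_feats, pseudo_feats, sigma).mean()
    k_rp = gaussian_kernel(real_feats, pseudo_feats, sigma).mean()
    return k_rr + k_pp - 2.0 * k_rp


# Toy usage: in a real self-training loop, `pseudo` would be differentiable features of the
# shared non-autoregressive generator's soft outputs, so gradients flow back through them.
real = torch.randn(32, 768)
pseudo = torch.randn(32, 768, requires_grad=True)
loss = mmd_loss(real, pseudo)
loss.backward()
```

Because the loss compares batch-level feature distributions rather than per-token labels, it can (in principle) consume soft pseudo text directly, which is the efficiency and diversity benefit the abstract highlights.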
Authors (4)
  1. Yuxi Feng (4 papers)
  2. Xiaoyuan Yi (42 papers)
  3. Laks V. S. Lakshmanan (58 papers)
  4. Xing Xie (220 papers)
Citations (1)