Semi-Offline Reinforcement Learning for Optimized Text Generation

Published 16 Jun 2023 in cs.LG, cs.AI, and cs.CL (arXiv:2306.09712v1)

Abstract: In reinforcement learning (RL), there are two major settings for interacting with the environment: online and offline. Online methods explore the environment at significant time cost, while offline methods efficiently obtain reward signals at the expense of exploration capability. We propose semi-offline RL, a novel paradigm that smoothly transitions from the offline to the online setting, balances exploration capability against training cost, and provides a theoretical foundation for comparing different RL settings. Based on the semi-offline formulation, we present the RL setting that is optimal in terms of optimization cost, asymptotic error, and overfitting error bound. Extensive experiments show that our semi-offline approach is efficient and yields performance comparable to, and often better than, state-of-the-art methods.
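
The abstract does not spell out how the offline-to-online interpolation is realized. As a hedged illustration only, one natural way to blend the two settings is token-level mixing: each position in a generated sequence keeps the logged dataset token with some probability and is sampled from the current policy otherwise. The PyTorch sketch below shows that idea; the function name `sample_semi_offline_tokens`, the parameter `mix_prob`, and the tensor shapes are assumptions made for illustration, not the authors' implementation.

```python
import torch

def sample_semi_offline_tokens(policy_logits, reference_tokens, mix_prob):
    """Token-level blend of offline (logged) and online (policy) actions.

    policy_logits:    (batch, seq_len, vocab_size) logits from the policy
    reference_tokens: (batch, seq_len) token ids from the static dataset
    mix_prob:         probability of keeping the logged token at a position;
                      1.0 gives a fully offline trajectory, 0.0 a fully
                      online rollout (an assumed mechanism, for illustration)
    """
    # Online behavior: sample candidate tokens from the current policy.
    probs = torch.softmax(policy_logits, dim=-1)
    sampled = torch.distributions.Categorical(probs=probs).sample()

    # Offline behavior: independently decide, per position, whether to
    # keep the token that was logged in the static dataset.
    keep_offline = torch.rand(reference_tokens.shape) < mix_prob
    return torch.where(keep_offline, reference_tokens, sampled)


# Example usage with hypothetical shapes: batch of 2 sequences of length 5
# over a 100-token vocabulary, mixing offline and online tokens equally.
logits = torch.randn(2, 5, 100)
refs = torch.randint(0, 100, (2, 5))
tokens = sample_semi_offline_tokens(logits, refs, mix_prob=0.5)
print(tokens.shape)  # torch.Size([2, 5])
```

Under this reading, sweeping `mix_prob` from 1.0 to 0.0 traces the smooth transition from the offline to the online setting that the abstract describes, trading exploration capability against the cost of policy rollouts.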
