Inverse Reinforcement Learning for Text Summarization (2212.09917v2)

Published 19 Dec 2022 in cs.CL

Abstract: We introduce inverse reinforcement learning (IRL) as an effective paradigm for training abstractive summarization models, imitating human summarization behaviors. Our IRL model estimates the reward function using a suite of important sub-rewards for summarization and concurrently optimizes the policy network. Experimental results across datasets in different domains (CNN/DailyMail and WikiHow) and various model sizes (BART-base and BART-large) demonstrate the superiority of our proposed IRL model for summarization over MLE and RL baselines. The resulting summaries exhibit greater similarity to human-crafted gold references, outperforming MLE and RL baselines on metrics such as ROUGE, coverage, novelty, compression ratio, factuality, and human evaluations.
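
To make the training scheme in the abstract concrete, here is a minimal, hypothetical sketch of the alternating IRL loop in plain Python: the reward is estimated as a weighted combination of summarization sub-rewards, and the weights are nudged toward the features of the human-written summary and away from those of the policy's sample. The function names, sub-reward definitions, and update rule below are illustrative assumptions, not the paper's actual formulation.

```python
# Minimal sketch of an alternating IRL scheme for summarization, assuming a
# linear reward over hand-picked sub-rewards. The sub-reward definitions and
# the update rule are simplified stand-ins, not the paper's exact ones.

def sub_rewards(document: str, summary: str) -> list[float]:
    """Toy feature vector; the paper uses summarization-specific signals
    such as coverage, novelty, compression, and factuality."""
    doc_words, sum_words = set(document.split()), set(summary.split())
    coverage = len(sum_words & doc_words) / max(len(doc_words), 1)
    novelty = len(sum_words - doc_words) / max(len(sum_words), 1)
    compression = 1.0 - min(len(summary) / max(len(document), 1), 1.0)
    return [coverage, novelty, compression]

def reward(weights: list[float], document: str, summary: str) -> float:
    """Estimated reward: a weighted combination of sub-rewards."""
    return sum(w * f for w, f in zip(weights, sub_rewards(document, summary)))

def reward_update(weights, document, gold, sample, lr=0.1):
    """MaxEnt-IRL-style weight update: push the reward toward the features
    of the human (gold) summary and away from the policy sample's features."""
    gold_f = sub_rewards(document, gold)
    samp_f = sub_rewards(document, sample)
    return [w + lr * (g - s) for w, g, s in zip(weights, gold_f, samp_f)]

# One alternating round with toy strings standing in for real data and for a
# summary sampled from the policy network.
doc = "the quick brown fox jumps over the lazy dog near the river bank"
gold = "fox jumps over dog near river"
sample = "the the quick quick fox"  # stand-in for a policy sample
weights = [1.0, 1.0, 1.0]
weights = reward_update(weights, doc, gold, sample)
print(weights, reward(weights, doc, sample))
```

In the full method, the policy (a BART summarizer) would be sampled from and then optimized against the refreshed reward at each step, e.g. with a policy-gradient objective scaled by the estimated reward; the sketch above only shows the reward-estimation half of that alternation.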

Authors (3)
  1. Yu Fu (86 papers)
  2. Deyi Xiong (103 papers)
  3. Yue Dong (61 papers)
Citations (3)