Information-Theoretic Distillation for Reference-less Summarization (2403.13780v2)

Published 20 Mar 2024 in cs.CL and cs.AI

Abstract: The current winning recipe for automatic summarization is to use proprietary large-scale LLMs such as ChatGPT as-is, or to imitate them as teacher models. While the increasingly ubiquitous dependence on such large-scale LLMs is convenient, an important question remains: could small-scale models achieve competitive results with an alternative learning method that is more cost-efficient and controllable, yet still powerful? We present InfoSumm, a novel framework to distill a powerful summarizer from an information-theoretic objective for summarization, without relying on either an LLM's summarization capability or human-written references. To this end, we first propose a novel formulation of the desiderata of summarization (saliency, faithfulness, and brevity) through the lens of mutual information between the original document and the summary. Based on this formulation, we start from Pythia-2.8B as the teacher model, which is not yet capable of summarization, and self-train it to optimize for the information-centric measures of ideal summaries. Distilling from the improved teacher, we arrive at a compact but powerful summarizer with only 568M parameters that performs competitively against ChatGPT, without ever relying on ChatGPT's capabilities. Extensive analysis demonstrates that our approach outperforms not only state-of-the-art unsupervised methods but also in-domain supervised models in human evaluation, and wins over ChatGPT in controllable summarization.
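
The abstract frames saliency, faithfulness, and brevity through mutual information between the document and its summary. As a rough illustration of that idea only (not the paper's actual objective, teacher model, or training pipeline), the sketch below scores a candidate summary with pointwise-mutual-information-style quantities estimated by an off-the-shelf causal LM via Hugging Face transformers. The model choice (gpt2), the helper names (log_prob, info_scores), and the specific saliency/faithfulness/brevity proxies are assumptions for illustration, not definitions taken from the paper.

```python
# Illustrative sketch only: PMI-style scoring of a summary against its source document,
# assuming the Hugging Face `transformers` and `torch` libraries. These proxies are
# simplified assumptions, not the paper's exact information-theoretic objective.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")   # model choice is arbitrary
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def log_prob(text: str, prefix: str = "") -> float:
    """Sum of token log-probabilities of `text`, optionally conditioned on `prefix`."""
    text_ids = tokenizer(text, return_tensors="pt").input_ids
    if prefix:
        prefix_ids = tokenizer(prefix, return_tensors="pt").input_ids
        input_ids = torch.cat([prefix_ids, text_ids], dim=1)
        n_prefix = prefix_ids.shape[1]
    else:
        input_ids = text_ids
        n_prefix = 1  # the first token has no left context, so it is skipped below
    with torch.no_grad():
        logits = model(input_ids).logits
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)   # predict token t from tokens < t
    token_lp = log_probs.gather(-1, input_ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    return token_lp[:, n_prefix - 1:].sum().item()           # keep only the `text` portion

def info_scores(document: str, summary: str) -> dict:
    # Saliency proxy: how much knowing the summary raises the likelihood of the document.
    saliency = log_prob(document, prefix=summary + "\n") - log_prob(document)
    # Faithfulness proxy: how much the document raises the likelihood of the summary.
    faithfulness = log_prob(summary, prefix=document + "\n") - log_prob(summary)
    # Brevity proxy: simple word-level compression ratio (lower means shorter).
    brevity = len(summary.split()) / max(len(document.split()), 1)
    return {"saliency": saliency, "faithfulness": faithfulness, "brevity": brevity}

doc = ("The city council approved a new transit budget on Monday, "
       "adding late-night bus routes and funding for station repairs.")
print(info_scores(doc, "City council approves transit budget with late-night buses."))
```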
