TriSum: Learning Summarization Ability from Large Language Models with Structured Rationale (2403.10351v1)

Published 15 Mar 2024 in cs.CL

Abstract: The advent of LLMs has significantly advanced natural language processing tasks like text summarization. However, their large size and computational demands, coupled with privacy concerns in data transmission, limit their use in resource-constrained and privacy-centric settings. To overcome this, we introduce TriSum, a framework for distilling LLMs' text summarization abilities into a compact, local model. Initially, LLMs extract a set of aspect-triple rationales and summaries, which are refined using a dual-scoring method for quality. Next, a smaller local model is trained on these tasks, employing a curriculum learning strategy that evolves from simple to complex tasks. Our method enhances local model performance on various benchmarks (CNN/DailyMail, XSum, and ClinicalTrial), outperforming baselines by 4.5%, 8.5%, and 7.4%, respectively. It also improves interpretability by providing insights into the summarization rationale.
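The abstract describes a three-stage pipeline: the teacher LLM produces aspect-triple rationales and summaries, a dual-scoring step filters them for quality, and a compact local model is then trained on the retained samples with a simple-to-complex curriculum. The sketch below illustrates that flow in Python; the data fields, the scoring functions (`summary_score_fn`, `rationale_score_fn`), and the particular stage ordering are illustrative assumptions, not the paper's exact implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class TeacherSample:
    # One document paired with the teacher LLM's rationale and summary.
    document: str
    aspects: List[str]                   # salient aspects named by the teacher
    triples: List[Tuple[str, str, str]]  # (subject, relation, object) rationale triples
    summary: str                         # teacher-written abstractive summary

def keep_sample(sample: TeacherSample,
                summary_score_fn: Callable[[TeacherSample], float],
                rationale_score_fn: Callable[[TeacherSample], float],
                threshold: float = 0.5) -> bool:
    # Dual-scoring filter: retain a sample only if both the summary quality and
    # the rationale quality clear a threshold. The scoring functions here are
    # placeholders standing in for whatever criteria the framework actually uses.
    return (summary_score_fn(sample) >= threshold and
            rationale_score_fn(sample) >= threshold)

def build_curriculum(samples: List[TeacherSample]) -> List[List[Tuple[str, str]]]:
    # Order (input, target) pairs from simple to complex, mirroring the
    # curriculum idea in the abstract: learn the rationale pieces first,
    # then the full rationale-plus-summary generation.
    fmt = lambda s: " ".join(f"({a}, {r}, {b})" for a, r, b in s.triples)
    stage_aspects = [(s.document, "; ".join(s.aspects)) for s in samples]
    stage_triples = [(s.document, fmt(s)) for s in samples]
    stage_full = [(s.document, f"{fmt(s)} => {s.summary}") for s in samples]
    return [stage_aspects, stage_triples, stage_full]

if __name__ == "__main__":
    sample = TeacherSample(
        document="The trial enrolled 120 patients and met its primary endpoint.",
        aspects=["trial size", "outcome"],
        triples=[("trial", "enrolled", "120 patients"),
                 ("trial", "met", "primary endpoint")],
        summary="A 120-patient trial met its primary endpoint.",
    )
    if keep_sample(sample, lambda s: 0.9, lambda s: 0.8):
        for i, stage in enumerate(build_curriculum([sample]), start=1):
            print(f"stage {i}: {stage[0][1]}")
```

The sketch only shows how the distilled data could be staged for a small local summarizer; the training loop itself, and the exact scoring used in the dual-scoring step, are not specified in the abstract.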

Authors (6)
  1. Pengcheng Jiang (15 papers)
  2. Cao Xiao (84 papers)
  3. Zifeng Wang (78 papers)
  4. Parminder Bhatia (50 papers)
  5. Jimeng Sun (181 papers)
  6. Jiawei Han (263 papers)
Citations (7)