Action-Item-Driven Summarization of Long Meeting Transcripts (2312.17581v2)
Abstract: The increased prevalence of online meetings has significantly strengthened the case for a model that can automatically generate the summary of a given meeting. This paper introduces a novel and effective approach to automating the generation of meeting summaries. Current approaches to this problem produce general, surface-level summaries, treating the meeting as simply a long dialogue. In contrast, our algorithms generate abstractive meeting summaries that are driven by the action items contained in the meeting transcript. This is achieved by recursively generating summaries and applying our action-item extraction algorithm to each section of the meeting in parallel. The sectional summaries are then combined and summarized together to produce a coherent, action-item-driven summary. In addition, this paper introduces three novel methods for dividing long transcripts into topic-based sections, both to improve the time efficiency of our algorithm and to mitigate the problem of LLMs forgetting long-term dependencies. Our pipeline achieved a BERTScore of 64.98 across the AMI corpus, an approximately 4.98% improvement over the current state-of-the-art result produced by a fine-tuned BART (Bidirectional and Auto-Regressive Transformers) model.
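The high-level flow the abstract describes can be sketched in a few lines. This is a minimal illustration only: the chunking, summarization, and action-item heuristics below are simplified stand-ins (the paper uses topic-based segmentation and LLM-based summarization), and every function name here is a hypothetical placeholder, not the authors' implementation.

```python
# Sketch of the described pipeline: split the transcript into sections,
# summarize each section and extract its action items in parallel, then
# combine and re-summarize into one action-item-driven summary.
from concurrent.futures import ThreadPoolExecutor

# Keyword cues as a crude stand-in for the paper's action-item extraction.
ACTION_CUES = ("will", "need to", "should", "must", "going to")

def split_into_sections(transcript: str, lines_per_section: int = 4) -> list:
    """Fixed-size chunking as a stand-in for topic-based segmentation."""
    lines = [ln for ln in transcript.splitlines() if ln.strip()]
    return ["\n".join(lines[i:i + lines_per_section])
            for i in range(0, len(lines), lines_per_section)]

def summarize(text: str, max_words: int = 20) -> str:
    """Placeholder 'summarizer': truncate to max_words (an LLM in the paper)."""
    return " ".join(text.split()[:max_words])

def extract_action_items(section: str) -> list:
    """Return utterances that look like commitments, by keyword match."""
    return [ln.strip() for ln in section.splitlines()
            if any(cue in ln.lower() for cue in ACTION_CUES)]

def summarize_meeting(transcript: str) -> str:
    sections = split_into_sections(transcript)
    with ThreadPoolExecutor() as pool:  # sections are processed in parallel
        summaries = list(pool.map(summarize, sections))
        actions = list(pool.map(extract_action_items, sections))
    # Combine the sectional summaries, summarize them again, and append the
    # extracted action items so the final summary is action-item-driven.
    final = summarize(" ".join(summaries))
    items = [item for per_section in actions for item in per_section]
    if items:
        final += "\nAction items:\n" + "\n".join(f"- {a}" for a in items)
    return final
```

The parallel per-section pass is what bounds the context each summarization call sees, which is how the paper sidesteps long-term-dependency loss in long transcripts.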
- Kevin Clark and Christopher D. Manning. 2016. Improving Coreference Resolution by Learning Entity-Level Distributed Representations. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Berlin, Germany, 643–653. https://doi.org/10.18653/v1/P16-1061
- Automatic Rephrasing of Transcripts-based Action Items. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021. Association for Computational Linguistics, Online, 2862–2873. https://doi.org/10.18653/v1/2021.findings-acl.253
- Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of NAACL-HLT 2019. Association for Computational Linguistics, Minneapolis, Minnesota, 4171–4186.
- A Survey on Long Text Modeling with Transformers. http://arxiv.org/abs/2302.14502 arXiv:2302.14502 [cs].
- Alexander R. Fabbri, Wojciech Kryściński, Bryan McCann, Caiming Xiong, Richard Socher, and Dragomir Radev. 2021. SummEval: Re-evaluating Summarization Evaluation. Transactions of the Association for Computational Linguistics 9 (April 2021), 391–409. https://doi.org/10.1162/tacl_a_00373
- Xiachong Feng, Xiaocheng Feng, and Bing Qin. 2021. A Survey on Dialogue Summarization: Recent Advances and New Frontiers. http://arxiv.org/abs/2107.03175 arXiv:2107.03175 [cs].
- Automation of Minutes of Meeting (MoM) using Natural Language Processing (NLP). In 2022 International Conference on Communication, Computing and Internet of Things (IC3IoT). 1–6. https://doi.org/10.1109/IC3IOT53935.2022.9767933
- Bogdan Gliwa, Iwona Mochol, Maciej Biesek, and Aleksander Wawer. 2019. SAMSum Corpus: A Human-annotated Dialogue Dataset for Abstractive Summarization. In Proceedings of the 2nd Workshop on New Frontiers in Summarization. 70–79. https://doi.org/10.18653/v1/D19-5409 arXiv:1911.12237 [cs].
- Som Gupta and S. K. Gupta. 2019. Abstractive summarization: An overview of the state of the art. Expert Systems with Applications 121 (May 2019), 49–65. https://doi.org/10.1016/j.eswa.2018.12.011
- Marti A. Hearst. 1997. TextTiling: Segmenting Text into Multi-paragraph Subtopic Passages. Computational Linguistics 23, 1 (1997).
- An Empirical Survey on Long Document Summarization: Datasets, Models, and Metrics. Comput. Surveys 55, 8 (Aug. 2023), 1–35. https://doi.org/10.1145/3545176
- Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Veselin Stoyanov, and Luke Zettlemoyer. 2020. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Online, 7871–7880. https://doi.org/10.18653/v1/2020.acl-main.703
- Chin-Yew Lin. 2004. ROUGE: A Package for Automatic Evaluation of Summaries. In Text Summarization Branches Out. Association for Computational Linguistics, Barcelona, Spain, 74–81.
- Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach. http://arxiv.org/abs/1907.11692 arXiv:1907.11692 [cs].
- Jean Carletta et al. 2005. The AMI Meeting Corpus. In Proceedings of the International Conference on Methods and Techniques in Behavioral Research (2005).
- Shashi Narayan, Shay B. Cohen, and Mirella Lapata. 2018. Don’t Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization. http://arxiv.org/abs/1808.08745 arXiv:1808.08745 [cs].
- Exploring the limits of a base BART for multi-document summarization in the medical domain. (2022).
- Virgile Rennard, Guokan Shang, Julie Hunter, and Michalis Vazirgiannis. 2022. Abstractive Meeting Summarization: A Survey. http://arxiv.org/abs/2208.04163 arXiv:2208.04163 [cs].
- Sascha Rothe, Shashi Narayan, and Aliaksei Severyn. 2020. Leveraging Pre-trained Checkpoints for Sequence Generation Tasks. Transactions of the Association for Computational Linguistics 8 (Dec. 2020), 264–280. https://doi.org/10.1162/tacl_a_00313 arXiv:1907.12461 [cs].
- Automatic Minuting: A Pipeline Method for Generating Minutes from Multi-Party Meeting Transcripts. (2022).
- Unsupervised Topic Segmentation of Meetings with BERT Embeddings. http://arxiv.org/abs/2106.12978 arXiv:2106.12978 [cs].
- Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, and Tie-Yan Liu. 2020. MPNet: Masked and Permuted Pre-training for Language Understanding. http://arxiv.org/abs/2004.09297 arXiv:2004.09297 [cs].
- Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention Is All You Need. http://arxiv.org/abs/1706.03762 arXiv:1706.03762 [cs].
- Jeff Wu, Long Ouyang, Daniel M. Ziegler, Nisan Stiennon, Ryan Lowe, Jan Leike, and Paul Christiano. 2021. Recursively Summarizing Books with Human Feedback. http://arxiv.org/abs/2109.10862
- Team Hitachi @ AutoMin 2021: Reference-free Automatic Minuting Pipeline with Argument Structure Construction over Topic-based Summarization. http://arxiv.org/abs/2112.02741
- Exploring the Limits of ChatGPT for Query or Aspect-based Text Summarization. http://arxiv.org/abs/2302.08081 arXiv:2302.08081 [cs].
- Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q. Weinberger, and Yoav Artzi. 2020. BERTScore: Evaluating Text Generation with BERT. http://arxiv.org/abs/1904.09675 arXiv:1904.09675 [cs].
- Summ^N: A Multi-Stage Summarization Framework for Long Input Dialogues and Documents. http://arxiv.org/abs/2110.10150 arXiv:2110.10150 [cs].
- DialogLM: Pre-trained Model for Long Dialogue Understanding and Summarization. http://arxiv.org/abs/2109.02492
Authors: Logan Golia, Jugal Kalita