"Keep it Together": Enforcing Cohesion in Extractive Summaries by Simulating Human Memory (2402.10643v1)
Abstract: Extractive summaries are usually presented as lists of sentences with no expected cohesion between them. In this paper, we aim to enforce cohesion while controlling for informativeness and redundancy in summaries, in cases where the input exhibits high redundancy. The pipeline controls for redundancy in long inputs as they are consumed, and balances informativeness and cohesion during sentence selection. Our sentence selector simulates human memory to keep track of topics (modeled as lexical chains), enforcing cohesive ties between noun phrases. Across a variety of domains, our experiments show that it is possible to extract highly cohesive summaries that humans nevertheless judge to be as informative as summaries extracted by accounting only for informativeness or redundancy. The extracted summaries exhibit smooth topic transitions between sentences, as signaled by lexical chains spanning adjacent or near-adjacent sentences.
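The selection strategy the abstract describes can be illustrated with a toy sketch. This is not the paper's actual method: here "memory" is just a decaying bag of content words from recently selected sentences, standing in for the paper's lexical chains over noun phrases, and informativeness is approximated by document-level word frequency. All names and the scoring weights (`decay`, `lam`) are illustrative assumptions.

```python
# Illustrative sketch (not the paper's method): greedy extractive selection
# balancing informativeness with cohesion to a decaying topic "memory".
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "are", "on", "for"}

def content_words(sentence):
    """Lowercased tokens minus stopwords -- a crude stand-in for noun phrases."""
    toks = [w.strip(".,;:").lower() for w in sentence.split()]
    return [w for w in toks if w and w not in STOPWORDS]

def select_summary(sentences, k=2, decay=0.5, lam=0.6):
    """Greedily pick k sentences, scoring each candidate as
    lam * informativeness + (1 - lam) * cohesion-with-memory."""
    doc_counts = Counter(w for s in sentences for w in content_words(s))
    memory = Counter()  # decaying record of recently selected topics
    chosen, remaining = [], list(range(len(sentences)))
    for _ in range(min(k, len(sentences))):
        def score(i):
            words = content_words(sentences[i])
            if not words:
                return 0.0
            info = sum(doc_counts[w] for w in words) / len(words)
            cohesion = sum(memory[w] for w in words) / len(words)
            return lam * info + (1 - lam) * cohesion
        best = max(remaining, key=score)
        remaining.remove(best)
        chosen.append(best)
        # Decay older topics, then reinforce the just-selected ones, so
        # cohesive ties favour adjacent or near-adjacent sentences.
        for w in list(memory):
            memory[w] *= decay
        memory.update(content_words(sentences[best]))
    return [sentences[i] for i in sorted(chosen)]
```

In this sketch the cohesion term rewards candidates that reuse recently selected noun-like tokens, so the second pick tends to continue the first pick's topic rather than jump to an unrelated but individually salient sentence.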