Hierarchical Attention Graph for Scientific Document Summarization in Global and Local Level (2405.10202v1)
Abstract: Scientific document summarization is challenging because of the length of the input text. Long inputs hinder the effective joint modeling of global, high-order relations between sentences and local intra-sentence relations, which is the most critical step in extractive summarization. Existing methods mostly focus on one type of relation while neglecting the other, which can lead to insufficient learning of semantic representations. In this paper, we propose HAESum, a novel approach that uses graph neural networks to model documents locally and globally based on their hierarchical discourse structure. First, intra-sentence relations are learned with a local heterogeneous graph. A novel hypergraph self-attention layer is then introduced to further enhance the characterization of high-order inter-sentence relations. We validate our approach on two benchmark datasets, and the experimental results demonstrate the effectiveness of HAESum and the importance of considering hierarchical structure when modeling long scientific documents. Our code will be available at https://github.com/MoLICHENXI/HAESum
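The hypergraph self-attention layer mentioned in the abstract is the component intended to capture high-order inter-sentence relations. Below is a minimal PyTorch sketch of that general idea, not the authors' implementation: the two-stage (sentence-to-hyperedge, then hyperedge-to-sentence) attention, the section-based incidence matrix, and all dimensions and hyperparameters are assumptions made purely for illustration.

```python
# Illustrative sketch of a hypergraph self-attention layer for sentence nodes.
# NOT the HAESum implementation: the node -> hyperedge -> node attention scheme,
# the incidence-matrix grouping, and all sizes below are assumptions.
import torch
import torch.nn as nn


class HypergraphSelfAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.node_to_edge = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.edge_to_node = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, sent: torch.Tensor, incidence: torch.Tensor) -> torch.Tensor:
        # sent:      (num_sentences, dim) sentence representations
        # incidence: (num_hyperedges, num_sentences) binary matrix; entry (e, s) = 1
        #            means sentence s belongs to hyperedge e (e.g., a section)
        # Stage 1: initialize each hyperedge as the mean of its member sentences,
        # then let it attend over those members only.
        edge_init = incidence @ sent / incidence.sum(-1, keepdim=True).clamp(min=1)
        edge_repr, _ = self.node_to_edge(
            edge_init.unsqueeze(0), sent.unsqueeze(0), sent.unsqueeze(0),
            attn_mask=(incidence == 0),  # mask sentences outside the hyperedge
        )
        # Stage 2: each sentence attends over the hyperedges it belongs to.
        node_repr, _ = self.edge_to_node(
            sent.unsqueeze(0), edge_repr, edge_repr,
            attn_mask=(incidence.t() == 0),
        )
        return node_repr.squeeze(0) + sent  # residual connection


# Toy usage: 6 sentences grouped into 2 section-level hyperedges.
sent = torch.randn(6, 64)
incidence = torch.tensor([[1, 1, 1, 0, 0, 0],
                          [0, 0, 0, 1, 1, 1]], dtype=torch.float)
layer = HypergraphSelfAttention(64)
out = layer(sent, incidence)
print(out.shape)  # torch.Size([6, 64])
```

In this sketch the hyperedges play the role of higher-order groupings (e.g., sections of a paper), so each sentence representation is refined by information from all sentences that share a grouping with it, rather than only from pairwise neighbors.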
- Chenlong Zhao
- Xiwen Zhou
- Xiaopeng Xie
- Yong Zhang