A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents (1804.05685v2)

Published 16 Apr 2018 in cs.CL

Abstract: Neural abstractive summarization models have led to promising results in summarizing relatively short documents. We propose the first model for abstractive summarization of single, longer-form documents (e.g., research papers). Our approach consists of a new hierarchical encoder that models the discourse structure of a document, and an attentive discourse-aware decoder to generate the summary. Empirical results on two large-scale datasets of scientific papers show that our model significantly outperforms state-of-the-art models.

A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents

The paper, "A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents," presents a novel approach for the summarization of long documents, specifically focusing on scientific papers. The authors introduce an advanced neural model that incorporates discourse structure to effectively generate abstractive summaries of lengthy, structured texts. The model leverages a hierarchical encoder and a discourse-aware decoder to capture and utilize the inherent document structure.

Model Overview

The authors propose a hierarchical encoder-decoder architecture that extends traditional sequence-to-sequence (seq2seq) models. The encoder is designed to handle the document's discourse structure by encoding each section individually and then producing a comprehensive document representation. The decoder, enhanced by a discourse-aware attention mechanism, allows the model to focus on relevant sections during summary generation.
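
A rough sketch of this hierarchy is given below: each section is encoded with a word-level bidirectional LSTM, and a section-level bidirectional LSTM is then run over pooled section vectors to produce the document representation. This is not the authors' code; the class name, the variable names, and the mean-pooling step are illustrative assumptions.

    # Minimal sketch of a hierarchical (word-level + section-level) encoder.
    # Assumes PyTorch; names like HierarchicalEncoder are hypothetical.
    import torch
    import torch.nn as nn

    class HierarchicalEncoder(nn.Module):
        def __init__(self, vocab_size, emb_dim=128, hidden_dim=256):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb_dim)
            # Word-level BiLSTM: encodes the tokens of one section at a time.
            self.word_lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                                     bidirectional=True)
            # Section-level BiLSTM: encodes the sequence of section vectors.
            self.section_lstm = nn.LSTM(2 * hidden_dim, hidden_dim,
                                        batch_first=True, bidirectional=True)

        def forward(self, sections):
            # sections: list of LongTensors, one tensor of token ids per section.
            word_states, section_vecs = [], []
            for tokens in sections:
                emb = self.embed(tokens).unsqueeze(0)        # (1, len, emb_dim)
                states, _ = self.word_lstm(emb)              # (1, len, 2*hidden)
                word_states.append(states.squeeze(0))
                section_vecs.append(states.mean(dim=1).squeeze(0))  # pooled section vector
            doc, _ = self.section_lstm(torch.stack(section_vecs).unsqueeze(0))
            return word_states, doc.squeeze(0)               # per-word and per-section states

    # Toy usage with two "sections" of random token ids.
    enc = HierarchicalEncoder(vocab_size=1000)
    word_states, section_states = enc([torch.randint(0, 1000, (12,)),
                                       torch.randint(0, 1000, (8,))])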

Key elements of the model include:

  • Hierarchical Encoder: Encodes the words of each section with a bidirectional LSTM and then aggregates the resulting section representations into a document-level encoding.
  • Discourse-Aware Decoder: Employs an attention mechanism that weights both section-level and word-level information when forming the context vector at each decoding step (see the sketch after this list).
  • Copy Mechanism: Integrates a pointer-generator network to directly copy words from source documents, facilitating the handling of Out-Of-Vocabulary (OOV) words.
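
To make the attention step concrete, the sketch below combines section-level and word-level attention into a single distribution over all source words and uses it to form the context vector, in the spirit of the paper's discourse-aware attention. The dot-product scoring and the renormalization details are simplifying assumptions, and the copy mechanism is omitted here.

    # Hedged sketch of discourse-aware attention (not the authors' exact formulation).
    import torch
    import torch.nn.functional as F

    def discourse_aware_attention(dec_state, section_states, word_states):
        # dec_state: (hidden,) decoder state at the current step
        # section_states: (num_sections, hidden) section representations
        # word_states: list of (len_j, hidden) tensors, one per section
        sec_weights = F.softmax(section_states @ dec_state, dim=0)   # attention over sections
        scaled = []
        for j, words in enumerate(word_states):
            w_weights = F.softmax(words @ dec_state, dim=0)          # attention within section j
            scaled.append(sec_weights[j] * w_weights)                # scale by section weight
        flat = torch.cat(scaled)
        flat = flat / flat.sum()                                     # renormalize over all source words
        context = torch.cat(word_states, dim=0).T @ flat             # weighted sum of word states
        return context, flat

In the full model, a context vector of this kind feeds the output vocabulary distribution, and the attention weights over source words also serve as the copy distribution for the pointer-generator.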

Datasets and Experiments

The research introduces two large-scale datasets of scientific papers, derived from arXiv and PubMed, in which each paper's discourse sections form the input and its abstract serves as the summary. The documents in these datasets are substantially longer than those in existing large-scale summarization datasets such as CNN/Daily Mail, posing challenges for traditional models.
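
For readers who want to work with these corpora, a commonly used public mirror is the scientific_papers dataset on the Hugging Face Hub (with arxiv and pubmed configurations); the snippet below assumes that mirror and its article, abstract, and section_names fields, which are conveniences of the mirror rather than artifacts described in the paper.

    # Hedged example: loading the arXiv split from the mirrored dataset.
    from datasets import load_dataset

    arxiv = load_dataset("scientific_papers", "arxiv", split="validation")
    sample = arxiv[0]
    print(sample["article"][:300])      # body of the paper
    print(sample["abstract"][:300])     # target summary
    print(sample["section_names"])      # discourse sections of the paper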

The evaluation demonstrates that the proposed model outperforms several baseline methods, including extractive approaches like LexRank and abstractive models such as Pntr-Gen-Seq2Seq. Notably, the model achieves a ROUGE-1 improvement of approximately 4 points on the arXiv dataset over the Pntr-Gen-Seq2Seq baseline.
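
For reproducing this kind of comparison, ROUGE can be computed with the rouge-score package; the example below uses made-up strings and is a generic sketch, not the authors' evaluation pipeline.

    # Hedged example of computing ROUGE-1/2/L F1 between a candidate and a reference.
    from rouge_score import rouge_scorer

    scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeLsum"], use_stemmer=True)
    reference = "we propose a discourse aware model for summarizing long documents"
    candidate = "a discourse aware model is proposed to summarize long scientific documents"
    scores = scorer.score(reference, candidate)
    print(round(scores["rouge1"].fmeasure, 3))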

Implications and Future Directions

This discourse-aware approach has substantial implications for both the theoretical understanding and practical implementation of abstractive summarization. By effectively modeling long-form and structured documents, the model opens avenues for generating more coherent and comprehensive summaries, reminiscent of human summarization.

Future work could refine the attention mechanisms to improve summarization quality and extend the approach to other document types. Additionally, expert human evaluations would offer more nuanced insight into the coverage and coherence of generated summaries than automated metrics such as ROUGE provide.

In conclusion, the methodology presented in this research advances the capabilities of abstractive summarization systems, making significant contributions to the treatment of long and complex documents. The datasets and model architecture set a foundation for subsequent studies in this domain.

Authors (7)
  1. Arman Cohan (121 papers)
  2. Franck Dernoncourt (161 papers)
  3. Doo Soon Kim (20 papers)
  4. Trung Bui (79 papers)
  5. Seokhwan Kim (29 papers)
  6. Walter Chang (21 papers)
  7. Nazli Goharian (43 papers)
Citations (693)