Discourse-Aware Neural Extractive Model for Text Summarization: An Expert Overview
The paper "Discourse-Aware Neural Extractive Model for Text Summarization" presents a nuanced approach to extractive text summarization by proposing DiscoBert, a novel model that aims to address some intrinsic limitations observed in traditional BERT-based models. The authors, Jiacheng Xu, Zhe Gan, Yu Cheng, and Jingjing Liu, focus on optimizing the balance between factual accuracy and summary conciseness using a fine-grained discourse-aware methodology.
Summary and Methodology
DiscoBert identifies and addresses two major drawbacks of existing BERT-based extractive summarization models: redundancy and limited capacity for capturing long-range dependencies. These models typically treat the sentence as the atomic unit of selection, so a sentence must be kept or dropped in full; selecting one informative clause therefore drags along any peripheral or repetitive detail in the same sentence. Moreover, because BERT is pre-trained on sentence pairs rather than entire documents, such models struggle to capture dependencies that span distant parts of a text.
Rather than selecting sentences, the authors propose a graph-based strategy that extracts discourse segments known as Elementary Discourse Units (EDUs), typically clauses identified by discourse segmentation. Operating at this sub-sentential level permits finer-grained, more contextually aware selection and reduces redundancy, since peripheral clauses can be dropped while their informative neighbors are kept. Two discourse graphs, the Rhetorical Structure Theory (RST) Graph and the Coreference Graph, are then constructed over the EDUs so the model can capture the dependencies between them.
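To make the contrast with sentence-level selection concrete, the sketch below greedily extracts the highest-scoring units under a fixed word budget. The `segment_into_edus` splitter and the scores are hypothetical stand-ins; the paper relies on a trained RST discourse segmenter and learned extraction scores, not a comma heuristic.

```python
# Toy illustration of EDU-level vs. sentence-level extraction.
# segment_into_edus and the scores below are stand-ins for the paper's
# RST segmenter and learned scoring model.

def segment_into_edus(sentence: str) -> list[str]:
    """Hypothetical EDU splitter; real systems use an RST parser."""
    return [part.strip() for part in sentence.split(",") if part.strip()]

def select_units(units: list[str], scores: list[float], budget: int) -> list[str]:
    """Greedily keep the highest-scoring units that fit a word budget."""
    chosen, used = [], 0
    for _, unit in sorted(zip(scores, units), reverse=True):
        length = len(unit.split())
        if used + length <= budget:
            chosen.append(unit)
            used += length
    return chosen

sentence = ("The senate passed the bill, which analysts called historic, "
            "after a lengthy procedural debate")
edus = segment_into_edus(sentence)
# With EDUs, the central clause can be kept while peripheral clauses are
# dropped; sentence-level selection would have to keep or drop all of them.
print(select_units(edus, scores=[0.9, 0.4, 0.3], budget=6))
```

Because each EDU is shorter than a full sentence, the same word budget admits more distinct pieces of content, which is precisely the redundancy argument above.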
The RST Graph is derived from RST discourse trees, which map the rhetorical relations between EDUs in a text: nuclei carry the central content of a relation, while satellites carry supporting or peripheral content. The Coreference Graph, in turn, connects EDUs that mention the same entities, allowing context to propagate across the document beyond adjacent segments. By applying a Graph Convolutional Network (GCN) over these graphs, DiscoBert models long-range dependencies among EDUs, improving the coherence of the resulting summaries.
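As a rough sketch of how a GCN can propagate information along these graphs, the PyTorch snippet below implements one layer of symmetrically normalized message passing (the standard Kipf and Welling formulation) over EDU vectors. Merging the RST and coreference edges into a single adjacency matrix is a simplification for brevity; the exact layer variant and graph fusion used in DiscoBert may differ.

```python
import torch
import torch.nn as nn

class SimpleGCNLayer(nn.Module):
    """One graph-convolution layer over EDU representations.

    Computes ReLU(D^{-1/2} (A + I) D^{-1/2} X W), the standard GCN update;
    a sketch, not necessarily the exact variant used in DiscoBert.
    """
    def __init__(self, hidden_size: int):
        super().__init__()
        self.linear = nn.Linear(hidden_size, hidden_size)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: (num_edus, hidden); adj: (num_edus, num_edus), 1 marks an edge.
        a_hat = adj + torch.eye(adj.size(0))        # add self-loops
        d_inv_sqrt = torch.diag(a_hat.sum(dim=1).pow(-0.5))
        norm_adj = d_inv_sqrt @ a_hat @ d_inv_sqrt  # symmetric normalization
        return torch.relu(norm_adj @ self.linear(x))

# Toy usage: four EDUs with BERT-sized vectors. RST-tree edges and one
# coreference link are merged into a single adjacency matrix for brevity.
num_edus, hidden = 4, 768
edu_reprs = torch.randn(num_edus, hidden)
rst_adj = torch.tensor([[0, 1, 0, 0],
                        [1, 0, 1, 0],
                        [0, 1, 0, 1],
                        [0, 0, 1, 0]], dtype=torch.float)
coref_adj = torch.tensor([[0, 0, 0, 1],
                          [0, 0, 0, 0],
                          [0, 0, 0, 0],
                          [1, 0, 0, 0]], dtype=torch.float)
layer = SimpleGCNLayer(hidden)
updated = layer(edu_reprs, (rst_adj + coref_adj).clamp(max=1.0))
```

Stacking a few such layers lets information from a coreferent mention several paragraphs away reach the EDU being scored, which is how the model sidesteps BERT's limited cross-sentence context.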
Experimental Results
DiscoBert's performance was evaluated on two standard benchmarks, CNN/Daily Mail (CNNDM) and the New York Times corpus (NYT). It outperformed strong baselines, including the sentence-level BertSum, on ROUGE scores, with the best results obtained when the RST and Coreference graphs were used together. These gains highlight the practical advantage of incorporating discourse structure into summarization frameworks, helping the model balance comprehensive coverage with conciseness and factual integrity, a perennial challenge in extractive summarization.
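For readers who want to run this style of evaluation themselves, the snippet below computes ROUGE-1/2/L F1 with Google's rouge-score package (pip install rouge-score). The texts are placeholders, and reported numbers in the paper are averages over full test sets; published NYT results also sometimes use a limited-length recall variant rather than plain F1.

```python
from rouge_score import rouge_scorer

# Score one system summary against one reference on ROUGE-1/2/L F1,
# the metric family reported for CNNDM and NYT. Texts are placeholders.
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"],
                                  use_stemmer=True)
reference = "the senate passed the bill after a lengthy procedural debate"
prediction = "the senate passed the bill"
for name, score in scorer.score(reference, prediction).items():
    print(f"{name}: F1 = {score.fmeasure:.3f}")
```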
Implications and Future Research
The introduction of DiscoBert sets a precedent for future research exploring the interplay between discourse structures and semantic representation in NLP tasks. The discourse-aware approach emphasizes the importance of internal document relationships and encourages experimentation with graph-based models in other domains of AI that deal with similar structural complexities.
The implications of this research extend to both theoretical and practical domains in NLP. Theoretically, discourse structures offer robust linguistic insights that can refine computational models. Practically, automated summarization tools, particularly for long, information-dense documents, stand to gain in both efficiency and accuracy from these advances.
Looking forward, subsequent studies might explore alternative graph structures or extend this methodology to multilingual corpora, considering the diverse rhetorical structures across languages. Further, integrating these discourse structures with abstractive summarization could provide an intriguing avenue for research, potentially leveraging the strengths of both summarization paradigms.
In summary, the paper contributes a substantial advancement to the extractive summarization landscape by introducing DiscoBert. Its combination of discourse-unit selection and graph-based learning offers a forward-looking approach and sets a new reference point for summarization models seeking to incorporate deep linguistic insight.