
Bottom-Up Abstractive Summarization (1808.10792v2)

Published 31 Aug 2018 in cs.CL, cs.AI, and cs.LG

Abstract: Neural network-based methods for abstractive summarization produce outputs that are more fluent than other techniques, but which can be poor at content selection. This work proposes a simple technique for addressing this issue: use a data-efficient content selector to over-determine phrases in a source document that should be part of the summary. We use this selector as a bottom-up attention step to constrain the model to likely phrases. We show that this approach improves the ability to compress text, while still generating fluent summaries. This two-step process is both simpler and higher performing than other end-to-end content selection models, leading to significant improvements on ROUGE for both the CNN-DM and NYT corpus. Furthermore, the content selector can be trained with as little as 1,000 sentences, making it easy to transfer a trained summarizer to a new domain.

Bottom-Up Abstractive Summarization: A Detailed Overview

This paper introduces a novel approach to neural abstractive summarization that addresses the common challenge of content selection while retaining the fluency benefits of neural networks. By first performing a data-driven content selection step and then using its output as a bottom-up attention constraint, the authors explicitly direct the summarizer toward the source phrases most likely to belong in the summary.

Methodology

The authors' approach is a two-step process. First, a content selector, framed as a sequence-tagging problem, identifies the source-document phrases likely to appear in the summary. The tagger combines static GloVe vectors with contextual ELMo embeddings, and it performs well as a standalone component, reaching a recall of over 60% and a precision above 50%.
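As a rough illustration, the following is a minimal sketch of such a binary sequence tagger in PyTorch. It assumes pre-computed embeddings (e.g., GloVe concatenated with ELMo vectors) as input; the layer sizes, embedding dimension, and toy training targets are illustrative assumptions rather than the paper's exact configuration.

```python
# Minimal sketch of a content selector framed as binary sequence tagging
# (copy / don't-copy per source token). Assumes pre-computed embeddings,
# e.g. GloVe concatenated with ELMo vectors; dimensions and hyperparameters
# are illustrative, not the paper's exact configuration.
import torch
import torch.nn as nn

class ContentSelector(nn.Module):
    def __init__(self, emb_dim=1324, hidden_dim=256):
        super().__init__()
        # Bidirectional LSTM encoder over the source tokens
        self.encoder = nn.LSTM(emb_dim, hidden_dim,
                               bidirectional=True, batch_first=True)
        # Per-token binary score: should this token appear in the summary?
        self.scorer = nn.Linear(2 * hidden_dim, 1)

    def forward(self, embeddings):
        # embeddings: (batch, src_len, emb_dim)
        hidden, _ = self.encoder(embeddings)
        # q_i: probability that source token i belongs in the summary
        return torch.sigmoid(self.scorer(hidden)).squeeze(-1)

# Toy usage with random tensors standing in for GloVe+ELMo embeddings
selector = ContentSelector()
src_emb = torch.randn(2, 30, 1324)                # 2 documents, 30 tokens each
gold_tags = torch.randint(0, 2, (2, 30)).float()  # copy / don't-copy labels
q = selector(src_emb)                             # (2, 30) selection probabilities
loss = nn.functional.binary_cross_entropy(q, gold_tags)
```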

In the second step, the selected phrases act as a bottom-up attention constraint: the copy attention of the abstractive model is masked so that only selected source tokens can be copied, while generation from the vocabulary remains unrestricted. This constraint improves the model's ability to compress text without sacrificing fluency, and the pipeline is more data-efficient than end-to-end alternatives, since the selector needs comparatively little training data.
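The sketch below illustrates this masking step: a copy-attention distribution is restricted to tokens whose selection probability exceeds a threshold, then renormalized. The threshold value and tensor shapes are assumptions for demonstration, not the tuned settings from the paper.

```python
# Illustrative bottom-up masking of copy attention: only source tokens whose
# selection probability exceeds a threshold remain available for copying.
import torch

def bottom_up_mask(copy_attention, selection_probs, threshold=0.25):
    """copy_attention, selection_probs: (batch, src_len) tensors."""
    mask = (selection_probs > threshold).float()
    masked = copy_attention * mask
    # Renormalize so the constrained copy distribution still sums to 1
    return masked / masked.sum(dim=-1, keepdim=True).clamp(min=1e-8)

attn = torch.softmax(torch.randn(2, 30), dim=-1)  # toy copy-attention scores
q = torch.rand(2, 30)                             # selection probabilities from the tagger
constrained = bottom_up_mask(attn, q)
```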

Experimental Results

The proposed model is evaluated on standard datasets, including the CNN-Daily Mail (CNN-DM) and the New York Times (NYT) corpora. The empirical results highlight a notable increase in ROUGE scores, particularly an improvement from 36.4 to 38.3 in ROUGE-L on the CNN-DM dataset, confirming the model's superiority over baseline systems that do not employ a separate content selection step.

Furthermore, the approach's simplicity and efficiency are underscored by the content selector's ability to be trained with as few as 1,000 sentences, which facilitates transfer to new domains. In a cross-domain setting, a summarizer paired with a content selector trained on only 1,000 in-domain sentences gains over 5 points in ROUGE-L.

Comparative Analysis

While the approach bears similarities to multi-pass extractive-abstractive models in prior work, it differs in remaining fully abstractive, using bottom-up content selection to guide attention rather than extracting text outright. The authors compare against several alternatives, including models based on reinforcement learning (RL) and multi-task learning, noting that these match neither the simplicity nor the performance gains of the proposed approach.

Implications and Future Directions

Practically, this method supports the development of more robust and adaptable summarization systems, which is particularly valuable in domains where training data is limited. Theoretically, it contributes to ongoing discussions about integrating content selection into neural network models, with potential applications beyond summarization in areas such as grammar correction and data-to-text generation.

Looking ahead, integrating this bottom-up attention strategy into other complex AI tasks is a promising direction, particularly for models that require both precision in content selection and adaptability across varied domains.

In summary, this paper presents a significant advance in abstractive summarization. By adopting a bottom-up approach, the authors improve content selection efficiently, offering both practical and theoretical contributions to the ongoing development of neural text generation models.

Authors (3)
  1. Sebastian Gehrmann (48 papers)
  2. Yuntian Deng (44 papers)
  3. Alexander M. Rush (115 papers)
Citations (674)