- The paper introduces a contrastive attention mechanism for abstractive sentence summarization that uses both conventional and opponent attention to distinguish relevant from irrelevant text.
- The proposed model achieved state-of-the-art results on benchmark datasets like the annotated Gigaword Corpus and DUC-2004, demonstrating significant improvements in ROUGE scores over baseline models.
- The contrastive attention mechanism shows promise for adaptation to other sequence-to-sequence tasks like machine translation and integration into various neural architectures beyond Transformers.
Contrastive Attention Mechanism for Abstractive Sentence Summarization
The paper "Contrastive Attention Mechanism for Abstractive Sentence Summarization" by Duan et al. introduces an innovative approach to enhancing the sequence-to-sequence learning framework, specifically for the abstractive sentence summarization task. The paper introduces a contrastive attention mechanism comprising two forms of attention: conventional attention, which attends to relevant portions of the sentence, and an opponent attention, which focuses on irrelevant or less relevant segments. The crux of this method lies in contrasting these two attention types, using a mix of softmax and softmin functions, to maximize the influence of relevant features while minimizing the contribution of irrelevant ones.
Theoretical and Practical Contributions
The paper extends the traditional sequence-to-sequence architecture with the contrastive attention mechanism. The underlying model operates within a Transformer-based framework, which recent literature has shown to outperform RNN and CNN variants on text summarization tasks. Conventional attention guides the generation of each target word from the relevant parts of the source sequence, while opponent attention, realized through a softmin variant, counters it by attending to the less pertinent parts.
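One plausible way to combine the two branches during training is sketched below: the decoder is rewarded for predicting the target word from the conventional context and penalized when the opponent context can do the same. The loss shape, the `weight` coefficient, and the two-logit interface are illustrative assumptions, not the paper's exact objective.

```python
import torch.nn.functional as F

def contrastive_loss(conv_logits, oppo_logits, target, weight=1.0):
    """Illustrative contrastive objective (not the authors' exact loss).

    conv_logits / oppo_logits: vocabulary logits of shape (batch, vocab)
    produced from the conventional and opponent attention contexts;
    target: gold word indices of shape (batch,);
    weight: hypothetical trade-off coefficient.
    """
    # Maximize target likelihood under the conventional branch ...
    conv_nll = F.cross_entropy(conv_logits, target)
    # ... while minimizing target likelihood under the opponent branch,
    # so attention over irrelevant content cannot explain the summary.
    oppo_nll = F.cross_entropy(oppo_logits, target)
    return conv_nll - weight * oppo_nll
```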
The major achievement of this attention mechanism is its ability to produce state-of-the-art results on benchmark datasets such as the annotated Gigaword Corpus and DUC-2004, both of which pair article sentences with headline-style summaries. On both datasets, the model incorporating contrastive attention achieves notable increases in ROUGE scores over baseline Transformer models, underscoring the practical utility of the approach for text summarization.
Experimental Evaluation
The paper rigorously evaluates the mechanism on both English and Chinese datasets. For English, the annotated Gigaword Corpus is used, and the model attains strong performance, reflected in improved ROUGE-1, ROUGE-2, and ROUGE-L scores. On the Chinese LCSTS dataset, the method likewise outperforms existing systems, including enhancements such as the deep recurrent generative decoder and actor-critic training frameworks. The consistent improvement across datasets underscores the robustness and versatility of the contrastive mechanism.
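The ROUGE figures reported in such evaluations can be reproduced for any system output with standard tooling; the snippet below is a minimal sketch using the open-source `rouge-score` package, with made-up reference and prediction strings standing in for actual model outputs.

```python
# pip install rouge-score
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)

reference = "police arrest five anti-nuclear protesters"        # illustrative gold headline
prediction = "five protesters arrested at anti-nuclear rally"   # illustrative system output

scores = scorer.score(reference, prediction)
for name, result in scores.items():
    print(f"{name}: P={result.precision:.3f} R={result.recall:.3f} F1={result.fmeasure:.3f}")
```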
Implications and Future Developments
The implications of this research are manifold. The contrastive attention mechanism could be adapted to other sequence-to-sequence tasks beyond sentence summarization, such as machine translation, where distinguishing relevant from irrelevant contextual elements is equally crucial. Its successful integration into the Transformer suggests that similar designs could refine models in other tasks where attention is applicable.
The paper indicates that future work may optimize the contrastive attention mechanism further by dynamically determining the relevant portions of text, removing the need for manual head selection. There is also the prospect of integrating such mechanisms into other neural architectures to leverage attention across a broader spectrum of NLP tasks.
Overall, the paper contributes to the field of NLP by both advancing the theory of attention mechanisms and providing empirical evidence for the efficacy of contrastive attention in real-world text summarization tasks.