Efficient Attentions for Long Document Summarization

Published 5 Apr 2021 in cs.CL (arXiv:2104.02112v2)

Abstract: The quadratic computational and memory complexities of large Transformers have limited their scalability for long document summarization. In this paper, we propose Hepos, a novel efficient encoder-decoder attention with head-wise positional strides to effectively pinpoint salient information from the source. We further conduct a systematic study of existing efficient self-attentions. Combined with Hepos, we are able to process ten times more tokens than existing models that use full attentions. For evaluation, we present a new dataset, GovReport, with significantly longer documents and summaries. Results show that our models produce significantly higher ROUGE scores than competitive comparisons, including new state-of-the-art results on PubMed. Human evaluation also shows that our models generate more informative summaries with fewer unfaithful errors.

Citations (231)

Summary

  • The paper introduces Hepos, a novel strided encoder-decoder attention mechanism that doubles processing length for long document summarization.
  • It leverages head-wise positional strides to reduce computational and memory complexity while maintaining a comprehensive global view.
  • Combined with efficient self-attentions, Hepos yields higher ROUGE scores than competitive baselines and produces summaries with fewer unfaithful errors in human evaluations.

Efficient Attentions for Long Document Summarization

The paper "Efficient Attentions for Long Document Summarization" tackles the enduring issue of processing extensive text sequences within the Transformer framework, which is critical for generating abstractive summaries of long documents like scientific papers and government reports. The authors address the quadratic computational and memory complexities that hamper the scalability of Transformers for such tasks by introducing Hepos, a novel encoder-decoder attention mechanism utilizing head-wise positional strides.

Methodological Contributions

The primary contribution of this research is the development of Hepos, which efficiently manages attention resources by adopting a strided pattern across attention heads, each starting at different positions. This enables the model to effectively emphasize salient parts of the input while maintaining a global view within a significantly reduced computational budget. Notably, Hepos doubles the input sequence length that can be processed compared to full-attention models, which underscores the importance of balancing efficiency with attention capability.
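To make the head-wise stride concrete, below is a minimal PyTorch sketch of a Hepos-style encoder-decoder attention written from the paper's description. Tensor shapes, the `stride` hyperparameter, and the scaling convention are illustrative assumptions rather than the authors' reference implementation; with enough heads to cover every stride offset, each head attends to roughly 1/stride of the encoder states, which is where the memory saving comes from.

```python
import torch
import torch.nn.functional as F

def hepos_cross_attention(q, k, v, stride):
    """Head-wise positional stride cross-attention (Hepos-style sketch).

    q:    [batch, heads, tgt_len, d_head]  decoder queries
    k, v: [batch, heads, src_len, d_head]  encoder keys / values
    Head h attends only to source positions h % stride, h % stride + stride, ...
    """
    b, n_heads, src_len, d_head = k.shape
    assert n_heads >= stride, "need enough heads to cover every source position"

    outputs = []
    for h in range(n_heads):
        # Strided subset of encoder states assigned to this head.
        idx = torch.arange(h % stride, src_len, stride, device=k.device)
        k_h, v_h = k[:, h, idx], v[:, h, idx]               # [b, ~src_len/stride, d_head]
        scores = q[:, h] @ k_h.transpose(-1, -2) / d_head ** 0.5
        attn = F.softmax(scores, dim=-1)                    # [b, tgt_len, ~src_len/stride]
        outputs.append(attn @ v_h)                          # [b, tgt_len, d_head]
    return torch.stack(outputs, dim=1)                      # [b, heads, tgt_len, d_head]

# Illustrative shapes only: 12 heads, stride 4, a 4096-token source, 256-token target.
q = torch.randn(2, 12, 256, 64)
k = torch.randn(2, 12, 4096, 64)
v = torch.randn(2, 12, 4096, 64)
out = hepos_cross_attention(q, k, v, stride=4)              # [2, 12, 256, 64]
```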

To complement Hepos, the authors conduct a comparative study of existing efficient self-attention mechanisms, covering fixed-pattern, low-rank approximation, and learnable-pattern approaches. Key findings suggest that learnable patterns, particularly Sinkhorn attention, come closest to the full-attention baseline. When further combined with Hepos, these models demonstrate enhanced scalability and stronger summarization performance.
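As a point of reference for the fixed-pattern family mentioned above, the sketch below builds a sliding-window self-attention mask in which each token attends only to a local neighborhood. The window size and the dense-mask construction are illustrative choices for exposition, not the exact configuration benchmarked in the paper; efficient implementations avoid materializing the full mask.

```python
import torch

def sliding_window_mask(seq_len, window):
    """Boolean mask [seq_len, seq_len]: True where query i may attend to key j."""
    pos = torch.arange(seq_len)
    return (pos[None, :] - pos[:, None]).abs() <= window // 2

# Each query attends to O(window) keys instead of O(seq_len), turning the
# self-attention cost from quadratic to roughly linear in sequence length.
mask = sliding_window_mask(seq_len=1024, window=128)
scores = torch.randn(1024, 1024).masked_fill(~mask, float("-inf"))
weights = torch.softmax(scores, dim=-1)  # attention restricted to local windows
```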

Evaluation and Results

The assessment of the proposed method is multifaceted, involving both automatic and human evaluations. A new dataset, named GovReport, comprising lengthy U.S. government reports with expert-authored summaries, serves as a benchmark for testing. According to the results, models with Hepos achieve significantly higher ROUGE scores than competitive comparisons on GovReport and set new state-of-the-art results on PubMed, reaffirming the method's ability to handle long inputs effectively.

Moreover, the approach's gains in summary informativeness were validated through structured human evaluations, where judges noted reductions in hallucinated content and better coverage of the source when longer input sequences were utilized. This was complemented by the introduction of a new faithfulness metric, APES_src, which correlates better with human judgments than existing metrics, further supporting the claim of increased summary fidelity when employing Hepos encoder-decoder attention.
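The sketch below illustrates one way a QA-based faithfulness score in the spirit of APES_src could be computed: entities in the system summary are masked to form cloze queries, and a reader model tries to recover them from the source document, so hallucinated entities go unanswered. The helpers `extract_entities` and `answer` are hypothetical stand-ins; the paper's exact question construction and QA model may differ.

```python
def apes_src_score(summary, source, extract_entities, answer):
    """Fraction of summary entities that a QA reader can recover from the source.

    `extract_entities(text)` and `answer(question, context)` are hypothetical
    callables standing in for an NER tagger and an extractive QA model.
    """
    entities = extract_entities(summary)
    if not entities:
        return 1.0  # nothing to verify
    recovered = 0
    for ent in entities:
        cloze = summary.replace(ent, "[MASK]", 1)   # mask the entity to form a query
        if answer(cloze, context=source) == ent:    # can the source supply it back?
            recovered += 1
    return recovered / len(entities)                # lower scores flag hallucination
```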

Implications and Future Work

Hepos presents a significant stride towards the practical application of Transformers in real-world summarization scenarios involving substantial text data. By efficiently managing computational overhead, it opens the door for summarization systems to operate within more demanding operational contexts without sacrificing performance.

Future developments may extend these findings by further exploring the synergy between different attention mechanisms, building additional domain-specific long-document datasets, and integrating domain-specific knowledge into encoder-decoder architectures. The paper's insights also invite work on optimizing training procedures for long-document understanding and on better characterizing the interactions among varied efficient attention mechanisms.

In conclusion, the paper advances the field of NLP and AI by addressing a fundamental limitation within Transformer models for long document processing. It contributes a practical and scalable approach that merits further investigation and potential adaptation across diverse domains.
