Enhancing Long Document Summarization through Top-Down and Bottom-Up Inference
Introduction
Long document summarization plays a pivotal role in compressing extensive content while retaining critical information. Traditional models rely predominantly on bottom-up inference and can therefore miss the hierarchical structure intrinsic to long documents. The quadratic complexity of full self-attention with respect to sequence length further complicates the task. The paper addresses these issues with a novel framework that combines bottom-up and top-down inference, improving memory and computational efficiency while matching or exceeding prior performance across a spectrum of document lengths and genres.
Methodologies
The proposed model, termed the "Top-Down Transformer," integrates hierarchical latent structures for document understanding, combining bottom-up and top-down inference. Bottom-up inference, realized through local self-attention, efficiently computes token representations within a local context, avoiding the quadratic cost of full attention. The top-down mechanism then updates these token representations with global context drawn from higher-level units (e.g., segments or sentences), which are obtained by pooling token representations and refined via cross-attention between token and segment representations.
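The bottom-up half of this pipeline can be illustrated with a minimal numpy sketch of windowed (local) self-attention. This is a simplification, not the paper's implementation: it omits learned query/key/value projections and multi-head structure, and the window size and single-matrix form are assumptions chosen for brevity. The point it demonstrates is that restricting attention to fixed-size windows makes cost grow linearly with sequence length rather than quadratically.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def local_self_attention(tokens, window):
    """Bottom-up pass: each token attends only to tokens in its own
    fixed-size window, so total cost is O(n * window) rather than O(n^2).
    `tokens` is an (n, d) array of token representations."""
    n, d = tokens.shape
    out = np.zeros_like(tokens)
    for start in range(0, n, window):
        block = tokens[start:start + window]      # (w, d) local slice
        scores = block @ block.T / np.sqrt(d)     # scaled dot-product scores
        out[start:start + window] = softmax(scores) @ block
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(16, 8))
y = local_self_attention(x, window=4)
```

Because attention never crosses a window boundary, a token's output depends only on its own window; the top-down pass is what later reintroduces global context.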
Two key aspects underpin the framework:
- Hierarchical Latent Structure: The model assumes a document's latent structure is hierarchical, capturing long-range dependencies coarsely at higher levels while preserving detailed information at the token level.
- Cross-Attention Mechanism: For top-down correction, a cross-attention mechanism between the token-level and the segment-level representations allows for the infusion of global context into local token representations.
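The top-down correction described above can be sketched in a few lines of numpy. This is a hedged illustration, not the paper's model: average pooling, the residual update, and the absence of learned projections and segment-level self-attention are all simplifying assumptions. It shows the essential data flow: pool tokens into segment representations, then let every token cross-attend to all segments so global context flows back into local representations.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def top_down_update(tokens, seg_len):
    """Top-down pass over (n, d) token representations:
    1) pool each run of `seg_len` tokens into one segment vector,
    2) cross-attend: each token queries all segments,
    3) add the resulting global context back residually."""
    n, d = tokens.shape
    segments = tokens.reshape(n // seg_len, seg_len, d).mean(axis=1)  # (s, d)
    scores = tokens @ segments.T / np.sqrt(d)   # (n, s) token-to-segment scores
    context = softmax(scores) @ segments        # global context per token
    return tokens + context                     # top-down corrected tokens

rng = np.random.default_rng(1)
x = rng.normal(size=(16, 8))
updated = top_down_update(x, seg_len=4)
```

Note that the cross-attention here costs O(n · s) with s segments, far cheaper than O(n²) token-to-token attention, which is what lets global context be injected without reintroducing quadratic complexity.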
Experimental Setup and Results
The efficacy of the Top-Down Transformer was evaluated across diverse datasets covering scientific texts, news articles, conversational transcripts, and narratives. Benchmarked against several state-of-the-art models and efficient transformers, it achieved competitive or superior results in summarizing long documents and also handled short documents efficiently.
On the PubMed and arXiv scientific-article datasets, the model outperformed various efficient transformers, illustrating the advantage of combining bottom-up and top-down inference over purely bottom-up approaches. Similarly, on narrative and conversational datasets such as SummScreen and BookSum, its capacity to integrate information across a document's full span was evident, outperforming strong baselines by a clear margin.
Implications and Future Directions
The introduction of the Top-Down Transformer model presents significant theoretical and practical implications for the field of document summarization and beyond. Theoretically, it underscores the importance of considering a document's hierarchical structure and the synergy between bottom-up detail and top-down context in understanding and summarization tasks. Practically, the model's efficiency and scalability across document lengths and types suggest its applicability in real-world scenarios where processing long documents efficiently is paramount.
Looking forward, the model opens avenues for exploring multi-scale latent structures beyond two levels and for investigating alternative mechanisms for top-down correction. Moreover, its compatibility with pre-trained language models invites further research into pre-training objectives that could complement the bottom-up and top-down inference approach for enhanced summarization performance.
Conclusion
The Top-Down Transformer marks a significant stride in long document summarization, addressing the dual challenges of capturing hierarchical document structure and managing computational complexity. Through its innovative integration of bottom-up and top-down inference mechanisms, it sets a new benchmark for summarizing documents across diverse lengths and genres, paving the way for future explorations in efficient and effective document processing.