Overview
- The LOCOST framework introduces a novel approach in NLP for long document summarization by utilizing State-Space Models (SSMs) to enhance computational and memory efficiency.
- It employs a unique architecture combining bidirectional Deep SSMs for encoding with traditional transformer decoders for summarization, achieving significant memory savings.
- In extensive evaluations, LOCOST performed close to leading sparse transformers while notably reducing memory consumption and handling inputs of over 600K tokens.
- The framework's success in handling long texts without truncation suggests potential for developing advanced NLP tools, setting new benchmarks in long document processing.
LOCOST: Leveraging State-Space Models for Advanced Long Document Summarization
Introduction
The field of NLP has continuously evolved, with models designed to handle long texts efficiently gaining increasing attention. A notable stride in this evolution is the introduction of the LOCOST framework, short for Long Context State-Space Transformer. This architecture uses Deep State-Space Models (SSMs) as the cornerstone for encoding long documents, aiming to address the computational and memory efficiency challenges that traditional transformer models face when summarizing extensive texts.
State-Space Models in LOCOST
State-Space Models (SSMs) are notable for their lower complexity compared to transformers, while showing exceptional capability in capturing long-term dependencies within sequences. The LOCOST architecture harnesses these models to construct an encoder-decoder framework tailored for conditional text generation, focusing in particular on long document abstractive summarization. By employing SSMs, LOCOST achieves a computational complexity of O(L log L) in the sequence length L, significantly enhancing its ability to handle sequences of considerable length, far beyond the capabilities of sparse transformers.
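The O(L log L) complexity comes from the fact that a (linear time-invariant) SSM layer can be viewed as a long convolution of the input with a kernel derived from the SSM parameters, and long convolutions can be computed with FFTs. A minimal sketch of this idea, with hypothetical function names and a precomputed kernel standing in for the SSM-derived one:

```python
import numpy as np

def ssm_conv_fft(u, k):
    """Causal convolution of input sequence u with kernel k in O(L log L).

    In the SSM view, k would be the kernel induced by the state-space
    parameters; here it is just an arbitrary array of the same length.
    Zero-padding to 2L turns the FFT's circular convolution into a
    linear one, so the first L outputs are the causal convolution.
    """
    L = len(u)
    n = 2 * L
    return np.fft.irfft(np.fft.rfft(u, n) * np.fft.rfft(k, n), n)[:L]

def ssm_conv_naive(u, k):
    """Reference O(L^2) causal convolution: y[i] = sum_j k[j] * u[i-j]."""
    L = len(u)
    return np.array(
        [sum(k[j] * u[i - j] for j in range(i + 1)) for i in range(L)]
    )
```

Both functions compute the same output; only the FFT version scales to the very long sequences LOCOST targets.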
Architectural Innovations and Memory Efficiency
LOCOST introduces an architectural design that combines bidirectional Deep SSMs for encoding with a traditional transformer decoder for generating summaries. This design substantially reduces memory requirements, enabling up to 50% memory savings during training and up to 87% during inference. Such efficiency positions LOCOST as a highly competitive alternative to existing models, not only in terms of performance but also in computational resource utilization.
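A causal SSM pass only lets each position see its left context, so a bidirectional encoder can be formed by running one pass forward and one over the reversed sequence, then combining them. A minimal sketch of this idea (the function names, the summation of the two passes, and the fixed kernels are illustrative assumptions, not the paper's exact formulation):

```python
import numpy as np

def causal_conv(u, k):
    # O(L log L) causal convolution via FFT, as in the SSM view.
    n = 2 * len(u)
    return np.fft.irfft(np.fft.rfft(u, n) * np.fft.rfft(k, n), n)[:len(u)]

def bidirectional_ssm_layer(u, k_fwd, k_bwd):
    """Hypothetical bidirectional SSM mixing layer.

    A forward causal pass covers left context; a second causal pass
    over the reversed sequence (then re-reversed) covers right context.
    Summing the two gives every position a view of the full sequence.
    """
    fwd = causal_conv(u, k_fwd)
    bwd = causal_conv(u[::-1], k_bwd)[::-1]
    return fwd + bwd
```

Because each pass is an FFT convolution, the whole layer stays O(L log L), which is what lets the encoder process entire books without truncation.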
Comprehensive Evaluation and Results
The efficacy of LOCOST was thoroughly evaluated across various long document summarization datasets. The framework achieved up to 96% of the performance of leading sparse transformers, with a notable reduction in memory consumption. Furthermore, LOCOST's capacity to process inputs exceeding 600K tokens marks a significant advancement, setting new benchmarks for handling extremely long inputs such as full books.
Future Directions and Implications
The introduction of LOCOST opens new avenues for research and application in the domain of long document processing. Its ability to efficiently summarize entire books without truncation offers promising potential for developing more sophisticated NLP tools. Future work might explore the scalability of this architecture, experimenting with larger model sizes and further optimization to enhance performance and versatility.
Conclusion
LOCOST represents substantial progress in the field of NLP, particularly in the summarization of long documents. By leveraging the unique advantages of State-Space Models, this framework not only showcases superior memory efficiency but also sets new standards in processing capabilities for lengthy sequences. The success of LOCOST paves the way for further exploration and refinement of models tailored for extensive textual data, highlighting the evolving landscape of NLP technology.
Authors
- Florian Le Bronnec
- Song Duong
- Mathieu Ravaut
- Alexandre Allauzen
- Nancy F. Chen
- Vincent Guigue
- Alberto Lumbreras
- Laure Soulier
- Patrick Gallinari