Emergent Mind

LOCOST: State-Space Models for Long Document Abstractive Summarization

(2401.17919)
Published Jan 31, 2024 in cs.CL and cs.LG

Abstract

State-space models are a low-complexity alternative to transformers for encoding long sequences and capturing long-term dependencies. We propose LOCOST: an encoder-decoder architecture based on state-space models for conditional text generation with long context inputs. With a computational complexity of $O(L \log L)$, this architecture can handle significantly longer sequences than state-of-the-art models that are based on sparse attention patterns. We evaluate our model on a series of long document abstractive summarization tasks. The model reaches a performance level that is 93-96% comparable to the top-performing sparse transformers of the same size while saving up to 50% memory during training and up to 87% during inference. Additionally, LOCOST effectively handles input texts exceeding 600K tokens at inference time, setting new state-of-the-art results on full-book summarization and opening new perspectives for long input processing.

Overview

  • The LOCOST framework introduces a novel approach in NLP for long document summarization by utilizing State-Space Models (SSMs) to enhance computational and memory efficiency.

  • It employs a unique architecture combining bidirectional Deep SSMs for encoding and traditional transformer decoders for summarization, achieving significant memory savings.

  • Through extensive evaluations, LOCOST demonstrated close performance to leading sparse transformers while notably reducing memory consumption and handling inputs over 600K tokens.

  • The framework's success in handling long texts without truncation suggests potential for developing advanced NLP tools, setting new benchmarks in long document processing.

Introduction

NLP research has paid increasing attention to models that handle long texts efficiently. One notable step is the LOCOST framework, which stands for Long Context State-Space Transformer. The architecture uses deep State-Space Models (SSMs) to encode long documents, with the aim of reducing the computational and memory costs that standard transformer models incur when summarizing long texts.

State-Space Models in LOCOST

State-Space Models (SSMs) have lower complexity than transformers and can capture long-term dependencies within sequences. LOCOST uses them to build an encoder-decoder architecture for conditional text generation, with a focus on long document abstractive summarization. Because the SSM encoder has a computational complexity of $O(L \log L)$ in the sequence length $L$, LOCOST can handle significantly longer sequences than state-of-the-art models based on sparse attention patterns.
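The $O(L \log L)$ figure is typical of SSM layers that are evaluated as a long convolution through the FFT. The following is a minimal sketch of that idea, not the LOCOST implementation: it assumes a hypothetical one-dimensional state with scalar parameters `a`, `b`, `c`, builds the kernel $K_k = c\,a^k\,b$, and checks that the FFT-based convolution matches the step-by-step recurrence. All function names are illustrative.

```python
import cmath


def fft(values, inverse=False):
    """Recursive radix-2 Cooley-Tukey FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def ssm_kernel(a, b, c, length):
    """Convolution kernel K[k] = c * a**k * b of a scalar linear SSM (assumed form)."""
    return [c * (a ** k) * b for k in range(length)]


def causal_conv_fft(u, kernel):
    """Compute y[t] = sum_{s<=t} kernel[s] * u[t-s] in O(L log L) via the FFT."""
    length = len(u)
    size = 1
    while size < 2 * length:  # zero-pad so circular convolution equals linear convolution
        size *= 2
    fu = fft([complex(x) for x in u] + [0j] * (size - length))
    fk = fft([complex(x) for x in kernel] + [0j] * (size - len(kernel)))
    product = [x * y for x, y in zip(fu, fk)]
    y = fft(product, inverse=True)
    return [v.real / size for v in y[:length]]  # unnormalized inverse, so divide by size


def ssm_recurrence(u, a, b, c):
    """Reference O(L) recurrence: x_t = a * x_{t-1} + b * u_t, y_t = c * x_t."""
    x = 0.0
    outputs = []
    for u_t in u:
        x = a * x + b * u_t
        outputs.append(c * x)
    return outputs


signal = [1.0, 2.0, 0.5, -1.0, 3.0]
kernel = ssm_kernel(a=0.9, b=1.0, c=0.5, length=len(signal))
y_fft = causal_conv_fft(signal, kernel)
y_rec = ssm_recurrence(signal, a=0.9, b=1.0, c=0.5)
```

The recurrence is sequential, while the convolutional form processes the whole sequence at once, which is why this family of models trains efficiently on long inputs.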

Architectural Innovations and Memory Efficiency

LOCOST combines bidirectional deep SSMs for encoding with a standard transformer decoder for generating summaries. Compared with top-performing sparse transformers of the same size, this design saves up to 50% memory during training and up to 87% during inference, which makes LOCOST competitive in computational resource use as well as in performance.
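To make "bidirectional" concrete: a causal SSM layer only lets each position see earlier tokens, whereas an encoder needs context from both sides. One common way to get this, shown here as an assumption rather than as the paper's exact construction, is to run one causal convolution over the sequence, a second one over the reversed sequence, and sum the two. The kernels `k_fwd` and `k_bwd` are hypothetical values, and the direct $O(L^2)$ sum is used only for readability.

```python
def causal_conv(u, kernel):
    """Direct causal convolution: y[t] = sum_{s<=t} kernel[s] * u[t-s].

    Requires len(kernel) >= len(u). Written as a plain double loop for clarity;
    a practical layer would use an FFT instead.
    """
    return [
        sum(kernel[s] * u[t - s] for s in range(t + 1))
        for t in range(len(u))
    ]


def bidirectional_conv(u, k_fwd, k_bwd):
    """Sum of a left-to-right pass and a right-to-left pass over the same input."""
    forward = causal_conv(u, k_fwd)
    backward = causal_conv(u[::-1], k_bwd)[::-1]  # reverse, convolve, reverse back
    return [f + b for f, b in zip(forward, backward)]


# A single impulse in the middle shows that information flows in both directions.
impulse = [0.0, 0.0, 1.0, 0.0, 0.0]
k_fwd = [1.0, 0.5, 0.25, 0.125, 0.0625]   # hypothetical forward kernel
k_bwd = [0.0, 0.1, 0.01, 0.001, 0.0001]   # hypothetical backward kernel
mixed = bidirectional_conv(impulse, k_fwd, k_bwd)
```

With the impulse at index 2, positions 3 and 4 receive the forward kernel's response and positions 0 and 1 receive the backward kernel's response, so every position is influenced by tokens on either side.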

Comprehensive Evaluation and Results

LOCOST was evaluated on a series of long document abstractive summarization tasks. It reaches 93-96% of the performance of the top-performing sparse transformers of the same size while using less memory. LOCOST also processes inputs exceeding 600K tokens at inference time, setting new state-of-the-art results on full-book summarization.
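A rough back-of-the-envelope comparison helps explain why inputs of this length are feasible. The sketch below compares the $L^2$ token pairs of dense self-attention with the $L \log_2 L$ growth of an FFT-based convolution. It ignores constant factors, model width, and the fact that sparse attention is itself cheaper than dense attention, so the ratios are illustrative only and are not measurements from the paper.

```python
import math


def dense_attention_pairs(length):
    """Number of query-key pairs scored by dense self-attention: L * L."""
    return length * length


def fft_conv_ops(length):
    """Asymptotic operation count of an FFT-based convolution: L * log2(L)."""
    return length * math.log2(length)


# Sequence lengths chosen for illustration; 600_000 mirrors the 600K-token inputs.
for length in (1_000, 16_000, 600_000):
    ratio = dense_attention_pairs(length) / fft_conv_ops(length)
    print(f"L={length:>7}: L^2 / (L log2 L) = {ratio:,.0f}")
```

The gap widens with length because the ratio grows as $L / \log_2 L$.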

Future Directions and Implications

LOCOST opens new avenues for research and application in long document processing. Its ability to summarize entire books without truncation is promising for building NLP tools that work on full-length inputs. Future studies might explore how the architecture scales, experimenting with larger model sizes and further optimization to improve performance and versatility.

Conclusion

LOCOST represents substantial progress in the summarization of long documents. By using State-Space Models in the encoder, it is more memory-efficient than sparse transformers of the same size and extends the input lengths that can be processed to more than 600K tokens. These results encourage further work on models designed for very long text.
