TRAMS: Training-free Memory Selection for Long-range Language Modeling (2310.15494v3)
Abstract: The Transformer architecture is crucial for numerous AI models, but it still faces challenges in long-range language modeling. Though several transformer architectures have been designed to tackle long-range dependencies, existing methods like Transformer-XL are plagued by a high percentage of ineffective memories. In this study, we present a plug-and-play strategy, known as TRAining-free Memory Selection (TRAMS), that selects the tokens participating in the attention calculation based on one simple metric. This strategy allows us to keep the tokens that are likely to have a high attention score with the current queries and to ignore the rest. We have tested our approach on the word-level benchmark (WikiText-103) and the character-level benchmark (enwik8), and the results indicate an improvement without additional training or added parameters.
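The abstract describes the selection step only at a high level, so the sketch below illustrates the general recipe under stated assumptions: cached memory keys and values are ranked with a cheap, query-independent score, and only the top-m tokens enter the attention computation. The `select_memory` and `attend` helpers and the L2 key-norm score are illustrative stand-ins, not the paper's exact metric or implementation.

```python
import torch


def select_memory(keys, values, m):
    """Training-free selection of m memory tokens before attention.

    Assumption: the selection metric here is the L2 norm of each cached key,
    used as a cheap proxy for how large its attention score could be; the
    paper's actual metric may differ.

    keys:   (mem_len, d_k) cached key vectors
    values: (mem_len, d_v) cached value vectors
    m:      number of memory tokens to keep
    """
    scores = keys.norm(dim=-1)                      # one score per memory token
    top = torch.topk(scores, k=min(m, keys.size(0))).indices
    top, _ = torch.sort(top)                        # preserve original temporal order
    return keys[top], values[top]


def attend(query, keys, values):
    """Standard scaled dot-product attention over the selected memory."""
    d_k = query.size(-1)
    att = torch.softmax(query @ keys.T / d_k ** 0.5, dim=-1)
    return att @ values


if __name__ == "__main__":
    mem_len, d = 2048, 64
    k_cache = torch.randn(mem_len, d)
    v_cache = torch.randn(mem_len, d)
    q = torch.randn(1, d)

    k_sel, v_sel = select_memory(k_cache, v_cache, m=256)
    out = attend(q, k_sel, v_sel)                   # attends over 256 instead of 2048 tokens
    print(out.shape)                                # torch.Size([1, 64])
```

Because the score is query-independent and requires no gradients, it can be computed once per cached segment and dropped into an existing Transformer-XL-style memory without retraining, which is what makes the approach plug-and-play.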
- Unlimiformer: Long-range transformers with unlimited length input. arXiv preprint arXiv:2305.01625.
- Language models are few-shot learners. Advances in neural information processing systems, 33:1877–1901.
- Generating long sequences with sparse transformers. arXiv preprint arXiv:1904.10509.
- Hybrid random features. arXiv preprint arXiv:2110.04367.
- Transformer-XL: Attentive language models beyond a fixed-length context. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 2978–2988.
- SMYRF: Efficient attention using asymmetric clustering. Advances in Neural Information Processing Systems, 33:6476–6489.
- Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of NAACL-HLT, pages 4171–4186.
- Generalization through memorization: Nearest neighbor language models. In International Conference on Learning Representations.
- When and why is document-level context useful in neural machine translation? In Proceedings of the Fourth Workshop on Discourse in Machine Translation (DiscoMT 2019), pages 24–34.
- Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
- Reformer: The efficient transformer. In International Conference on Learning Representations.
- ALBERT: A lite BERT for self-supervised learning of language representations. In International Conference on Learning Representations.
- RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.
- Matt Mahoney. 2011. Large text compression benchmark.
- Pointer sentinel mixture models. In International Conference on Learning Representations.
- ABC: Attention with bounded-memory control. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 7469–7483.
- Random feature attention. In International Conference on Learning Representations.
- Sparsifying transformer models with trainable representation pooling. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 8616–8633.
- Exploring the limits of transfer learning with a unified text-to-text transformer. The Journal of Machine Learning Research, 21(1):5485–5551.
- Efficient content-based sparse attention with routing transformers. Transactions of the Association for Computational Linguistics, 9:53–68.
- Adaptive attention span in transformers. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 331–335.
- Not all memories are created equal: Learning to forget by expiring. In International Conference on Machine Learning, pages 9902–9912. PMLR.
- Efficient transformers: A survey. ACM Computing Surveys, 55(6):1–28.
- Attention is all you need. Advances in neural information processing systems, 30.
- Fast transformers with clustered attention. Advances in Neural Information Processing Systems, 33:21665–21674.
- Linformer: Self-attention with linear complexity. arXiv preprint arXiv:2006.04768.
- Document-level neural machine translation with hierarchical attention networks. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2947–2954.
- Linear complexity randomized self-attention mechanism. In International Conference on Machine Learning, pages 27011–27041. PMLR.
- Efficient attention via control variates. In The Eleventh International Conference on Learning Representations.
- RecurrentGPT: Interactive generation of (arbitrarily) long text. arXiv preprint arXiv:2305.13304.
- Haofei Yu
- Cunxiang Wang
- Yue Zhang
- Wei Bi