Do Long-Range Language Models Actually Use Long-Range Context? (2109.09115v1)

Published 19 Sep 2021 in cs.CL

Abstract: LLMs are generally trained on short, truncated input sequences, which limits their ability to use discourse-level information present in long-range context to improve their predictions. Recent efforts to improve the efficiency of self-attention have led to a proliferation of long-range Transformer LLMs, which can process much longer sequences than models of the past. However, the ways in which such models take advantage of the long-range context remain unclear. In this paper, we perform a fine-grained analysis of two long-range Transformer LLMs (including the Routing Transformer, which achieves state-of-the-art perplexity on the PG-19 long-sequence LM benchmark dataset) that accept input sequences of up to 8K tokens. Our results reveal that providing long-range context (i.e., beyond the previous 2K tokens) to these models only improves their predictions on a small set of tokens (e.g., those that can be copied from the distant context) and does not help at all for sentence-level prediction tasks. Finally, we discover that PG-19 contains a variety of different document types and domains, and that long-range context helps most for literary novels (as opposed to textbooks or magazines).

Authors (4)

Simeng Sun
Kalpesh Krishna
Andrew Mattarella-Micke
Mohit Iyyer
