The Case for Translation-Invariant Self-Attention in Transformer-Based Language Models (2106.01950v1)

Published 3 Jun 2021 in cs.CL, cs.AI, and cs.LG

Abstract: Mechanisms for encoding positional information are central to transformer-based language models. In this paper, we analyze the position embeddings of existing language models, finding strong evidence of translation invariance, both for the embeddings themselves and for their effect on self-attention. The degree of translation invariance increases during training and correlates positively with model performance. Our findings lead us to propose translation-invariant self-attention (TISA), which accounts for the relative position between tokens in an interpretable fashion without needing conventional position embeddings. Our proposal has several theoretical advantages over existing position-representation approaches. Experiments show that it improves on regular ALBERT on GLUE tasks while adding orders of magnitude fewer positional parameters.
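
To make the idea of translation invariance in self-attention concrete, the following is a minimal NumPy sketch, not the authors' implementation. It adds a positional term to the attention logits that depends only on the offset j - i between key and query positions, so shifting both positions by the same amount leaves the term unchanged. The abstract does not give the form of the positional function; the sum-of-Gaussians `gaussian_kernel_sum` used here, along with all function and parameter names, is an illustrative assumption.

```python
import numpy as np


def relative_position_bias(n, f):
    """Return an n x n matrix B with B[i, j] = f(j - i)."""
    idx = np.arange(n)
    offsets = idx[None, :] - idx[:, None]  # offsets[i, j] = j - i
    return f(offsets)


def gaussian_kernel_sum(offsets, amplitudes, sharpness, centers):
    """Hypothetical positional scoring function (an assumption for this sketch):
    a sum of Gaussian bumps evaluated at the relative offset."""
    d = offsets[..., None].astype(float)  # trailing axis broadcasts over kernels
    bumps = amplitudes * np.exp(-np.abs(sharpness) * (d - centers) ** 2)
    return bumps.sum(axis=-1)


def translation_invariant_attention(q, k, v, f):
    """Single-head attention whose positional term depends only on j - i.
    No position embeddings are added to q, k, or v."""
    n, d_k = q.shape
    logits = q @ k.T / np.sqrt(d_k) + relative_position_bias(n, f)
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


# Example usage with random inputs and two hypothetical kernels.
rng = np.random.default_rng(0)
n, d = 6, 4
q, k, v = (rng.normal(size=(n, d)) for _ in range(3))


def f(offsets):
    return gaussian_kernel_sum(
        offsets,
        amplitudes=np.array([1.0, 0.5]),
        sharpness=np.array([0.3, 0.1]),
        centers=np.array([0.0, -2.0]),
    )


out, weights = translation_invariant_attention(q, k, v, f)
bias = relative_position_bias(n, f)
```

In this sketch the positional term has a handful of scalar parameters (three per kernel) rather than one learned vector per position, and the bias matrix is constant along each diagonal, which is what translation invariance means here.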

Authors (2)

Ulme Wennberg (4 papers)
Gustav Eje Henter (51 papers)

Citations (21)
