Relative Positional Encoding for Transformers with Linear Complexity (2105.08399v2)

Published 18 May 2021 in cs.LG, cs.CL, cs.SD, eess.AS, and stat.ML

Abstract: Recent advances in Transformer models allow for unprecedented sequence lengths, due to linear space and time complexity. In the meantime, relative positional encoding (RPE) was proposed as beneficial for classical Transformers and consists in exploiting lags instead of absolute positions for inference. Still, RPE is not available for the recent linear variants of the Transformer, because it requires the explicit computation of the attention matrix, which is precisely what is avoided by such methods. In this paper, we bridge this gap and present Stochastic Positional Encoding as a way to generate PE that can be used as a replacement for the classical additive (sinusoidal) PE and provably behaves like RPE. The main theoretical contribution is to make a connection between positional encoding and cross-covariance structures of correlated Gaussian processes. We illustrate the performance of our approach on the Long-Range Arena benchmark and on music generation.
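The abstract's key idea, that relative-position behaviour can emerge from the cross-covariance of correlated random processes, can be illustrated with a small NumPy sketch. This is not the paper's sineSPE/convSPE construction: the uniform frequency draws, the shared phases, and the simplification of using identical query and key features are assumptions made for brevity. The point is only that averaging products of such random positional features yields a matrix whose entries depend (approximately) on the lag m - n rather than on absolute positions.

```python
import numpy as np

def stochastic_pe(seq_len, num_sines=8, num_realizations=256, rng=None):
    """Draw random positional features qbar, kbar of shape
    (seq_len, num_sines, num_realizations) whose product, averaged over
    the random draws, depends only on the lag m - n (relative position)."""
    rng = np.random.default_rng(0) if rng is None else rng
    pos = np.arange(seq_len)[:, None, None]                  # (L, 1, 1)
    # Assumed for the sketch: random low frequencies and phases shared by
    # queries and keys; the paper instead filters shared Gaussian noise.
    freqs = rng.uniform(0.0, 0.1, (1, num_sines, num_realizations))
    phases = rng.uniform(0.0, 2.0 * np.pi, (1, num_sines, num_realizations))
    qbar = np.cos(2.0 * np.pi * freqs * pos + phases)
    kbar = np.cos(2.0 * np.pi * freqs * pos + phases)        # Q-bar = K-bar here
    return qbar, kbar

qbar, kbar = stochastic_pe(seq_len=16)
# Empirical cross-covariance: average the products over all random draws.
cross_cov = np.einsum('msr,nsr->mn', qbar, kbar) / qbar[0].size
# The result is approximately Toeplitz: each entry depends on m - n,
# which is the relative-position behaviour the abstract refers to.
print(np.round(cross_cov[:5, :5], 2))
```

Because the frequencies and phases are shared between the two feature sets, the expected product of entries at positions m and n is a stationary function of m - n, so no explicit attention matrix is ever needed to obtain relative-position structure.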

Authors (6)
  1. Antoine Liutkus (12 papers)
  2. Ondřej Cífka (9 papers)
  3. Shih-Lun Wu (16 papers)
  4. Umut Şimşekli (92 papers)
  5. Yi-Hsuan Yang (89 papers)
  6. Gaël Richard (46 papers)
Citations (38)