
Dissecting Transformer Length Extrapolation via the Lens of Receptive Field Analysis (2212.10356v2)

Published 20 Dec 2022 in cs.CL

Abstract: Length extrapolation permits training a transformer language model on short sequences while preserving perplexities when tested on substantially longer sequences. A relative positional embedding design, ALiBi, has had the widest usage to date. We dissect ALiBi via the lens of receptive field analysis, empowered by a novel cumulative normalized gradient tool. The concept of receptive field further allows us to modify the vanilla Sinusoidal positional embedding to create **Sandwich**, the first parameter-free relative positional embedding design that truly uses length information longer than the training sequence. Sandwich shares with KERPLE and T5 the same logarithmically decaying temporal bias pattern, despite their learnable relative positional embeddings; these findings inform future extrapolatable positional embedding designs.
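The abstract contrasts ALiBi's temporal bias with Sandwich's logarithmically decaying one. As background, here is a minimal sketch of ALiBi's bias term (this reflects the original ALiBi recipe, not this paper's Sandwich method; the function names `alibi_slopes` and `alibi_bias` are illustrative, and the slope formula assumes the head count is a power of two):

```python
def alibi_slopes(num_heads):
    # ALiBi assigns each attention head a geometric sequence of slopes.
    # For num_heads = n (a power of 2), slopes are 2^(-8/n), 2^(-16/n), ...
    start = 2 ** (-8 / num_heads)
    return [start ** (i + 1) for i in range(num_heads)]

def alibi_bias(seq_len, num_heads):
    # Bias added to causal attention logits: -slope * (i - j), where i is
    # the query position and j <= i is the key position. Farther keys are
    # penalized linearly, which bounds each head's effective receptive field.
    slopes = alibi_slopes(num_heads)
    return [
        [[-s * (i - j) for j in range(i + 1)] for i in range(seq_len)]
        for s in slopes
    ]
```

Because the bias depends only on the relative distance `i - j`, it applies unchanged to sequences longer than those seen in training; the paper's receptive-field analysis examines how this linear decay limits which past tokens actually contribute.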

Authors (4)
  1. Ta-Chung Chi (19 papers)
  2. Ting-Han Fan (15 papers)
  3. Peter J. Ramadge (25 papers)
  4. Alexander I. Rudnicky (9 papers)
Citations (34)
