
KERPLE: Kernelized Relative Positional Embedding for Length Extrapolation (2205.09921v2)

Published 20 May 2022 in cs.CL and cs.LG

Abstract: Relative positional embeddings (RPE) have received considerable attention since RPEs effectively model the relative distance among tokens and enable length extrapolation. We propose KERPLE, a framework that generalizes relative position embedding for extrapolation by kernelizing positional differences. We achieve this goal using conditionally positive definite (CPD) kernels, a class of functions known for generalizing distance metrics. To maintain the inner product interpretation of self-attention, we show that a CPD kernel can be transformed into a PD kernel by adding a constant offset. This offset is implicitly absorbed in the Softmax normalization during self-attention. The diversity of CPD kernels allows us to derive various RPEs that enable length extrapolation in a principled way. Experiments demonstrate that the logarithmic variant achieves excellent extrapolation performance on three language modeling datasets. Our implementation and pretrained checkpoints are released at https://github.com/chijames/KERPLE.git.
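To illustrate the idea, here is a minimal sketch (not the official implementation) of the logarithmic variant described in the abstract: attention logits are shifted by a kernelized function of the relative distance |i - j|, and the constant offset that turns the CPD kernel into a PD kernel cancels inside the Softmax. The parameter names r1, r2 and the exact form -r1 * log(1 + r2 * |i - j|) are assumptions for illustration.

```python
# Sketch of a KERPLE-style logarithmic positional bias (assumed form, not the
# official code): bias[i, j] = -r1 * log(1 + r2 * |i - j|) with learnable
# positive per-head scalars r1, r2.
import torch

def kerple_log_bias(seq_len: int, r1: torch.Tensor, r2: torch.Tensor) -> torch.Tensor:
    """Return a (num_heads, seq_len, seq_len) additive bias for attention logits."""
    pos = torch.arange(seq_len)
    dist = (pos[None, :] - pos[:, None]).abs().float()   # |i - j|
    return -r1 * torch.log1p(r2 * dist)                  # CPD log kernel, broadcast over heads

# Usage: the bias is added to q @ k^T / sqrt(d) before the Softmax; any
# constant offset added to make the kernel PD is absorbed by the Softmax.
heads, L = 4, 8
r1 = torch.nn.functional.softplus(torch.randn(heads, 1, 1))  # keep parameters > 0
r2 = torch.nn.functional.softplus(torch.randn(heads, 1, 1))
bias = kerple_log_bias(L, r1, r2)
print(bias.shape)  # torch.Size([4, 8, 8])
```

Because the bias depends only on relative distance, it can be computed for any sequence length at inference time, which is what enables length extrapolation.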

Authors (4)
  1. Ta-Chung Chi (19 papers)
  2. Ting-Han Fan (15 papers)
  3. Peter J. Ramadge (25 papers)
  4. Alexander I. Rudnicky (9 papers)
Citations (51)



GitHub: https://github.com/chijames/KERPLE.git