Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization (2412.17739v3)

Published 23 Dec 2024 in cs.AI and cs.CL

Abstract: Extending the context length of language models (LMs) by improving Rotary Position Embedding (RoPE) has become a trend. While existing works mainly address RoPE's limitations within the attention mechanism, this paper provides an analysis across nearly all parts of LMs, uncovering their adverse effects on length generalization for RoPE-based attention. Using Discrete Signal Processing theory, we show that RoPE enables periodic attention by implicitly performing a Non-Uniform Discrete Fourier Transform. However, this periodicity is undermined by the spectral damage caused by: 1) linear layers and activation functions outside of attention; 2) insufficiently trained frequency components brought by time-domain truncation. Building on our observations, we propose Fourier Position Embedding (FoPE), which enhances attention's frequency-domain properties to improve both its periodic extension and length generalization. FoPE constructs a Fourier series and zeroes out the destructive frequency components, increasing model robustness against the spectral damage. Experiments across various model scales and benchmarks show that, across varying context windows, FoPE maintains more stable performance than RoPE and ALiBi. Several analyses and ablations further support our method and theoretical modeling.
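
To make the abstract's frequency-domain claim concrete, here is a minimal sketch, not the authors' implementation. It uses the standard RoPE formulation (one complex number per 2-D pair, frequencies base^(-2j/d)) to show that the attention score depends only on the relative offset and can be written as a sum of sinusoids at non-uniformly spaced frequencies. The helper `zero_low_frequencies` and its threshold of 2*pi/train_len are assumptions made purely for illustration; the abstract says only that FoPE zeroes out destructive frequency components, and does not state this rule. All function names are hypothetical.

```python
import cmath
import math


def rope_frequencies(head_dim, base=10000.0):
    """Standard RoPE frequencies: theta_j = base^(-2j / head_dim)."""
    return [base ** (-2.0 * j / head_dim) for j in range(head_dim // 2)]


def rotate(vec, position, freqs):
    """Apply RoPE to a complex vector (one complex entry per 2-D pair)."""
    return [v * cmath.exp(1j * position * f) for v, f in zip(vec, freqs)]


def attention_score(q, k, m, n, freqs):
    """Real part of the inner product of q rotated to position m and k rotated to n."""
    qm = rotate(q, m, freqs)
    kn = rotate(k, n, freqs)
    return sum(a * b.conjugate() for a, b in zip(qm, kn)).real


def score_as_fourier_sum(q, k, offset, freqs):
    """The same score written as sum_j c_j * exp(i * offset * theta_j).

    The coefficients c_j = q_j * conj(k_j) are sampled at the non-uniformly
    spaced frequencies theta_j, which is the non-uniform DFT reading of RoPE.
    """
    coeffs = [a * b.conjugate() for a, b in zip(q, k)]
    return sum(c * cmath.exp(1j * offset * f) for c, f in zip(coeffs, freqs)).real


def zero_low_frequencies(freqs, train_len):
    """ASSUMPTION (not stated in the abstract): treat a frequency as
    under-trained when it cannot complete one full period within train_len
    tokens, and replace it with 0 (a constant, position-independent component).
    """
    floor = 2.0 * math.pi / train_len
    return [f if f >= floor else 0.0 for f in freqs]


freqs = rope_frequencies(head_dim=8)
q = [complex(1.0, 2.0), complex(-0.5, 0.3), complex(0.7, -1.1), complex(0.2, 0.9)]
k = [complex(0.4, -0.6), complex(1.2, 0.1), complex(-0.3, 0.8), complex(0.5, 0.5)]

near = attention_score(q, k, m=5, n=2, freqs=freqs)
far = attention_score(q, k, m=105, n=102, freqs=freqs)
fourier = score_as_fourier_sum(q, k, offset=3, freqs=freqs)
clipped = zero_low_frequencies(freqs, train_len=512)
```

In this sketch `near`, `far`, and `fourier` agree up to floating-point error, since all three correspond to a relative offset of 3. With the assumed threshold, the two lowest of the four frequencies are set to zero for a training length of 512.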


Tweets

https://twitter.com/GptMaestro/status/1872386772396650499

https://twitter.com/arXivGPT/status/1872342491682222466

https://twitter.com/rohanpaul_ai/status/1877271530540867960
