Sketching as a Tool for Understanding and Accelerating Self-attention for Long Sequences (2112.05359v1)

Published 10 Dec 2021 in cs.LG, cs.CL, and stat.ML

Abstract: Transformer-based models are not efficient in processing long sequences due to the quadratic space and time complexity of the self-attention modules. To address this limitation, Linformer and Informer are proposed to reduce the quadratic complexity to linear (modulo logarithmic factors) via low-dimensional projection and row selection respectively. These two models are intrinsically connected, and to understand their connection, we introduce a theoretical framework of matrix sketching. Based on the theoretical analysis, we propose Skeinformer to accelerate self-attention and further improve the accuracy of matrix approximation to self-attention with three carefully designed components: column sampling, adaptive row normalization and pilot sampling reutilization. Experiments on the Long Range Arena (LRA) benchmark demonstrate that our methods outperform alternatives with a consistently smaller time/space footprint.
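To make the column-sampling idea in the abstract concrete, the sketch below is a minimal NumPy illustration of approximating softmax(QK^T/sqrt(d))V by sampling a subset of key/value rows (columns of the score matrix) and importance-weighting them before the row-wise normalization. This is an assumption-laden illustration of the general sketching technique, not the authors' Skeinformer implementation: the function names, the squared-key-norm sampling distribution, and the rescaling are chosen here for exposition, and adaptive row normalization and pilot sampling reutilization are not modeled.

```python
import numpy as np

def exact_attention(Q, K, V):
    """Full O(n^2) self-attention, kept only as a reference for comparison."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    P = np.exp(scores)
    P /= P.sum(axis=-1, keepdims=True)
    return P @ V

def sketched_attention(Q, K, V, m, rng=None):
    """Illustrative column-sampling sketch of softmax(QK^T/sqrt(d)) V.

    Samples m key/value rows with probability proportional to squared key
    norms (an assumed choice, not the paper's exact scheme), importance-
    weights the sampled columns, and normalizes row-wise over the sample.
    Cost is O(n*m*d) instead of O(n^2*d).
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = K.shape
    probs = np.linalg.norm(K, axis=1) ** 2
    probs /= probs.sum()
    idx = rng.choice(n, size=m, replace=True, p=probs)

    scores = Q @ K[idx].T / np.sqrt(d)            # (n, m) sampled score columns
    scores -= scores.max(axis=-1, keepdims=True)  # stability; cancels after normalization
    P = np.exp(scores) / (m * probs[idx])         # importance weights per sampled column
    P /= P.sum(axis=-1, keepdims=True)            # row-wise normalization over the sample
    return P @ V[idx]

# Small usage example comparing the sketch against exact attention.
rng = np.random.default_rng(0)
n, d = 1024, 64
Q, K, V = rng.standard_normal((3, n, d))
approx = sketched_attention(Q, K, V, m=128, rng=rng)
exact = exact_attention(Q, K, V)
print("mean absolute error:", np.abs(approx - exact).mean())
```

Larger sample sizes m trade speed for a closer match to exact attention; the paper's contribution lies in how the sampling distribution, row normalization, and reuse of pilot samples are designed, which this toy sketch does not attempt to reproduce.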

Authors (6)
  1. Yifan Chen (164 papers)
  2. Qi Zeng (42 papers)
  3. Di Jin (104 papers)
  4. Heng Ji (266 papers)
  5. Yun Yang (122 papers)
  6. Dilek Hakkani-Tur (94 papers)
Citations (4)