Sketching as a Tool for Understanding and Accelerating Self-attention for Long Sequences (2112.05359v1)

Published 10 Dec 2021 in cs.LG, cs.CL, and stat.ML

Abstract: Transformer-based models are not efficient in processing long sequences due to the quadratic space and time complexity of the self-attention modules. To address this limitation, Linformer and Informer are proposed to reduce the quadratic complexity to linear (modulo logarithmic factors) via low-dimensional projection and row selection respectively. These two models are intrinsically connected, and to understand their connection, we introduce a theoretical framework of matrix sketching. Based on the theoretical analysis, we propose Skeinformer to accelerate self-attention and further improve the accuracy of matrix approximation to self-attention with three carefully designed components: column sampling, adaptive row normalization and pilot sampling reutilization. Experiments on the Long Range Arena (LRA) benchmark demonstrate that our methods outperform alternatives with a consistently smaller time/space footprint.
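To make the column-sampling idea in the abstract concrete, the sketch below is a minimal NumPy illustration of approximating softmax(QK^T/sqrt(d))V by sampling a subset of key/value rows (columns of the score matrix) and importance-weighting them before the row-wise normalization. This is an assumption-laden illustration of the general sketching technique, not the authors' Skeinformer implementation: the function names, the squared-key-norm sampling distribution, and the rescaling are chosen here for exposition, and adaptive row normalization and pilot sampling reutilization are not modeled.

```python
import numpy as np

def exact_attention(Q, K, V):
    """Full O(n^2) self-attention, kept only as a reference for comparison."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    P = np.exp(scores)
    P /= P.sum(axis=-1, keepdims=True)
    return P @ V

def sketched_attention(Q, K, V, m, rng=None):
    """Illustrative column-sampling sketch of softmax(QK^T/sqrt(d)) V.

    Samples m key/value rows with probability proportional to squared key
    norms (an assumed choice, not the paper's exact scheme), importance-
    weights the sampled columns, and normalizes row-wise over the sample.
    Cost is O(n*m*d) instead of O(n^2*d).
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = K.shape
    probs = np.linalg.norm(K, axis=1) ** 2
    probs /= probs.sum()
    idx = rng.choice(n, size=m, replace=True, p=probs)

    scores = Q @ K[idx].T / np.sqrt(d)            # (n, m) sampled score columns
    scores -= scores.max(axis=-1, keepdims=True)  # stability; cancels after normalization
    P = np.exp(scores) / (m * probs[idx])         # importance weights per sampled column
    P /= P.sum(axis=-1, keepdims=True)            # row-wise normalization over the sample
    return P @ V[idx]

# Small usage example comparing the sketch against exact attention.
rng = np.random.default_rng(0)
n, d = 1024, 64
Q, K, V = rng.standard_normal((3, n, d))
approx = sketched_attention(Q, K, V, m=128, rng=rng)
exact = exact_attention(Q, K, V)
print("mean absolute error:", np.abs(approx - exact).mean())
```

Larger sample sizes m trade speed for a closer match to exact attention; the paper's contribution lies in how the sampling distribution, row normalization, and reuse of pilot samples are designed, which this toy sketch does not attempt to reproduce.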

Authors (6)
  1. Yifan Chen (164 papers)
  2. Qi Zeng (42 papers)
  3. Di Jin (104 papers)
  4. Heng Ji (266 papers)
  5. Yun Yang (122 papers)
  6. Dilek Hakkani-Tur (94 papers)
Citations (4)