Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
38 tokens/sec
GPT-4o
59 tokens/sec
Gemini 2.5 Pro Pro
41 tokens/sec
o3 Pro
7 tokens/sec
GPT-4.1 Pro
50 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Algorithm and Hardness for Dynamic Attention Maintenance in Large Language Models (2304.02207v1)

Published 5 Apr 2023 in cs.DS and cs.CC

Abstract: LLMs have made fundamental changes in human life. The attention scheme is one of the key components over all the LLMs, such as BERT, GPT-1, Transformers, GPT-2, 3, 3.5 and 4. Inspired by previous theoretical study of static version of the attention multiplication problem [Zandieh, Han, Daliri, and Karbasi arXiv 2023, Alman and Song arXiv 2023]. In this work, we formally define a dynamic version of attention matrix multiplication problem. There are matrices $Q,K, V \in \mathbb{R}{n \times d}$, they represent query, key and value in LLMs. In each iteration we update one entry in $K$ or $V$. In the query stage, we receive $(i,j) \in [n] \times [d]$ as input, and want to answer $(D{-1} A V)_{i,j}$, where $A:=\exp(QK\top) \in \mathbb{R}{n \times n}$ is a square matrix and $D := \mathrm{diag}(A {\bf 1}_n) \in \mathbb{R}{n \times n}$ is a diagonal matrix. Here ${\bf 1}_n$ denote a length-$n$ vector that all the entries are ones. We provide two results: an algorithm and a conditional lower bound. $\bullet$ On one hand, inspired by the lazy update idea from [Demetrescu and Italiano FOCS 2000, Sankowski FOCS 2004, Cohen, Lee and Song STOC 2019, Brand SODA 2020], we provide a data-structure that uses $O(n{\omega(1,1,\tau)-\tau})$ amortized update time, and $O(n{1+\tau})$ worst-case query time. $\bullet$ On the other hand, show that unless the hinted matrix vector multiplication conjecture [Brand, Nanongkai and Saranurak FOCS 2019] is false, there is no algorithm that can use both $O(n{\omega(1,1,\tau) - \tau- \Omega(1)})$ amortized update time, and $O(n{1+\tau-\Omega(1)})$ worst query time. In conclusion, our algorithmic result is conditionally optimal unless hinted matrix vector multiplication conjecture is false.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (3)
  1. Jan van den Brand (27 papers)
  2. Zhao Song (253 papers)
  3. Tianyi Zhou (172 papers)
Citations (31)