Higher Order Linear Transformer (2010.14816v1)
Published 28 Oct 2020 in cs.LG
Abstract: Following up on the linear transformer of Katharopoulos et al., which in turn builds on an idea from Shen et al., the trick that gives the attention mechanism linear complexity is re-used and extended to a second-order approximation of the softmax normalization.
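The sketch below illustrates the general idea the abstract describes, under the assumption that the "second-order approximation of the softmax" refers to the truncated Taylor expansion exp(q·k) ≈ 1 + q·k + (q·k)²/2, realized through a feature map so that attention stays linear in sequence length. Function names, shapes, and the NumPy implementation are illustrative and are not taken from the paper's code.

```python
import numpy as np

def second_order_feature_map(x):
    """Map each row x_i (shape (d,)) to [1, x_i, vec(x_i ⊗ x_i)/sqrt(2)].

    With this map, phi(q) · phi(k) = 1 + q·k + (q·k)^2 / 2, i.e. the
    second-order Taylor approximation of exp(q·k).
    """
    n, d = x.shape
    ones = np.ones((n, 1))
    outer = np.einsum("ni,nj->nij", x, x).reshape(n, d * d) / np.sqrt(2.0)
    return np.concatenate([ones, x, outer], axis=-1)  # (n, 1 + d + d^2)

def linear_attention(Q, K, V, feature_map=second_order_feature_map):
    """Kernelized attention, linear in sequence length n.

    Instead of softmax(Q K^T) V, compute phi(Q) (phi(K)^T V) and normalize
    by phi(Q) (phi(K)^T 1), so K and V are summarized once for all queries.
    """
    phi_q = feature_map(Q)        # (n, m)
    phi_k = feature_map(K)        # (n, m)
    kv = phi_k.T @ V              # (m, d_v): value statistics
    z = phi_k.sum(axis=0)         # (m,): normalizer statistics
    num = phi_q @ kv              # (n, d_v)
    den = phi_q @ z               # (n,)
    return num / den[:, None]

# Example: a short sequence with model dimension 8.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((16, 8)) * 0.1 for _ in range(3))
out = linear_attention(Q, K, V)
print(out.shape)  # (16, 8)
```

Because vec(qqᵀ)·vec(kkᵀ) = (q·k)², the quadratic term of the expansion is captured exactly by the enlarged feature map, at the cost of a feature dimension that grows quadratically in the head dimension.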