Higher Order Linear Transformer (2010.14816v1)
Published 28 Oct 2020 in cs.LG
Abstract: Following up on the linear transformer of Katharopoulos et al., which in turn builds on an idea from Shen et al., the trick that gives the attention mechanism linear complexity is re-used and extended to a second-order approximation of the softmax normalization.
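The sketch below illustrates the general idea the abstract describes, under the assumption that the "second-order approximation of the softmax" refers to the truncated Taylor expansion exp(q·k) ≈ 1 + q·k + (q·k)²/2, realized through a feature map so that attention stays linear in sequence length. Function names, shapes, and the NumPy implementation are illustrative and are not taken from the paper's code.

```python
import numpy as np

def second_order_feature_map(x):
    """Map each row x_i (shape (d,)) to [1, x_i, vec(x_i ⊗ x_i)/sqrt(2)].

    With this map, phi(q) · phi(k) = 1 + q·k + (q·k)^2 / 2, i.e. the
    second-order Taylor approximation of exp(q·k).
    """
    n, d = x.shape
    ones = np.ones((n, 1))
    outer = np.einsum("ni,nj->nij", x, x).reshape(n, d * d) / np.sqrt(2.0)
    return np.concatenate([ones, x, outer], axis=-1)  # (n, 1 + d + d^2)

def linear_attention(Q, K, V, feature_map=second_order_feature_map):
    """Kernelized attention, linear in sequence length n.

    Instead of softmax(Q K^T) V, compute phi(Q) (phi(K)^T V) and normalize
    by phi(Q) (phi(K)^T 1), so K and V are summarized once for all queries.
    """
    phi_q = feature_map(Q)        # (n, m)
    phi_k = feature_map(K)        # (n, m)
    kv = phi_k.T @ V              # (m, d_v): value statistics
    z = phi_k.sum(axis=0)         # (m,): normalizer statistics
    num = phi_q @ kv              # (n, d_v)
    den = phi_q @ z               # (n,)
    return num / den[:, None]

# Example: a short sequence with model dimension 8.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((16, 8)) * 0.1 for _ in range(3))
out = linear_attention(Q, K, V)
print(out.shape)  # (16, 8)
```

Because vec(qqᵀ)·vec(kkᵀ) = (q·k)², the quadratic term of the expansion is captured exactly by the enlarged feature map, at the cost of a feature dimension that grows quadratically in the head dimension.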