PairConnect: A Compute-Efficient MLP Alternative to Attention (2106.08235v1)

Published 15 Jun 2021 in cs.LG and cs.CL

Abstract: Transformer models have demonstrated superior performance in natural language processing. The dot-product self-attention in Transformer allows us to model interactions between words. However, this modeling comes with significant computational overhead. In this work, we revisit the memory-compute trade-off associated with Transformer, particularly multi-head attention, and show a memory-heavy but significantly more compute-efficient alternative to Transformer. Our proposal, denoted as PairConnect, a multilayer perceptron (MLP), models the pairwise interaction between words by explicit pairwise word embeddings. As a result, PairConnect substitutes the self-attention dot product with a simple embedding lookup. We show mathematically that despite being an MLP, our compute-efficient PairConnect is strictly more expressive than Transformer. Our experiment on language modeling tasks suggests that PairConnect could achieve comparable results with Transformer while reducing the computational cost associated with inference significantly.

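The central idea in the abstract, replacing the dot-product interaction between word pairs with an explicit pairwise embedding lookup, can be sketched in a few lines. The module below is only an illustrative assumption of how such a layer might look; the class name, the hashed pair table, and the mean aggregation are choices made for this sketch and are not taken from the paper.

```python
import torch
import torch.nn as nn

class PairwiseEmbeddingMixer(nn.Module):
    """Sketch: pairwise embedding lookups in place of dot-product attention."""

    def __init__(self, vocab_size, dim, num_pair_slots=1_000_003):
        super().__init__()
        self.vocab_size = vocab_size
        self.num_pair_slots = num_pair_slots
        self.token_emb = nn.Embedding(vocab_size, dim)
        # One learned vector per (word, word) pair; hashing keeps the table finite.
        self.pair_emb = nn.Embedding(num_pair_slots, dim)
        self.out = nn.Linear(dim, dim)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) integer word ids
        b, n = token_ids.shape
        x = self.token_emb(token_ids)                     # (b, n, dim)
        # Enumerate every ordered pair of tokens (position i, position j).
        left = token_ids.unsqueeze(2).expand(b, n, n)     # word at position i
        right = token_ids.unsqueeze(1).expand(b, n, n)    # word at position j
        pair_ids = (left * self.vocab_size + right) % self.num_pair_slots
        # The pairwise interaction is a table lookup, not a QK^T dot product.
        pair_vectors = self.pair_emb(pair_ids)            # (b, n, n, dim)
        mixed = pair_vectors.mean(dim=2)                  # aggregate over j
        return self.out(x + mixed)


# Usage: contextualize a small batch of token ids without any dot products.
model = PairwiseEmbeddingMixer(vocab_size=30_000, dim=64)
ids = torch.randint(0, 30_000, (2, 16))
print(model(ids).shape)  # torch.Size([2, 16, 64])
```

The memory-compute trade-off the paper describes is visible here: the pair table is large (memory-heavy), but each interaction costs an index lookup rather than a length-`dim` multiply-accumulate.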
Authors (4)
  1. Zhaozhuo Xu (43 papers)
  2. Minghao Yan (8 papers)
  3. Junyan Zhang (29 papers)
  4. Anshumali Shrivastava (102 papers)
Citations (1)