Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture (2310.12109v1)

Published 18 Oct 2023 in cs.LG

Abstract: Machine learning models are increasingly being scaled in both sequence length and model dimension to reach longer contexts and better performance. However, existing architectures such as Transformers scale quadratically along both these axes. We ask: are there performant architectures that can scale sub-quadratically along sequence length and model dimension? We introduce Monarch Mixer (M2), a new architecture that uses the same sub-quadratic primitive along both sequence length and model dimension: Monarch matrices, a simple class of expressive structured matrices that captures many linear transforms, achieves high hardware efficiency on GPUs, and scales sub-quadratically. As a proof of concept, we explore the performance of M2 in three domains: non-causal BERT-style language modeling, ViT-style image classification, and causal GPT-style language modeling. For non-causal BERT-style modeling, M2 matches BERT-base and BERT-large in downstream GLUE quality with up to 27% fewer parameters, and achieves up to 9.1$\times$ higher throughput at sequence length 4K. On ImageNet, M2 outperforms ViT-b by 1% in accuracy, with only half the parameters. Causal GPT-style models introduce a technical challenge: enforcing causality via masking introduces a quadratic bottleneck. To alleviate this bottleneck, we develop a novel theoretical view of Monarch matrices based on multivariate polynomial evaluation and interpolation, which lets us parameterize M2 to be causal while remaining sub-quadratic. Using this parameterization, M2 matches GPT-style Transformers at 360M parameters in pretraining perplexity on The PILE--showing for the first time that it may be possible to match Transformer quality without attention or MLPs.
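
To make the sub-quadratic primitive concrete, here is a minimal sketch of an order-2 Monarch matrix-vector product in NumPy. It assumes the common "block-diagonal, permute, block-diagonal, un-permute" parameterization with sqrt(n) blocks of size sqrt(n) x sqrt(n), which is what yields the O(n^{3/2}) cost built from plain GEMMs; the function and variable names are illustrative, not the authors' released code.

```python
import numpy as np

def monarch_matvec(x, L_blocks, R_blocks):
    """Order-2 Monarch matrix-vector product: y = P^T (L (P (R x))).

    R and L are block-diagonal with m = sqrt(n) blocks of size m x m,
    and P is the permutation that transposes x viewed as an m x m grid.
    Cost is O(n^{3/2}) versus O(n^2) for a dense matrix (illustrative
    parameterization; conventions may differ from the paper's code).
    """
    n = x.shape[0]
    m = int(np.sqrt(n))                        # block size; assumes n = m * m
    z = x.reshape(m, m)                        # row b = the chunk block b acts on
    z = np.einsum('bij,bj->bi', R_blocks, z)   # block-diagonal R
    z = z.T                                    # permutation P (grid transpose)
    z = np.einsum('bij,bj->bi', L_blocks, z)   # block-diagonal L
    return z.T.reshape(n)                      # inverse permutation, flatten

# Example: a random Monarch-structured transform on a length-16 vector.
rng = np.random.default_rng(0)
m = 4
L = rng.standard_normal((m, m, m))            # m blocks of shape (m, m)
R = rng.standard_normal((m, m, m))
y = monarch_matvec(rng.standard_normal(m * m), L, R)
print(y.shape)                                # (16,)
```

Each block-diagonal multiply is a batched small GEMM, which is why the primitive maps efficiently onto GPU tensor cores while keeping the total work sub-quadratic.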

Authors (10)
  1. Daniel Y. Fu (25 papers)
  2. Simran Arora (64 papers)
  3. Jessica Grogan (3 papers)
  4. Isys Johnson (4 papers)
  5. Sabri Eyuboglu (13 papers)
  6. Armin W. Thomas (14 papers)
  7. Benjamin Spector (11 papers)
  8. Michael Poli (33 papers)
  9. Atri Rudra (55 papers)
  10. Christopher Ré (194 papers)
Citations (34)