Laughing Hyena Distillery: Extracting Compact Recurrences From Convolutions (2310.18780v1)

Published 28 Oct 2023 in cs.LG, cs.AI, and eess.SP

Abstract: Recent advances in attention-free sequence models rely on convolutions as alternatives to the attention operator at the core of Transformers. In particular, long convolution sequence models have achieved state-of-the-art performance in many domains, but incur a significant cost during auto-regressive inference workloads -- naively requiring a full pass (or caching of activations) over the input sequence for each generated token -- similarly to attention-based models. In this paper, we seek to enable $\mathcal O(1)$ compute and memory cost per token in any pre-trained long convolution architecture to reduce memory footprint and increase throughput during generation. Concretely, our methods consist of extracting low-dimensional linear state-space models from each convolution layer, building upon rational interpolation and model-order reduction techniques. We further introduce architectural improvements to convolution-based layers such as Hyena: by weight-tying the filters across channels into heads, we achieve higher pre-training quality and reduce the number of filters to be distilled. The resulting model achieves 10x higher throughput than Transformers and 1.5x higher than Hyena at 1.3B parameters, without any loss in quality after distillation.
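The core idea of the abstract can be illustrated with a small sketch (this is not the paper's distillation code, and the rational-interpolation fitting step is omitted): once a long convolution filter is represented as a linear state-space model with impulse response $h[t] = C A^t B$, auto-regressive generation only needs a constant-size state update per token instead of a full pass over the sequence. The dimensions and matrices below are arbitrary, chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8    # SSM state dimension (hypothetical; the paper distills low-dimensional SSMs)
L = 64   # sequence length

# A stable diagonal SSM (A, B, C); diagonal A keeps the recurrence O(d) per token.
A = rng.uniform(0.5, 0.95, d)
B = rng.normal(size=d)
C = rng.normal(size=d)

# The convolution filter implied by the SSM: h[t] = C^T diag(A)^t B
h = np.array([C @ (A**t * B) for t in range(L)])

u = rng.normal(size=L)  # input sequence

# (1) Convolutional view: each output naively touches the whole prefix.
y_conv = np.array([sum(h[j] * u[t - j] for j in range(t + 1)) for t in range(L)])

# (2) Recurrent view: x_t = A x_{t-1} + B u_t, y_t = C x_t -- O(d) per token.
x = np.zeros(d)
y_rec = np.empty(L)
for t in range(L):
    x = A * x + B * u[t]
    y_rec[t] = C @ x

assert np.allclose(y_conv, y_rec)
```

Both views produce identical outputs; the distillation problem the paper addresses is the converse direction -- finding a small $(A, B, C)$ whose impulse response matches a given pre-trained convolution filter.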

Authors (14)
  1. Stefano Massaroli (28 papers)
  2. Michael Poli (33 papers)
  3. Daniel Y. Fu (25 papers)
  4. Hermann Kumbong (5 papers)
  5. Rom N. Parnichkun (3 papers)
  6. Aman Timalsina (6 papers)
  7. David W. Romero (22 papers)
  8. Quinn McIntyre (2 papers)
  9. Beidi Chen (61 papers)
  10. Atri Rudra (55 papers)
  11. Ce Zhang (215 papers)
  12. Stefano Ermon (279 papers)
  13. Yoshua Bengio (601 papers)
  14. Christopher Re (23 papers)
Citations (17)
