Sparse Persistent RNNs: Squeezing Large Recurrent Networks On-Chip (1804.10223v1)

Published 26 Apr 2018 in cs.NE, cs.DC, and cs.LG

Abstract: Recurrent Neural Networks (RNNs) are powerful tools for solving sequence-based problems, but their efficacy and execution time are dependent on the size of the network. Following recent work in simplifying these networks with model pruning and a novel mapping of work onto GPUs, we design an efficient implementation for sparse RNNs. We investigate several optimizations and tradeoffs: Lamport timestamps, wide memory loads, and a bank-aware weight layout. With these optimizations, we achieve speedups of over 6x over the next best algorithm for a hidden layer of size 2304, batch size of 4, and a density of 30%. Further, our technique allows for models of over 5x the size to fit on a GPU for a speedup of 2x, enabling larger networks to help advance the state-of-the-art. We perform case studies on NMT and speech recognition tasks in the appendix, accelerating their recurrent layers by up to 3x.
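To make the headline numbers concrete, the sketch below is a minimal, hedged illustration (not the paper's GPU implementation) of the sparse recurrent step that such techniques accelerate: a pruned recurrent weight matrix stored in CSR form so that each timestep costs roughly density × hidden² multiply-adds instead of hidden². The sizes and the CSR/NumPy formulation are illustrative assumptions; the paper's persistent-kernel approach keeps these weights resident in on-chip GPU storage.

```python
import numpy as np
from scipy import sparse

# Illustrative sizes matching the abstract's headline configuration:
# hidden layer of 2304, batch size 4, 30% weight density.
hidden, batch, density = 2304, 4, 0.30

# Prune the recurrent weights and store only the nonzeros in CSR format.
rng = np.random.default_rng(0)
W_dense = rng.standard_normal((hidden, hidden)).astype(np.float32)
mask = rng.random((hidden, hidden)) < density
W_sparse = sparse.csr_matrix(W_dense * mask)

h = np.zeros((hidden, batch), dtype=np.float32)                  # hidden state
x_proj = rng.standard_normal((hidden, batch)).astype(np.float32) # input projection

# One recurrent step: only ~density * hidden^2 multiply-adds are performed.
h = np.tanh(W_sparse @ h + x_proj)
print(h.shape, W_sparse.nnz / (hidden * hidden))
```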

Authors (5)
  1. Feiwen Zhu (5 papers)
  2. Jeff Pool (11 papers)
  3. Michael Andersch (5 papers)
  4. Jeremy Appleyard (4 papers)
  5. Fung Xie (1 paper)
Citations (28)