Activity Sparsity Complements Weight Sparsity for Efficient RNN Inference (2311.07625v3)

Published 13 Nov 2023 in cs.LG

Abstract: Artificial neural networks open up unprecedented machine learning capabilities at the cost of ever growing computational requirements. Sparsifying the parameters, often achieved through weight pruning, has been identified as a powerful technique to compress the number of model parameters and reduce the computational operations of neural networks. Yet, sparse activations, while omnipresent in both biological neural networks and deep learning systems, have not been fully utilized as a compression technique in deep learning. Moreover, the interaction between sparse activations and weight pruning is not fully understood. In this work, we demonstrate that activity sparsity can compose multiplicatively with parameter sparsity in a recurrent neural network model based on the GRU that is designed to be activity sparse. We achieve up to $20\times$ reduction of computation while maintaining perplexities below $60$ on the Penn Treebank language modeling task. This magnitude of reduction has not been achieved previously with solely sparsely connected LSTMs, and the language modeling performance of our model has not been achieved previously with any sparsely activated recurrent neural networks or spiking neural networks. Neuromorphic computing devices are especially good at taking advantage of the dynamic activity sparsity, and our results provide strong evidence that making deep learning models activity sparse and porting them to neuromorphic devices can be a viable strategy that does not compromise on task performance. Our results also drive further convergence of methods from deep learning and neuromorphic computing for efficient machine learning.
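The multiplicative composition claimed in the abstract can be seen with a simple operation count: if pruning keeps a fraction d of the recurrent weights and only a fraction a of the units are active at a given time step, the recurrent matrix-vector work scales roughly with d·a. The sketch below illustrates this; the hidden size, weight density, and activity rate are illustrative assumptions, not values reported in the paper.

```python
# Minimal back-of-the-envelope sketch (not the authors' code): how weight sparsity
# and activity sparsity compose multiplicatively in the recurrent matrix-vector
# product of a GRU-like layer. All concrete numbers below are assumptions.

def recurrent_macs(hidden_size: int, weight_density: float, activity_rate: float) -> float:
    """Approximate multiply-accumulates per time step for the recurrent weights of a
    GRU-like cell: only nonzero weights attached to currently active units do work."""
    dense_macs = 3 * hidden_size * hidden_size  # three gate matrices, recurrent part only
    return dense_macs * weight_density * activity_rate

dense = recurrent_macs(1024, weight_density=1.0, activity_rate=1.0)
sparse = recurrent_macs(1024, weight_density=0.25, activity_rate=0.20)  # assumed levels
print(f"compute reduction: {dense / sparse:.0f}x")  # 1 / (0.25 * 0.20) = 20x
```

The attainable savings in practice depend on how well the target hardware can exploit unstructured sparsity; the abstract points to neuromorphic devices as a natural fit for the dynamic activity component.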

Authors (5)
  1. Rishav Mukherji (5 papers)
  2. Mark Schöne (8 papers)
  3. Khaleelulla Khan Nazeer (8 papers)
  4. Christian Mayr (35 papers)
  5. Anand Subramoney (17 papers)
Citations (2)