Workload-Balanced Pruning for Sparse Spiking Neural Networks (2302.06746v2)

Published 13 Feb 2023 in cs.NE

Abstract: Pruning for Spiking Neural Networks (SNNs) has emerged as a fundamental methodology for deploying deep SNNs on resource-constrained edge devices. Although existing pruning methods can achieve extremely high weight sparsity for deep SNNs, this high sparsity introduces a workload imbalance problem. Specifically, workload imbalance arises when different numbers of non-zero weights are assigned to hardware units running in parallel, which results in low hardware utilization and thus imposes longer latency and higher energy costs. In preliminary experiments, we show that sparse SNNs (~98% weight sparsity) can suffer utilization as low as ~59%. To alleviate the workload imbalance problem, we propose u-Ticket, which monitors and adjusts the weight connections of the SNN during Lottery Ticket Hypothesis (LTH) based pruning, guaranteeing that the final ticket achieves optimal utilization when deployed onto the hardware. Experiments indicate that u-Ticket can guarantee up to 100% hardware utilization, reducing latency by up to 76.9% and energy cost by up to 63.8% compared to the non-utilization-aware LTH method.
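
To make the imbalance concrete, below is a minimal NumPy sketch. It is not the paper's hardware model or the authors' exact u-Ticket procedure: the round-robin row-to-PE mapping, the function names (`pe_utilization`, `balance_rows`), and the regrowth rule are all illustrative assumptions. The first function measures utilization of parallel processing elements (PEs), which governs latency because the busiest PE gates the others; the second shows one plausible way to equalize per-row non-zero counts in the spirit of adjusting connections during pruning.

```python
import numpy as np

rng = np.random.default_rng(0)

def pe_utilization(weight: np.ndarray, num_pes: int = 16) -> float:
    """Utilization when weight rows are assigned round-robin to PEs.
    1.0 means perfectly balanced; lower means the busiest PE idles the rest."""
    nnz_per_row = np.count_nonzero(weight, axis=1)
    pe_loads = np.zeros(num_pes)
    for row, nnz in enumerate(nnz_per_row):
        pe_loads[row % num_pes] += nnz  # work = non-zero weights on that PE
    return float(pe_loads.mean() / pe_loads.max()) if pe_loads.max() > 0 else 1.0

def balance_rows(weight: np.ndarray) -> np.ndarray:
    """Equalize each row's non-zero count toward the layer average by pruning
    the smallest-magnitude weights in overloaded rows and regrowing
    connections in underloaded ones (a rough sketch, not the paper's rule)."""
    w = weight.copy()
    target = int(round(np.count_nonzero(w) / w.shape[0]))
    for i in range(w.shape[0]):
        row, nz = w[i], np.flatnonzero(w[i])
        if len(nz) > target:
            # Drop the (len(nz) - target) smallest-magnitude weights.
            drop = nz[np.argsort(np.abs(row[nz]))[: len(nz) - target]]
            row[drop] = 0.0
        elif len(nz) < target:
            # Regrow at random zeroed positions; init value is illustrative.
            zeros = np.flatnonzero(row == 0)
            grow = rng.choice(zeros, size=target - len(nz), replace=False)
            row[grow] = 1e-3
    return w

# A ~98%-sparse random mask is typically unbalanced; equalizing per-row
# workload restores near-100% utilization at roughly unchanged sparsity.
w = rng.standard_normal((256, 256)) * (rng.random((256, 256)) > 0.98)
print(f"before: {pe_utilization(w):.2f}  after: {pe_utilization(balance_rows(w)):.2f}")
```

Because latency is set by the most heavily loaded PE, raising utilization in this way directly shortens runtime, which is the effect the abstract quantifies as up to 76.9% latency and 63.8% energy reduction.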

Authors (7)
  1. Ruokai Yin (15 papers)
  2. Youngeun Kim (48 papers)
  3. Yuhang Li (102 papers)
  4. Abhishek Moitra (30 papers)
  5. Nitin Satpute (4 papers)
  6. Anna Hambitzer (4 papers)
  7. Priyadarshini Panda (104 papers)
Citations (14)