Compressing RNNs for IoT devices by 15-38x using Kronecker Products (1906.02876v5)

Published 7 Jun 2019 in cs.LG, cs.NE, and stat.ML

Abstract: Recurrent Neural Networks (RNNs) can be difficult to deploy on resource-constrained devices due to their size. As a result, there is a need for compression techniques that can significantly compress RNNs without negatively impacting task accuracy. This paper introduces a method to compress RNNs for resource-constrained environments using Kronecker products (KPs). KPs can compress RNN layers by 15-38x with minimal accuracy loss. By quantizing the resulting models to 8 bits, we further push the compression factor to 50x. We show that KP can beat the task accuracy achieved by other state-of-the-art compression techniques across 5 benchmarks spanning 3 different applications, while simultaneously improving inference run-time. We show that the KP compression mechanism does introduce an accuracy loss, which can be mitigated by a proposed hybrid KP (HKP) approach. Our HKP algorithm provides fine-grained control over the compression ratio, enabling us to regain accuracy lost during compression by adding a small number of model parameters.
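To illustrate the core idea, the sketch below (not the authors' code; all sizes are hypothetical) shows how a single RNN weight matrix can be replaced by the Kronecker product of two small factors, shrinking the parameter count, and how the matrix-vector product can be computed without ever materializing the full matrix.

```python
import numpy as np

# Minimal sketch: approximate an m x n RNN weight matrix W by A ⊗ B,
# where A is (m1 x n1) and B is (m2 x n2) with m = m1*m2, n = n1*n2.
# Parameter count drops from m*n to m1*n1 + m2*n2.
rng = np.random.default_rng(0)

m1, n1 = 16, 16          # factor A shape (hypothetical sizes)
m2, n2 = 16, 16          # factor B shape
m, n = m1 * m2, n1 * n2  # the dense layer would be 256 x 256

A = rng.standard_normal((m1, n1))
B = rng.standard_normal((m2, n2))

dense_params = m * n
kp_params = m1 * n1 + m2 * n2
print(f"compression factor: {dense_params / kp_params:.0f}x")  # 128x for these sizes

# Mat-vec with the Kronecker-structured weight, without building W:
# (A ⊗ B) x == (A @ X @ B.T).ravel(), where X = x.reshape(n1, n2) (row-major).
x = rng.standard_normal(n)
y_fast = (A @ x.reshape(n1, n2) @ B.T).ravel()

# Reference check against the explicitly materialized Kronecker product.
y_ref = np.kron(A, B) @ x
assert np.allclose(y_fast, y_ref)
```

In practice the factors A and B are learned (or fine-tuned) in place of the dense weight; the reshape-based mat-vec above is also what makes inference faster, since the two small matrix multiplies cost far fewer operations than the dense product.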

Authors (7)
  1. Urmish Thakker (26 papers)
  2. Jesse Beu (10 papers)
  3. Dibakar Gope (17 papers)
  4. Chu Zhou (7 papers)
  5. Igor Fedorov (24 papers)
  6. Ganesh Dasika (7 papers)
  7. Matthew Mattina (35 papers)
Citations (34)
