Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
119 tokens/sec
GPT-4o
56 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
6 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

ReuseSense: With Great Reuse Comes Greater Efficiency; Effectively Employing Computation Reuse on General-Purpose CPUs (2311.10487v1)

Published 17 Nov 2023 in cs.AR

Abstract: Deep Neural Networks (DNNs) are the de facto algorithm for tackling cognitive tasks in real-world applications such as speech recognition and natural language processing. DNN inference comprises numerous dot product operations between inputs and weights that require numerous multiplications and memory accesses, which hinder their performance and energy consumption when evaluated in modern CPUs. In this work, we leverage the high degree of similarity between consecutive inputs in different DNN layers to improve the performance and energy efficiency of DNN inference on CPUs. To this end, we propose ReuseSense, a new hardware scheme that includes ReuseSensor, an engine to efficiently generate the compute and load instructions needed to evaluate a DNN layer accordingly when sensing similar inputs. By intelligently reusing previously computed product values, ReuseSense allows bypassing computations when encountering input values identical to previous ones. Additionally, it efficiently avoids redundant loads by skipping weight loads associated with the bypassed dot product computations. Our experiments show that ReuseSense achieves an 8x speedup in performance and a 74% reduction in total energy consumption across several DNNs on average over the baseline.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (5)
  1. Nitesh Narayana GS (2 papers)
  2. Marc Ordoñez (1 paper)
  3. Lokananda Hari (1 paper)
  4. Franyell Silfa (6 papers)
  5. Antonio González (64 papers)

Summary

We haven't generated a summary for this paper yet.