
NOVA: NoC-based Vector Unit for Mapping Attention Layers on a CNN Accelerator (2405.04206v1)

Published 7 May 2024 in cs.AR, cs.AI, and cs.LG

Abstract: Attention mechanisms are becoming increasingly popular, being used in neural network models in multiple domains such as NLP and vision applications, especially at the edge. However, attention layers are difficult to map onto existing neuro accelerators since they have a much higher density of non-linear operations, which lead to inefficient utilization of today's vector units. This work introduces NOVA, a NoC-based Vector Unit that can perform non-linear operations within the NoC of the accelerators, and can be overlaid onto existing neuro accelerators to map attention layers at the edge. Our results show that the NOVA architecture is up to 37.8x more power-efficient than state-of-the-art hardware approximators when running existing attention-based neural networks.

Authors (4)
  1. Mohit Upadhyay (2 papers)
  2. Rohan Juneja (5 papers)
  3. Weng-Fai Wong (25 papers)
  4. Li-Shiuan Peh (4 papers)

