Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
38 tokens/sec
GPT-4o
59 tokens/sec
Gemini 2.5 Pro Pro
41 tokens/sec
o3 Pro
7 tokens/sec
GPT-4.1 Pro
50 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Domino: A Tailored Network-on-Chip Architecture to Enable Highly Localized Inter- and Intra-Memory DNN Computing (2107.09500v1)

Published 18 Jul 2021 in cs.AR

Abstract: The ever-increasing computation complexity of fast-growing Deep Neural Networks (DNNs) has requested new computing paradigms to overcome the memory wall in conventional Von Neumann computing architectures. The emerging Computing-In-Memory (CIM) architecture has been a promising candidate to accelerate neural network computing. However, the data movement between CIM arrays may still dominate the total power consumption in conventional designs. This paper proposes a flexible CIM processor architecture named Domino to enable stream computing and local data access to significantly reduce the data movement energy. Meanwhile, Domino employs tailored distributed instruction scheduling within Network-on-Chip (NoC) to implement inter-memory-computing and attain mapping flexibility. The evaluation with prevailing CNN models shows that Domino achieves 1.15-to-9.49$\times$ power efficiency over several state-of-the-art CIM accelerators and improves the throughput by 1.57-to-12.96$\times$.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (4)
  1. Kaining Zhou (2 papers)
  2. Yangshuo He (4 papers)
  3. Rui Xiao (18 papers)
  4. Kejie Huang (24 papers)
Citations (1)