Instruction Set Architecture (ISA) for Processing-in-Memory DNN Accelerators (2308.06449v1)

Published 12 Aug 2023 in cs.PL

Abstract: In this article, we introduce an instruction set architecture (ISA) for processing-in-memory (PIM) based deep neural network (DNN) accelerators. The proposed ISA is for DNN inference on PIM-based architectures. It is assumed that the weights have been trained and programmed into PIM-based DNN accelerators before inference, and they are fixed during inference. We do not restrict the devices of PIM-based DNN accelerators. Popular devices used to build PIM-based DNN accelerators include resistive random-access memory (RRAM), flash, ferroelectric field-effect transistor (FeFET), static random-access memory (SRAM), etc. The target DNNs include convolutional neural networks (CNNs) and multi-layer perceptrons (MLPs). The proposed ISA is transparent to both applications and hardware implementations. It enables the development of unified toolchains and software stacks for PIM-based DNN accelerators. For practical hardware that uses a different ISA, the instructions generated by the unified toolchains can easily be converted to the target ISA. The proposed ISA has been used in the open-source DNN compiler PIMCOMP-NN (https://github.com/sunxt99/PIMCOMP-NN) and the associated open-source simulator PIMSIM-NN (https://github.com/wangxy-2000/pimsim-nn).

Authors (1)
  1. Xiaoming Chen (140 papers)
Citations (1)

Summary

Instruction Set Architecture for Processing-in-Memory DNN Accelerators

The paper introduces a novel instruction set architecture (ISA) designed explicitly for processing-in-memory (PIM) deep neural network (DNN) accelerators. The ISA targets DNN inference on PIM architectures and does not restrict the underlying device technology, which may include resistive random-access memory (RRAM), flash, ferroelectric field-effect transistor (FeFET), and static random-access memory (SRAM). Weights are assumed to be trained in advance and to remain stationary throughout the inference phase. The target neural networks include convolutional neural networks (CNNs) and multi-layer perceptrons (MLPs).

Architectural Framework

The architecture abstracted in the paper comprises multiple interconnected cores sharing a global memory. The specific implementations of the cores, memory, and interconnect are left flexible to accommodate various hardware configurations. Each core operates independently, executing its own instruction stream via three main execution units: a scalar unit for register operations, a PIM matrix unit for matrix-vector multiplications, and a vector unit for other neural network operations such as ReLU and pooling. The PIM matrix unit may compute in the analog domain while exposing a digital interface. This setup yields a versatile design adaptable to diverse hardware instantiations.
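
To make the abstraction concrete, the following is a minimal sketch (in Python, not from the paper) of one core with the three execution units described above; the class names, the 32-register scalar file, and the dictionary-based local memory are illustrative assumptions.

```python
# Illustrative sketch of the abstract core model; names and sizes are assumptions.
from dataclasses import dataclass, field
import numpy as np

@dataclass
class ScalarUnit:
    # General-purpose registers for addresses, counters, and other scalar values.
    regs: list = field(default_factory=lambda: [0] * 32)

@dataclass
class PIMMatrixUnit:
    # Weights are programmed before inference and stay fixed; the unit may compute
    # in the analog domain but presents a digital interface (digital vectors in/out).
    weights: np.ndarray = None
    def mvm(self, x: np.ndarray) -> np.ndarray:
        return self.weights @ x  # matrix-vector multiplication

@dataclass
class VectorUnit:
    # Handles non-MVM operators such as activations and pooling.
    def relu(self, x: np.ndarray) -> np.ndarray:
        return np.maximum(x, 0)

@dataclass
class Core:
    scalar: ScalarUnit = field(default_factory=ScalarUnit)
    matrix: PIMMatrixUnit = field(default_factory=PIMMatrixUnit)
    vector: VectorUnit = field(default_factory=VectorUnit)
    local_mem: dict = field(default_factory=dict)  # per-core local memory
```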

DNN Mapping

The mapping strategy for DNN inference is systematically described, focusing on assigning layers to cores and expressing computation as matrix-vector multiplications for efficiency. Physical sub-arrays are grouped into logical arrays, and the mapping operates on these logical arrays without regard to how the underlying physical arrays are configured. This logical view keeps operations uniform across hardware variants, maintaining an abstraction layer that serves both hardware efficiency and software usability.
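
As a rough illustration of this mapping idea, the sketch below tiles a layer's weight matrix into fixed-size sub-arrays that together form one logical array, and assigns layers to cores round-robin. The 128x128 sub-array size and the assignment policy are assumptions made for illustration, not the paper's actual algorithm (PIMCOMP-NN implements the real mapping).

```python
# Hedged sketch of weight tiling onto logical arrays and layer-to-core assignment.
import numpy as np

SUB_ROWS, SUB_COLS = 128, 128  # assumed physical sub-array (crossbar) size

def tile_weights(W: np.ndarray):
    """Split a layer's weight matrix into sub-array tiles forming one logical array."""
    rows, cols = W.shape
    tiles = []
    for r in range(0, rows, SUB_ROWS):
        for c in range(0, cols, SUB_COLS):
            tiles.append(((r, c), W[r:r + SUB_ROWS, c:c + SUB_COLS]))
    return tiles

def assign_layers_to_cores(layer_shapes, num_cores):
    """Toy round-robin layer-to-core assignment; a real compiler optimizes this."""
    return {i: i % num_cores for i, _ in enumerate(layer_shapes)}

# Example: tile a 300x520 weight matrix and map 4 layers onto 2 cores.
tiles = tile_weights(np.zeros((300, 520)))
placement = assign_layers_to_cores([(300, 520)] * 4, num_cores=2)
```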

ISA Design Principles

The ISA is meticulously crafted with considerations of DNN operational needs and hardware functionalities. Key principles of this ISA include:

  • High-level abstraction of DNN operators, focusing on primary matrix and vector operations.
  • Support for configurable bit-widths, allowing precision adjustments across layers (see the sketch after this list).
  • Straight-line instruction flow with clearly demarcated data and control paths.
  • A standalone design with no dependencies on traditional CPU ISAs, tailored specifically for DNN inference rather than training.
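
A minimal sketch of how these principles might look in practice follows: a fully connected layer with ReLU is lowered directly to matrix/vector instructions that carry a per-layer bit-width field. The mnemonics and dictionary encoding are hypothetical; they follow the spirit of the ISA, not its published instruction format.

```python
# Hypothetical lowering of one FC + ReLU layer into straight-line instructions.
def lower_fc_relu_layer(layer_id: int, act_bits: int = 8):
    """Emit a branch-free instruction list for one fully connected + ReLU layer."""
    return [
        {"op": "MVM",   "layer": layer_id, "bitwidth": act_bits},  # matrix-vector multiply on the PIM matrix unit
        {"op": "VRELU", "layer": layer_id, "bitwidth": act_bits},  # activation on the vector unit
        {"op": "VMOV",  "layer": layer_id, "bitwidth": act_bits},  # move activations to local memory
    ]

# Three layers, with the first kept at 16-bit precision and the rest at 8-bit,
# illustrating per-layer precision adjustment.
program = [ins for l in range(3) for ins in lower_fc_relu_layer(l, act_bits=16 if l == 0 else 8)]
```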

Instruction Definition and Summary

The paper provides a detailed enumeration of instructions with explicit operational semantics for DNN accelerators. The ISA specifies scalar, matrix, vector, and communication instructions with fixed instruction lengths of 32 or 64 bits. Scalar instructions cover integer operations and data loading from memory addresses; matrix and vector instructions provide essential DNN computations such as matrix-vector multiplications, element-wise operations, and activation functions like ReLU and sigmoid. The paper favors immediate values to simplify decoding logic and precludes branching, advocating unrolled loops instead.
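
The exact instruction format is defined in the paper; purely as an illustration of fixed-length encoding with immediates, the sketch below packs a hypothetical 32-bit scalar instruction. The field widths and opcode numbers are assumptions, not the published layout.

```python
# Illustrative 32-bit instruction encoding: opcode(8) | rd(5) | rs(5) | imm(14).
OPCODES = {"SADD": 0x01, "SLOAD": 0x02, "MVM": 0x10, "VRELU": 0x20}  # assumed values

def encode32(op: str, rd: int, rs: int, imm: int) -> int:
    """Pack one scalar instruction into a single 32-bit word."""
    assert 0 <= imm < (1 << 14), "immediate out of range"
    return (OPCODES[op] << 24) | (rd << 19) | (rs << 14) | imm

def decode32(word: int):
    """Unpack a 32-bit word back into (opcode, rd, rs, imm)."""
    op = {v: k for k, v in OPCODES.items()}[(word >> 24) & 0xFF]
    return op, (word >> 19) & 0x1F, (word >> 14) & 0x1F, word & 0x3FFF

word = encode32("SADD", rd=3, rs=1, imm=42)
assert decode32(word) == ("SADD", 3, 1, 42)
```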

Communication and synchronization are also addressed within the instruction set, providing mechanisms for efficient data exchange between cores and for data movement within local memories, thus enhancing concurrency and coherence in PIM architectures.
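
The sketch below models, under assumptions, what such communication primitives might look like behaviorally: a blocking send/receive pair between cores plus a local-memory move. The queue-based channel and function names are illustrative, not the ISA's actual semantics.

```python
# Behavioral sketch of inter-core communication and local data movement.
from queue import Queue

channels = {}  # (src_core, dst_core) -> Queue

def send(src: int, dst: int, payload):
    """Analogue of a SEND-style instruction: push data onto a core-to-core channel."""
    channels.setdefault((src, dst), Queue()).put(payload)

def recv(src: int, dst: int):
    """Analogue of a RECV-style instruction: block until data arrives."""
    return channels.setdefault((src, dst), Queue()).get()

def local_move(local_mem: dict, dst_addr: int, src_addr: int):
    """Data movement within a core's local memory."""
    local_mem[dst_addr] = local_mem[src_addr]

# Example: core 0 forwards a layer's output activations to core 1.
send(0, 1, [0.0, 1.5, 3.2])
acts = recv(0, 1)
```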

Implications and Future Directions

The implementation of this ISA in the open-source DNN compiler PIMCOMP-NN and the simulator PIMSIM-NN is a step towards making PIM-based DNN acceleration feasible and accessible. The transparency of the ISA with respect to both applications and hardware allows for seamless adaptation and integration within heterogeneous computational environments. The abstraction level not only helps unify toolchains for DNN accelerator design but also facilitates hardware-software co-optimization.

The broader implications of this work suggest potential enhancements in PIM technologies by enabling efficient mappings of DNN operations with reduced power consumption and latency. Future research could extend this architecture to DNN training, build on the ISA's conceptual framework, and further investigate compatibility across varied PIM device technologies. Growing interest in adaptive bit-width adjustment also points to potential gains in accuracy and energy efficiency for DNN applications.
