Instruction Set Architecture for Processing-in-Memory DNN Accelerators
This paper introduces a novel instruction set architecture (ISA) designed specifically for processing-in-memory (PIM) deep neural network (DNN) accelerators. The ISA aims to facilitate DNN inference on PIM architectures built from a range of device technologies, including resistive random-access memory (RRAM), flash, ferroelectric field-effect transistor (FeFET), and static random-access memory (SRAM). It assumes pre-trained weights that remain stationary in the memory arrays throughout inference. The target networks include convolutional neural networks (CNNs) and multi-layer perceptrons (MLPs).
Architectural Framework
The paper abstracts the target hardware as multiple interconnected cores sharing a global memory; the specific implementations of the cores, memory, and interconnect are left flexible to accommodate various hardware configurations. Each core executes its own instruction stream independently through three main execution units: a scalar unit for register operations, a PIM matrix unit for matrix-vector multiplications, and a vector unit for other neural network operations such as ReLU and pooling. The PIM matrix unit may operate internally in the analog domain while exposing a digital interface. This setup yields a versatile design adaptable to diverse hardware instantiations.
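As a rough illustration of this abstraction, the sketch below models a core with its three execution units in Python. The class names (ScalarUnit, MatrixUnit, VectorUnit, Core) and their interfaces are assumptions made for illustration and are not defined by the paper.

```python
# Minimal sketch of the abstract core model described above.
# All names are illustrative assumptions, not the paper's definitions.
from dataclasses import dataclass, field
import numpy as np

class ScalarUnit:
    """Register-to-register integer operations."""
    def __init__(self, num_regs: int = 32):
        self.regs = [0] * num_regs

    def addi(self, rd: int, rs: int, imm: int) -> None:
        self.regs[rd] = self.regs[rs] + imm

class MatrixUnit:
    """PIM matrix unit: weights are programmed once and stay stationary;
    internally this may be analog, but the interface here is digital."""
    def __init__(self, weights: np.ndarray):
        self.weights = weights  # stationary during inference

    def mvm(self, x: np.ndarray) -> np.ndarray:
        return self.weights @ x

class VectorUnit:
    """Element-wise vector operations such as ReLU."""
    @staticmethod
    def relu(x: np.ndarray) -> np.ndarray:
        return np.maximum(x, 0)

@dataclass
class Core:
    scalar: ScalarUnit
    matrix: MatrixUnit
    vector: VectorUnit
    local_mem: dict = field(default_factory=dict)
```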
DNN Mapping
The mapping strategy for DNN inference focuses on assigning layers to cores and lowering computation onto matrix-vector multiplications. Physical memory arrays are grouped into logical arrays composed of sub-arrays to support the necessary operations, and the ISA operates on these logical entities without exposing the lower-level physical array details. This abstraction layer keeps operations uniform across hardware and aids both hardware efficiency and software usability.
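To make the lowering concrete, here is a minimal sketch of how a convolution layer can be expressed as a sequence of matrix-vector multiplications against a stationary weight matrix, which is the kind of transformation such a mapping relies on. The function conv_as_mvm, its argument layout, and the stride/padding choices are illustrative assumptions, not details taken from the paper.

```python
# Sketch: lowering a convolution to matrix-vector multiplications on a
# logical PIM array holding a stationary weight matrix (stride 1, no padding).
import numpy as np

def conv_as_mvm(ifmap: np.ndarray, kernels: np.ndarray) -> np.ndarray:
    """ifmap: (C, H, W); kernels: (K, C, R, S)."""
    K, C, R, S = kernels.shape
    _, H, W = ifmap.shape
    # Each kernel becomes one row of the weight matrix held in the PIM array.
    weight_matrix = kernels.reshape(K, C * R * S)
    out = np.zeros((K, H - R + 1, W - S + 1))
    for i in range(H - R + 1):
        for j in range(W - S + 1):
            # One sliding window becomes the input vector of a single MVM.
            x = ifmap[:, i:i + R, j:j + S].reshape(-1)
            out[:, i, j] = weight_matrix @ x
    return out
```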
ISA Design Principles
The ISA is designed around the operational needs of DNNs and the capabilities of PIM hardware. Its key principles include the following (a hypothetical encoding sketch follows the list):
- High-level abstraction of DNN operators, focusing on primary matrix and vector operations.
- Support for configurable bit-widths, allowing precision to be adjusted per layer.
- Dataflow-style execution with clearly demarcated data and control paths.
- A standalone design with no dependence on traditional CPU ISAs, tailored specifically to DNN inference rather than training.
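One way to picture these principles, particularly the fixed instruction length and a per-instruction bit-width field, is a simple word-packing sketch. The field layout and widths below are hypothetical; the paper defines its own formats.

```python
# Hypothetical 64-bit instruction encoding illustrating fixed-format
# instructions with an explicit bit-width field. Field widths and positions
# are assumptions for illustration only.
def encode(opcode: int, rd: int, rs1: int, rs2: int,
           bitwidth: int, imm: int) -> int:
    """Pack fields into a single 64-bit word (bits 32-33 left reserved)."""
    assert 0 <= opcode < (1 << 8)
    assert all(0 <= r < (1 << 6) for r in (rd, rs1, rs2))
    assert 0 <= bitwidth < (1 << 4)   # e.g. encodes 1-16 bit precision
    assert 0 <= imm < (1 << 32)
    word = (opcode << 56) | (rd << 50) | (rs1 << 44) | (rs2 << 38)
    word |= (bitwidth << 34) | imm
    return word
```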
Instruction Definition and Summary
The paper enumerates the instructions with explicit operational semantics. The ISA defines scalar, matrix, vector, and communication instructions with fixed instruction lengths of 32 or 64 bits. Scalar instructions cover integer operations and loads from memory addresses; matrix and vector instructions provide the essential DNN computations, including matrix-vector multiplications, element-wise operations, and activation functions such as ReLU and sigmoid. The ISA favors immediate operands to simplify decode logic and omits branch instructions, relying instead on fully unrolled loops.
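As a concrete illustration, the following hypothetical instruction stream runs one fully connected layer with a ReLU activation. The mnemonics, operand order, and addresses are assumptions chosen to show how scalar, matrix, and vector instructions might combine; they are not the paper's actual encodings.

```python
# Hypothetical, branch-free instruction stream for one FC layer + ReLU.
program = [
    ("LD",   "v0", 0x1000, 256),    # load 256 inputs from global memory
    ("MVM",  "v1", "m0", "v0"),     # matrix: v1 = weights(m0) @ v0
    ("VADD", "v1", "v1", "v_bias"), # vector: add bias element-wise
    ("RELU", "v2", "v1"),           # vector: activation
    ("ST",   0x2000, "v2", 128),    # store 128 outputs to global memory
]
# Loops over output tiles would be fully unrolled into straight-line code,
# since the ISA provides no branch instructions.
```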
Communication and synchronization are also addressed within the instruction set, with mechanisms for data exchange between cores and for data moves within local memories, supporting concurrency and coherence in PIM architectures.
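A corresponding sketch of the communication pattern, again with assumed mnemonics, might pair a send on the producing core with a blocking receive on the consuming core before the data is used:

```python
# Hypothetical inter-core communication: core 0 produces a partial result
# and pushes it to core 1, which waits for it before applying the activation.
producer_core_0 = [
    ("MVM",  "v1", "m0", "v0"),
    ("SEND", 1, "v1"),        # send v1 to core 1
]
consumer_core_1 = [
    ("RECV", "v0", 0),        # blocks until core 0's data arrives
    ("RELU", "v1", "v0"),
]
```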
Implications and Future Directions
The adoption of this ISA in the open-source DNN compiler PIMCOMP-NN and the simulator PIMSIM-NN is a step towards making PIM-based DNN acceleration practical and accessible. Because the ISA is transparent to both applications and hardware, it can be adapted and integrated within heterogeneous computational environments. The chosen abstraction level not only helps unify toolchains for DNN accelerator design but also facilitates hardware-software co-optimization.
More broadly, this work points towards more efficient mappings of DNN operations onto PIM technologies with reduced power consumption and latency. Future research could extend the architecture to DNN training, build on the ISA's conceptual framework, and further investigate compatibility across varied PIM device technologies. Growing interest in adaptive bit-width adjustment also suggests further gains in accuracy and energy efficiency for DNN applications.