An Overview of CIMFlow: A Framework for Digital Compute-in-Memory Architectures
CIMFlow is a comprehensive framework for developing and optimizing digital Compute-in-Memory (CIM) architectures that accelerate Deep Neural Networks (DNNs). The paper tackles two central challenges: the "memory wall" bottleneck of traditional von Neumann architectures, which CIM is designed to overcome, and the capacity constraints inherent in digital CIM implementations.
Digital CIM architectures differ from their analog counterparts by integrating digital logic directly within SRAM arrays, eliminating the need for analog-to-digital and digital-to-analog conversion and the precision issues it introduces. This organization provides robust computation and high parallelism, both vital for modern DNN acceleration. However, optimizing these architectures for diverse DNN workloads is nontrivial, mainly because the design space is vast and few tools address the hardware and software aspects of the design together.
CIMFlow bridges this gap with an integrated workflow comprising a flexible Instruction Set Architecture (ISA), a compilation framework, and a simulation environment. The hierarchical ISA provides a modular, extensible foundation that accommodates a range of architectural configurations. The compilation framework, built on the MLIR infrastructure, handles capacity constraints through partitioning and parallelism strategies, and the cycle-accurate simulator provides detailed performance insight for exploring and evaluating design choices.
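The paper's ISA is not reproduced here; as a rough illustration of what a hierarchical, modular CIM instruction set could look like, the Python sketch below models two hypothetical instruction levels. All class names, fields, and opcodes are assumptions made for illustration, not CIMFlow's actual ISA.

```python
# Hypothetical sketch of a hierarchical CIM ISA, for illustration only.
# Instruction names, fields, and levels are assumptions, not CIMFlow's ISA.
from dataclasses import dataclass
from enum import Enum, auto

class Level(Enum):
    CORE = auto()   # inter-core control and data movement
    MACRO = auto()  # operations on a single CIM macro

@dataclass
class Instr:
    level: Level
    opcode: str

@dataclass
class LoadWeights(Instr):
    macro_id: int
    dram_addr: int
    rows: int          # number of SRAM rows to fill with weights

@dataclass
class CimMVM(Instr):
    macro_id: int
    input_buf: int     # on-chip buffer holding the activation vector
    output_buf: int    # destination buffer for partial sums

def program_example():
    """A toy two-instruction program: load a weight tile, then multiply."""
    return [
        LoadWeights(Level.MACRO, "LOAD_W", macro_id=0, dram_addr=0x1000, rows=256),
        CimMVM(Level.MACRO, "MVM", macro_id=0, input_buf=0, output_buf=1),
    ]
```

A layered encoding like this keeps macro-level compute instructions separate from core-level control, which is one plausible way to make an ISA extensible across architectural configurations.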
The framework partitions DNNs into executable modules, facilitating workload distribution and memory management. A dynamic programming-based approach optimizes the mapping of partitions onto the available resources, mitigating the limited on-chip memory capacity. Compared to existing tools, CIMFlow's modular, integrated structure supports extensive design space exploration, which is crucial for keeping pace with the continuously evolving landscape of DNN architectures and workloads.
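As an illustration of the kind of dynamic programming such partition mapping can use, the sketch below chooses cut points in a layer sequence so that each resident group fits a per-macro capacity budget while minimizing an estimated cost. The cost model, capacity check, and all names are placeholder assumptions, not CIMFlow's published algorithm.

```python
# Minimal DP sketch for partitioning a layer sequence into groups that each
# fit a CIM capacity budget, minimizing total estimated cost. The cost model
# is a placeholder, not CIMFlow's actual algorithm.
import math

def partition_layers(weights, capacity, group_cost):
    """weights[i]: weight footprint of layer i; capacity: per-group budget.
    group_cost(i, j): estimated cost of running layers i..j-1 as one resident
    group (e.g., compute latency plus weight-reload overhead).
    Returns (min_cost, list of (start, end) groups)."""
    n = len(weights)
    best = [math.inf] * (n + 1)   # best[j]: min cost to schedule layers 0..j-1
    cut = [0] * (n + 1)
    best[0] = 0.0
    for j in range(1, n + 1):
        footprint = 0
        for i in range(j - 1, -1, -1):       # try making [i, j) one group
            footprint += weights[i]
            if footprint > capacity:         # group no longer fits on a macro
                break
            cost = best[i] + group_cost(i, j)
            if cost < best[j]:
                best[j], cut[j] = cost, i
    groups, j = [], n                        # recover cut points
    while j > 0:
        groups.append((cut[j], j))
        j = cut[j]
    return best[n], groups[::-1]

# Toy usage: a fixed reload penalty per group plus per-layer compute cost.
layer_w = [64, 32, 128, 16, 48]              # KB of weights per layer
cost, groups = partition_layers(
    layer_w, capacity=160,
    group_cost=lambda i, j: 10 + sum(layer_w[i:j]))
print(cost, groups)
```

The quadratic-time recurrence here is a standard prefix-partition DP; the interesting design work in a real compiler lies in the cost model, which must account for reload overhead, parallelism across macros, and buffer reuse.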
Experimental evaluations demonstrate the efficacy of the proposed optimizations, showing up to a 2.8-fold speedup and a 61.7% reduction in energy consumption over baselines that use conventional strategies. These results carry significant implications for digital CIM architecture design, supporting its scalability and adaptability to future DNN models and emerging workloads.
The findings underscore the importance of an integrated hardware-software approach to developing digital CIM systems, an approach that could significantly influence future AI accelerator designs and optimizations. As the authors plan to extend CIMFlow to emerging DNN operators and automated design space exploration, the framework is well positioned to help establish digital CIM as a mainstream solution for next-generation AI acceleration, and its availability to researchers can foster innovation and further work on efficient DNN processing.