CiMNet: Towards Joint Optimization for DNN Architecture and Configuration for Compute-In-Memory Hardware (2402.11780v2)
Abstract: With the recent growth in demand for large-scale deep neural networks, compute-in-memory (CiM) has emerged as a prominent solution to alleviate the bandwidth and on-chip interconnect bottlenecks that constrain von Neumann architectures. However, constructing CiM hardware is challenging: any specific memory hierarchy, in terms of cache sizes and memory bandwidth at different interfaces, may not be ideally matched to a given neural network's attributes, such as tensor dimensions and arithmetic intensity, leading to suboptimal, under-performing systems. Although neural architecture search (NAS) techniques have succeeded in yielding efficient sub-networks for a given hardware-metric budget (e.g., DNN execution time or latency), they assume the hardware configuration to be frozen, often yielding sub-optimal sub-networks for that budget. In this paper, we present CiMNet, a framework that jointly searches for optimal sub-networks and hardware configurations for CiM architectures, creating a Pareto-optimal frontier of downstream task accuracy and execution metrics (e.g., latency). The proposed framework captures the complex interplay between a sub-network's performance and CiM hardware-configuration choices, including bandwidth, processing-element size, and memory size. Exhaustive experiments on model architectures from both the CNN and Transformer families demonstrate the efficacy of CiMNet in finding co-optimized sub-networks and CiM hardware configurations. Specifically, at an ImageNet classification accuracy similar to the baseline ViT-B, optimizing only the model architecture improves performance (i.e., reduces workload execution time) by 1.7x, while optimizing both the model architecture and the hardware configuration improves it by 3.1x.
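The joint search the abstract describes can be illustrated with a deliberately simplified sketch: sample (sub-network, hardware-config) pairs from a combined space, score each pair with accuracy and latency proxies, and keep the non-dominated set. All search-space values, the proxy models, and the random-sampling strategy below are illustrative assumptions, not CiMNet's actual predictors or evolutionary search.

```python
import random

# Hypothetical, simplified search spaces; CiMNet's actual spaces
# (sub-network dimensions, CiM bandwidth, PE size, memory size) are richer.
MODEL_SPACE = {"depth": [6, 9, 12], "width": [192, 384, 768]}
HW_SPACE = {"bandwidth_gbps": [32, 64, 128],
            "pe_rows": [64, 128, 256],
            "mem_kb": [256, 512, 1024]}

def sample_candidate(rng):
    """Jointly sample one (sub-network, hardware-config) pair."""
    model = {k: rng.choice(v) for k, v in MODEL_SPACE.items()}
    hw = {k: rng.choice(v) for k, v in HW_SPACE.items()}
    return model, hw

def proxy_accuracy(model):
    # Stand-in accuracy predictor: larger models score higher (toy monotone proxy).
    return model["depth"] * model["width"]

def proxy_latency(model, hw):
    # Stand-in latency model: compute volume divided by a crude throughput estimate.
    work = model["depth"] * model["width"] ** 2
    throughput = hw["bandwidth_gbps"] * hw["pe_rows"]
    return work / throughput

def pareto_front(points):
    """Keep points not dominated under (maximize accuracy, minimize latency)."""
    front = []
    for acc, lat, cand in points:
        dominated = any(a >= acc and l <= lat and (a > acc or l < lat)
                        for a, l, _ in points)
        if not dominated:
            front.append((acc, lat, cand))
    return front

def joint_search(n_samples=200, seed=0):
    rng = random.Random(seed)
    evaluated = []
    for _ in range(n_samples):
        model, hw = sample_candidate(rng)
        evaluated.append((proxy_accuracy(model),
                          proxy_latency(model, hw),
                          (model, hw)))
    return pareto_front(evaluated)

if __name__ == "__main__":
    for acc, lat, (model, hw) in sorted(joint_search()):
        print(f"acc-proxy={acc:6d}  latency-proxy={lat:8.2f}  {model}  {hw}")
```

Fixing the hardware dictionary to a single configuration recovers the conventional NAS setting the abstract contrasts against; searching both dictionaries together is what lets a candidate trade a wider layer for, say, more processing-element rows, which is the interplay CiMNet exploits.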
- Souvik Kundu
- Anthony Sarah
- Vinay Joshi
- Om J Omer
- Sreenivas Subramoney