
Zero-Space Cost Fault Tolerance for Transformer-based Language Models on ReRAM (2401.11664v1)

Published 22 Jan 2024 in cs.LG, cs.AI, and cs.AR

Abstract: Resistive Random Access Memory (ReRAM) has emerged as a promising platform for deep neural networks (DNNs) due to its support for parallel in-situ matrix-vector multiplication. However, hardware failures, such as stuck-at-fault defects, can result in significant prediction errors during model inference. While additional crossbars can be used to address these failures, they come with storage overhead and are not efficient in terms of space, energy, and cost. In this paper, we propose a fault protection mechanism that incurs zero space cost. Our approach includes: 1) differentiable structure pruning of rows and columns to reduce model redundancy, 2) weight duplication and voting for robust output, and 3) embedding duplicated most significant bits (MSBs) into the model weight. We evaluate our method on nine tasks of the GLUE benchmark with the BERT model, and experimental results prove its effectiveness.
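To make the abstract's second and third ideas (weight duplication with voting, and embedding duplicated most significant bits into the weights) concrete, the sketch below is a toy illustration, not the paper's implementation. It assumes symmetric 8-bit weight quantization, a simple stuck-at-0 fault model, and three independently stored replicas of the top bit planes; the function names and parameters (quantize_int8, inject_stuck_at_zero, protect_and_recover, n_msb, fault_rate) are all hypothetical.

```python
# Toy sketch: MSB duplication + majority voting under a stuck-at-0 fault model.
# Assumptions (not from the paper): int8 symmetric quantization, 3 replicas,
# faults flip bit cells to 0 independently with probability `fault_rate`.
import numpy as np

rng = np.random.default_rng(0)

def quantize_int8(w):
    """Symmetric 8-bit quantization (assumed scheme)."""
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -128, 127).astype(np.int8)
    return q, scale

def to_bits(q):
    """Unsigned bit-plane view of int8 weights, MSB first."""
    u = (q.astype(np.int16) + 128).astype(np.uint8)   # shift to [0, 255]
    return np.unpackbits(u[:, None], axis=1)          # shape (n, 8)

def from_bits(bits):
    u = np.packbits(bits, axis=1).ravel()
    return u.astype(np.int16) - 128

def inject_stuck_at_zero(bits, p):
    """Force a fraction p of bit cells to 0 (toy stuck-at-fault model)."""
    mask = rng.random(bits.shape) < p
    out = bits.copy()
    out[mask] = 0
    return out

def protect_and_recover(q, n_msb=2, fault_rate=0.02, copies=3):
    """Store `copies` replicas of the top `n_msb` bit planes and majority-vote them."""
    bits = to_bits(q)
    replicas = [inject_stuck_at_zero(bits, fault_rate) for _ in range(copies)]
    recovered = replicas[0].copy()
    msb_stack = np.stack([r[:, :n_msb] for r in replicas])   # (copies, n, n_msb)
    recovered[:, :n_msb] = msb_stack.sum(axis=0) > copies // 2
    return from_bits(recovered)

w = rng.normal(size=1024).astype(np.float32)
q, _ = quantize_int8(w)
faulty = from_bits(inject_stuck_at_zero(to_bits(q), 0.02))
voted = protect_and_recover(q)
print("mean abs error, unprotected:", np.mean(np.abs(q.astype(int) - faulty)))
print("mean abs error, MSB-voted:  ", np.mean(np.abs(q.astype(int) - voted)))
```

Because a stuck-at fault in a high-order bit plane dominates the resulting weight error, voting only over the duplicated MSBs recovers most of the lost accuracy at a fraction of the storage of full triplication; the paper goes further by freeing that storage via structured pruning so the duplicated bits fit in the existing crossbars.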

