
AutoGMap: Learning to Map Large-scale Sparse Graphs on Memristive Crossbars (2111.07684v3)

Published 15 Nov 2021 in cs.LG and cs.ET

Abstract: The sparse representation of graphs has shown great potential for accelerating graph applications (e.g., social networks, knowledge graphs) on traditional computing architectures (CPU, GPU, or TPU). However, the exploration of large-scale sparse graph computing on processing-in-memory (PIM) platforms (typically built on memristive crossbars) is still in its infancy. To implement the computation or storage of large-scale or batch graphs on memristive crossbars, a natural assumption is that a large-scale crossbar is required, yet it would be poorly utilized. Some recent works question this assumption: to avoid wasting storage and computational resources, they propose fixed-size or progressively scheduled "block partition" schemes. However, these methods are coarse-grained or static, and not effectively sparsity-aware. This work proposes a dynamic sparsity-aware mapping scheme generation method that models the problem as sequential decision making and optimizes it with a reinforcement learning (RL) algorithm (REINFORCE). Our generation model (an LSTM combined with a dynamic-fill scheme) delivers strong mapping performance on a small-scale graph/matrix (a complete mapping costs 43% of the original matrix area) and on two large-scale matrices (22.5% of the area on qh882 and 17.1% on qh1484). Our method can be extended to sparse graph computing on other PIM architectures, and is not limited to memristive device-based platforms.
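The method, as described, amounts to a learned controller that emits a sequence of partition decisions scored by mapping cost. As a rough, hypothetical sketch only (PyTorch assumed; PartitionPolicy, toy_reward, the action space of square block sizes, and the area-saving reward are illustrative inventions, not the paper's actual state encoding, dynamic-fill scheme, or reward design), the general LSTM-plus-REINFORCE pattern looks like this:

```python
import torch
import torch.nn as nn


class PartitionPolicy(nn.Module):
    """Hypothetical stand-in for the paper's LSTM generating model:
    at each step it picks a block size for the next crossbar block."""

    def __init__(self, n_actions: int, hidden: int = 64):
        super().__init__()
        self.embed = nn.Embedding(n_actions, hidden)
        self.cell = nn.LSTMCell(hidden, hidden)
        self.head = nn.Linear(hidden, n_actions)

    def sample(self, n_steps: int):
        h = torch.zeros(1, self.cell.hidden_size)
        c = torch.zeros_like(h)
        a = torch.zeros(1, dtype=torch.long)  # "start" token (action 0)
        actions, log_probs = [], []
        for _ in range(n_steps):
            h, c = self.cell(self.embed(a), (h, c))
            dist = torch.distributions.Categorical(logits=self.head(h))
            a = dist.sample()
            actions.append(a.item())
            log_probs.append(dist.log_prob(a))
        return actions, torch.stack(log_probs).sum()


def toy_reward(actions, block_sizes, matrix):
    # Assumed reward: fraction of area saved relative to mapping the
    # whole matrix onto one crossbar; the paper's actual reward and
    # its dynamic-fill scheme are not reproduced here.
    used = sum(block_sizes[a] ** 2 for a in actions)
    return 1.0 - used / matrix.numel()


policy = PartitionPolicy(n_actions=4)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
matrix = (torch.rand(64, 64) < 0.05).float()  # random 5%-dense matrix
block_sizes, baseline = [4, 8, 16, 32], 0.0

for step in range(200):  # REINFORCE with a moving-average baseline
    actions, log_prob = policy.sample(n_steps=8)
    reward = toy_reward(actions, block_sizes, matrix)
    baseline = 0.9 * baseline + 0.1 * reward
    loss = -(reward - baseline) * log_prob  # score-function gradient
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

A faithful implementation would additionally have to guarantee that the generated blocks cover every nonzero entry of the matrix; this sketch only captures the sequential-sampling and policy-gradient skeleton.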
