Fully Spiking Actor Network with Intra-layer Connections for Reinforcement Learning (2401.05444v1)
Abstract: With the help of specialized neuromorphic hardware, spiking neural networks (SNNs) are expected to realize artificial intelligence with lower energy consumption. Combining SNNs with deep reinforcement learning (DRL) thus offers a promising, energy-efficient approach to realistic control tasks. In this paper, we focus on tasks in which the agent must learn multi-dimensional deterministic control policies, a setting that is very common in real-world scenarios. Recently, the surrogate gradient method has been used to train multi-layer SNNs, allowing them to achieve performance comparable to that of the corresponding deep networks on such tasks. Most existing spike-based RL methods take the firing rate as the output of the SNN and map it to a continuous action space (i.e., the deterministic policy) through a fully-connected (FC) layer. However, because the firing rate is a fractional rather than binary value, the FC layer requires floating-point matrix operations, so the whole SNN cannot be deployed directly on neuromorphic hardware. To develop a fully spiking actor network without any floating-point matrix operations, we draw inspiration from the non-spiking interneurons found in insects and use the membrane voltage of non-spiking neurons to represent the action. Before the non-spiking neurons, multiple neuron populations are introduced to decode the different action dimensions. Since each population decodes one action dimension, we argue that the neurons within a population should be connected in both the temporal and spatial domains. Hence, intra-layer connections are used in the output populations to enhance their representation capacity. Finally, we propose a fully spiking actor network with intra-layer connections (ILC-SAN).
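To make the architecture described in the abstract concrete, below is a minimal numpy sketch of one forward pass as we read it: one LIF spiking population per action dimension, with recurrent lateral weights inside each population (the intra-layer connections), feeding one non-spiking leaky integrator whose final membrane voltage is read out as that dimension's action. All sizes, weight initializations, the hard reset, the shared input drive, and the tanh squashing are illustrative assumptions for this sketch, not the paper's actual hyperparameters or training setup.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 16        # simulation timesteps
N_IN = 32     # spiking inputs feeding each population
POP = 10      # neurons per output population
ACT_DIM = 3   # action dimensions (one population each)

# Hypothetical parameters: feedforward weights, intra-layer (lateral)
# weights within each population, and readout weights to one
# non-spiking neuron per action dimension.
W_in = rng.normal(0, 0.3, (ACT_DIM, POP, N_IN))
W_lat = rng.normal(0, 0.1, (ACT_DIM, POP, POP))  # intra-layer connections
W_out = rng.normal(0, 0.3, (ACT_DIM, POP))

V_TH, TAU = 1.0, 2.0  # spike threshold, membrane time constant

def lif_step(v, current):
    """One LIF update: leaky integration, then threshold and hard reset."""
    v = v + (current - v) / TAU
    spikes = (v >= V_TH).astype(float)
    return v * (1.0 - spikes), spikes

def ilc_san_forward(in_spikes):
    """in_spikes: (T, N_IN) binary spike train shared by all populations."""
    v_pop = np.zeros((ACT_DIM, POP))
    s_prev = np.zeros((ACT_DIM, POP))  # previous step's population spikes
    v_out = np.zeros(ACT_DIM)          # non-spiking membrane voltages
    for t in range(T):
        # Feedforward drive plus recurrent drive from the same
        # population's previous spikes (the intra-layer connection,
        # linking neurons across both time and space).
        i_ff = np.einsum('dpn,n->dp', W_in, in_spikes[t])
        i_lat = np.einsum('dpq,dq->dp', W_lat, s_prev)
        v_pop, s_prev = lif_step(v_pop, i_ff + i_lat)
        # Non-spiking neurons integrate spikes but never fire; their
        # final membrane voltage represents the action.
        v_out = v_out + (np.einsum('dp,dp->d', W_out, s_prev) - v_out) / TAU
    return np.tanh(v_out)  # squash to a bounded continuous action

action = ilc_san_forward(rng.integers(0, 2, (T, N_IN)).astype(float))
print(action)  # one continuous value per action dimension
```

Note that every matrix-vector product inside the loop multiplies weights by a binary spike vector, so on neuromorphic hardware each product reduces to accumulating weights at spike events; this is what removes the floating-point FC decoding of a fractional firing rate that the abstract identifies as the deployment obstacle.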