GRSN: Gated Recurrent Spiking Neurons for POMDPs and MARL (2404.15597v1)
Abstract: Spiking neural networks (SNNs) are widely applied in various fields because of their energy efficiency and fast inference. Applying SNNs to reinforcement learning (RL) can significantly reduce the computational resources an agent requires and improve the algorithm's performance under resource-constrained conditions. However, in current spiking reinforcement learning (SRL) algorithms, the simulation results of multiple time steps correspond to only a single decision step in RL. This differs markedly from the real temporal dynamics of the brain and fails to fully exploit the capacity of SNNs to process temporal data. To address this temporal mismatch and further exploit the inherent temporal dynamics of spiking neurons, we propose a novel temporal alignment paradigm (TAP) that uses the single-step update of spiking neurons to accumulate historical state information in RL and introduces gated units to enhance the memory capacity of spiking neurons. Experimental results show that our method can solve partially observable Markov decision processes (POMDPs) and multi-agent cooperation problems with performance similar to that of recurrent neural networks (RNNs) while consuming about 50% of the power.
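The temporal mismatch described above can be illustrated with a toy sketch. This is not the paper's GRSN formulation, which this excerpt does not give. The sigmoid gate, the soft reset, the parameter names (`w_in`, `w_gate`, `u_gate`, `b_gate`) and the helpers `conventional_policy` and `aligned_policy` are all hypothetical. The sketch assumes a single gated leaky integrate-and-fire neuron whose retention of its old membrane potential is controlled by a gate. In the conventional scheme, the neuron is reset and simulated for `T` steps on one observation per decision, with the firing rate used as the decision signal (an assumption). Under temporal alignment, the neuron is updated once per environment step and its membrane potential carries history forward.

```python
import math
from dataclasses import dataclass


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


@dataclass
class GatedLIFNeuron:
    """Toy gated leaky integrate-and-fire neuron (hypothetical, not the GRSN equations)."""

    w_in: float = 1.0       # hypothetical input weight
    w_gate: float = 0.0     # hypothetical gate weight on the input
    u_gate: float = 0.0     # hypothetical gate weight on the previous spike
    b_gate: float = 0.0     # hypothetical gate bias
    threshold: float = 1.0  # firing threshold
    v: float = 0.0          # membrane potential (the neuron's memory)
    s: float = 0.0          # most recent output spike (0.0 or 1.0)

    def reset(self) -> None:
        """Clear the neuron's state."""
        self.v = 0.0
        self.s = 0.0

    def step(self, x: float) -> float:
        """Advance the neuron by one simulation step and return its spike."""
        # Sigmoid gate in (0, 1): the fraction of the old potential that is retained.
        g = sigmoid(self.w_gate * x + self.u_gate * self.s + self.b_gate)
        # Gated integration: mix the retained potential with the new input current.
        self.v = g * self.v + (1.0 - g) * self.w_in * x
        # Fire if the potential reaches the threshold.
        self.s = 1.0 if self.v >= self.threshold else 0.0
        # Soft reset: subtract the threshold after a spike.
        self.v -= self.s * self.threshold
        return self.s


def conventional_policy(neuron: GatedLIFNeuron, obs: float, T: int = 4) -> float:
    """Conventional SRL: reset, simulate T steps on one observation, return the firing rate."""
    neuron.reset()
    spikes = [neuron.step(obs) for _ in range(T)]
    return sum(spikes) / T


def aligned_policy(neuron: GatedLIFNeuron, obs: float) -> float:
    """Temporal alignment: one neuron update per environment step, state kept between steps."""
    return neuron.step(obs)


if __name__ == "__main__":
    # Same final observation (1.2) after two different histories.
    for history in ([0.0], [1.8]):
        aligned = GatedLIFNeuron()
        conventional = GatedLIFNeuron()
        for obs in history:
            aligned_policy(aligned, obs)
            conventional_policy(conventional, obs)
        print(history, aligned_policy(aligned, 1.2), conventional_policy(conventional, 1.2))
```

With the default parameters the gate is a constant 0.5, so the neuron behaves like a plain leaky integrator. In that case the aligned neuron fires on an observation of 1.2 only when the previous observation was large (1.8 rather than 0.0), while the conventional firing rate for 1.2 is 0.25 regardless of history, because each decision starts from a reset state. Nonzero `w_gate` or `u_gate` make the retention input-dependent, which is the kind of memory control the gated units are meant to provide. Training such neurons in a full agent (for example with surrogate gradients) is outside the scope of this sketch.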