Biologically-Plausible Topology Improved Spiking Actor Network for Efficient Deep Reinforcement Learning (2403.20163v1)

Published 29 Mar 2024 in cs.NE and q-bio.NC

Abstract: The success of Deep Reinforcement Learning (DRL) is largely attributed to the use of Artificial Neural Networks (ANNs) as function approximators. Recent advances in neuroscience have revealed that the human brain achieves efficient reward-based learning in part by integrating spiking neurons, with their spatial-temporal dynamics, and network topologies with biologically plausible connectivity patterns. This integration allows spiking neurons to combine information efficiently across and within layers via nonlinear dendritic trees and lateral interactions, and the fusion of the two topologies enhances the network's information-processing ability, which is crucial for perceiving complex inputs and guiding decision-making. ANNs, however, differ significantly from brain networks: they lack intricate neuronal dynamics and feature only inter-layer connections, typically realized as direct linear summation, with no intra-layer connections. This limitation constrains network expressivity. To address it, we propose a novel alternative function approximator, the Biologically-Plausible Topology improved Spiking Actor Network (BPT-SAN), tailored for efficient decision-making in DRL. The BPT-SAN incorporates spiking neurons with intricate spatial-temporal dynamics and introduces intra-layer connections, enhancing spatial-temporal state representation and enabling more faithful biological simulation. In place of the conventional direct linear weighted sum, the BPT-SAN models the local nonlinearities of dendritic trees within the inter-layer connections. For the intra-layer connections, it introduces lateral interactions between adjacent neurons, integrating them into the membrane-potential formula to ensure accurate spike firing.
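The abstract describes the two mechanisms only in prose. The minimal sketch below shows one plausible reading of a single BPT-SAN hidden layer; it is not the paper's implementation. The dendritic branch partitioning, the tanh branch nonlinearity, the ring-shaped nearest-neighbour lateral coupling, and all parameter values are illustrative assumptions layered on a standard leaky integrate-and-fire (LIF) update.

```python
import numpy as np

class BPTSANLayerSketch:
    """Illustrative layer: LIF neurons whose input current comes from a
    nonlinear dendritic summation (inter-layer connections) and whose
    membrane potential also receives lateral input from adjacent neurons
    (intra-layer connections). Details are assumptions, not the paper's."""

    def __init__(self, n_in, n_out, n_branches=4, tau=2.0, v_th=1.0, seed=0):
        rng = np.random.default_rng(seed)
        # Each output neuron owns several dendritic branches, each with its
        # own weights over the full input (hypothetical partitioning).
        self.w = rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_branches, n_out, n_in))
        # Lateral weights coupling each neuron to its two ring neighbours.
        self.w_lat = rng.normal(0.0, 0.1, (n_out,))
        self.tau, self.v_th = tau, v_th
        self.v = np.zeros(n_out)  # membrane potentials
        self.s = np.zeros(n_out)  # spikes emitted at the previous step

    def step(self, x):
        # Inter-layer drive: per-branch linear sum passed through a local
        # nonlinearity (tanh as a stand-in), then summed across branches,
        # instead of a single direct linear weighted sum.
        branch = np.einsum('bon,n->bo', self.w, x)
        i_dend = np.tanh(branch).sum(axis=0)
        # Intra-layer drive: lateral interaction from the previous spikes
        # of the two adjacent neurons, entering the membrane update itself.
        i_lat = self.w_lat * (np.roll(self.s, 1) + np.roll(self.s, -1))
        # Leaky integrate-and-fire update with hard reset on spiking.
        self.v = self.v * (1.0 - 1.0 / self.tau) + i_dend + i_lat
        self.s = (self.v >= self.v_th).astype(float)
        self.v = np.where(self.s > 0, 0.0, self.v)
        return self.s
```

Even with the simplifications, the sketch preserves the two structural points the abstract makes: the inter-layer input passes through a per-branch nonlinearity before summation, and the lateral term is added inside the membrane-potential update rather than to the layer's output.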
