Bidirectional Progressive Neural Networks with Episodic Return Progress for Emergent Task Sequencing and Robotic Skill Transfer (2403.04001v1)
Abstract: The human brain and behavior provide a rich source of inspiration for novel control and learning methods in robotics. To exemplify such a development, taking inspiration from how humans acquire knowledge and transfer skills among tasks, we introduce a novel multi-task reinforcement learning framework named Episodic Return Progress with Bidirectional Progressive Neural Networks (ERP-BPNN). The proposed ERP-BPNN model (1) learns in a human-like interleaved manner through (2) autonomous task switching driven by a novel intrinsic motivation signal and, in contrast to existing methods, (3) allows bidirectional skill transfer among tasks. ERP-BPNN is a general architecture applicable to several multi-task learning settings; in this paper, we present the details of its neural architecture and show its ability to enable effective learning and skill transfer among morphologically different robots in a reaching task. The developed Bidirectional Progressive Neural Network (BPNN) architecture enables bidirectional skill transfer without requiring incremental training and seamlessly integrates with online task arbitration. The developed task arbitration mechanism is based on soft Episodic Return Progress (ERP), a novel intrinsic motivation (IM) signal. To evaluate our method, we use quantifiable robotics metrics such as 'expected distance to goal' and 'path straightness' in addition to episodic return, the usual reward-based measure in reinforcement learning. With simulation experiments, we show that ERP-BPNN achieves faster cumulative convergence and outperforms the baselines on all considered metrics across morphologically different robots.
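The abstract describes task arbitration driven by soft Episodic Return Progress, i.e., an intrinsic motivation signal derived from how quickly the episodic return on each task is improving, with a soft (stochastic) choice so that training remains interleaved across tasks. The following is a minimal Python sketch of one plausible reading of that idea; it is not the authors' implementation, and the window sizes, softmax temperature, and the toy `run_episode` stand-in are assumptions made only for illustration.

```python
# Minimal sketch of soft ERP-style task arbitration (illustrative only).
# Assumptions not taken from the paper: progress windows, softmax temperature,
# and the toy run_episode() used in place of real RL rollouts.
import random
from collections import deque

import numpy as np


def episodic_return_progress(returns, window=10):
    """Learning progress on one task: mean episodic return of the most recent
    `window` episodes minus the mean of the `window` episodes before them
    (0 until enough episodes have been collected)."""
    if len(returns) < 2 * window:
        return 0.0
    recent = np.mean(list(returns)[-window:])
    older = np.mean(list(returns)[-2 * window:-window])
    return float(recent - older)


def soft_task_choice(progress_values, temperature=1.0, rng=None):
    """Soft arbitration: tasks with higher recent progress are sampled more
    often, but every task keeps a nonzero probability, which produces
    interleaved rather than blocked practice."""
    rng = rng or np.random.default_rng()
    logits = np.array(progress_values, dtype=np.float64) / temperature
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    probs /= probs.sum()
    return int(rng.choice(len(progress_values), p=probs)), probs


def run_episode(task_id, step):
    """Toy stand-in for one RL episode on robot/task `task_id`; returns a
    fake episodic return that improves at a task-dependent rate."""
    rates = [0.05, 0.02, 0.08]
    return rates[task_id] * step + random.gauss(0.0, 0.5)


if __name__ == "__main__":
    n_tasks = 3
    histories = [deque(maxlen=200) for _ in range(n_tasks)]
    for step in range(300):
        erp = [episodic_return_progress(h) for h in histories]
        task, probs = soft_task_choice(erp, temperature=0.5)
        histories[task].append(run_episode(task, step))
        if step % 50 == 0:
            print(f"step {step:3d}  ERP={np.round(erp, 3)}  probs={np.round(probs, 2)}")
```

In this sketch the arbitration layer is independent of the policy architecture, so a column-per-robot network such as BPNN could be trained by whichever task the soft choice selects at each episode.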