
Image-Based Deep Reinforcement Learning with Intrinsically Motivated Stimuli: On the Execution of Complex Robotic Tasks (2407.21338v1)

Published 31 Jul 2024 in cs.AI and cs.LG

Abstract: Reinforcement Learning (RL) has been widely used to solve tasks where the environment consistently provides a dense reward value. However, in real-world scenarios, rewards can often be poorly defined or sparse. Auxiliary signals are indispensable for discovering efficient exploration strategies and aiding the learning process. In this work, inspired by intrinsic motivation theory, we postulate that the intrinsic stimuli of novelty and surprise can assist in improving exploration in complex, sparsely rewarded environments. We introduce NaSA-TD3, a novel, sample-efficient method that learns directly from pixels: an image-based extension of TD3 with an autoencoder. The experiments demonstrate that NaSA-TD3 is easy to train and an efficient method for tackling complex continuous-control robotic tasks, both in simulated environments and real-world settings. NaSA-TD3 outperforms existing state-of-the-art image-based RL methods in terms of final performance without requiring pre-trained models or human demonstrations.
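The abstract describes NaSA-TD3 as TD3 coupled with a pixel autoencoder and intrinsic novelty and surprise stimuli. The sketch below illustrates one plausible way such stimuli could be computed from an autoencoder; it is not the authors' implementation. The network sizes, the reconstruction-error proxy for surprise, the latent-distance proxy for novelty, and the reward weights are all illustrative assumptions.

```python
import collections

import torch
import torch.nn as nn
import torch.nn.functional as F


class PixelAutoencoder(nn.Module):
    """Encodes 84x84 grayscale frames into a latent vector and reconstructs them."""

    def __init__(self, latent_dim=50):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2), nn.ReLU(),   # 84 -> 41
            nn.Conv2d(32, 32, 4, stride=2), nn.ReLU(),  # 41 -> 19
            nn.Flatten(),
            nn.Linear(32 * 19 * 19, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 32 * 19 * 19), nn.ReLU(),
            nn.Unflatten(1, (32, 19, 19)),
            nn.ConvTranspose2d(32, 32, 4, stride=2, output_padding=1), nn.ReLU(),  # 19 -> 41
            nn.ConvTranspose2d(32, 1, 4, stride=2),                                # 41 -> 84
        )

    def forward(self, obs):
        z = self.encoder(obs)
        return z, self.decoder(z)


def intrinsic_stimuli(autoencoder, obs, latent_memory, k=10):
    """Compute assumed proxies: surprise from reconstruction error,
    novelty from distance to recently seen latent codes.

    `obs` is a (1, 1, 84, 84) tensor; `latent_memory` is a deque of past latents.
    """
    with torch.no_grad():
        z, recon = autoencoder(obs)
        surprise = F.mse_loss(recon, obs).item()
        if latent_memory:
            past = torch.stack(list(latent_memory))   # (N, latent_dim)
            dists = torch.cdist(z, past).squeeze(0)   # distances to stored latents
            novelty = dists.topk(min(k, len(latent_memory)), largest=False).values.mean().item()
        else:
            novelty = 0.0
    latent_memory.append(z.squeeze(0))
    return surprise, novelty


# Hypothetical usage: shape the reward seen by a TD3-style agent with the two stimuli.
ae = PixelAutoencoder()
memory = collections.deque(maxlen=10_000)
frame = torch.rand(1, 1, 84, 84)                      # stand-in for a camera frame
surprise, novelty = intrinsic_stimuli(ae, frame, memory)
env_reward = 0.0                                      # placeholder for a sparse task reward
shaped_reward = env_reward + 0.1 * novelty + 0.1 * surprise  # illustrative weights
```

In this sketch the autoencoder serves double duty: its latent code can feed the actor-critic networks, while its reconstruction error and latent-space statistics supply the auxiliary exploration signals described in the abstract.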

