Unveiling the Significance of Toddler-Inspired Reward Transition in Goal-Oriented Reinforcement Learning (2403.06880v2)

Published 11 Mar 2024 in cs.LG and cs.AI

Abstract: Toddlers evolve from free exploration with sparse feedback to exploiting prior experiences for goal-directed learning with denser rewards. Drawing inspiration from this Toddler-Inspired Reward Transition, we set out to explore the implications of varying reward transitions when incorporated into Reinforcement Learning (RL) tasks. Central to our inquiry is the transition from sparse to potential-based dense rewards, which share optimal strategies regardless of reward changes. Through various experiments, including those in egocentric navigation and robotic arm manipulation tasks, we found that proper reward transitions significantly influence sample efficiency and success rates. Of particular note is the efficacy of the toddler-inspired Sparse-to-Dense (S2D) transition. Beyond these performance metrics, using the Cross-Density Visualizer technique, we observed that transitions, especially the S2D, smooth the policy loss landscape, promoting wide minima that enhance generalization in RL models.
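
The key property behind the S2D transition is that the dense reward is potential-based, so switching rewards mid-training does not change the optimal policy (policy invariance under reward shaping). Below is a minimal sketch of that idea, assuming a goal-reaching task with vector-valued states, a negative-distance potential, and an arbitrary transition step; these choices are illustrative, not the authors' exact setup.

```python
import numpy as np

GAMMA = 0.99              # discount factor (assumed)
TRANSITION_STEP = 50_000  # hypothetical step at which rewards switch from sparse to dense

def potential(state, goal):
    """Potential Phi(s): negative Euclidean distance to the goal (an assumed choice)."""
    return -np.linalg.norm(np.asarray(state) - np.asarray(goal))

def shaped_reward(sparse_reward, state, next_state, goal, global_step):
    """Reward seen by the agent under a Sparse-to-Dense (S2D) schedule.

    Before TRANSITION_STEP: the sparse reward only (e.g., +1 on reaching the goal).
    After TRANSITION_STEP: the sparse reward plus the potential-based shaping term
        F(s, s') = gamma * Phi(s') - Phi(s),
    which preserves the optimal policy.
    """
    if global_step < TRANSITION_STEP:
        return sparse_reward
    shaping = GAMMA * potential(next_state, goal) - potential(state, goal)
    return sparse_reward + shaping

# Example: the same transition yields 0 reward early on, but a positive
# progress signal after the switch to dense rewards.
r_early = shaped_reward(0.0, [0.0, 0.0], [0.1, 0.0], [1.0, 0.0], global_step=10_000)
r_late = shaped_reward(0.0, [0.0, 0.0], [0.1, 0.0], [1.0, 0.0], global_step=80_000)
```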

Authors (8)
  1. Junseok Park (10 papers)
  2. Yoonsung Kim (7 papers)
  3. Hee Bin Yoo (2 papers)
  4. Min Whoo Lee (7 papers)
  5. Kibeom Kim (6 papers)
  6. Won-Seok Choi (7 papers)
  7. Minsu Lee (13 papers)
  8. Byoung-Tak Zhang (83 papers)
