Is Exploration All You Need? Effective Exploration Characteristics for Transfer in Reinforcement Learning (2404.02235v1)

Published 2 Apr 2024 in cs.LG and cs.AI

Abstract: In deep reinforcement learning (RL) research, there has been a concerted effort to design more efficient and productive exploration methods for solving sparse-reward problems. These exploration methods often share common principles (e.g., improving diversity) and implementation details (e.g., intrinsic reward). Prior work found that non-stationary Markov decision processes (MDPs) require exploration to efficiently adapt to changes in the environment with online transfer learning. However, the relationship between specific exploration characteristics and effective transfer learning in deep RL has not been characterized. In this work, we seek to understand the relationships between salient exploration characteristics and improved performance and efficiency in transfer learning. We test eleven popular exploration algorithms on a variety of transfer types -- or "novelties" -- to identify the characteristics that positively affect online transfer learning. Our analysis shows that some characteristics correlate with improved performance and efficiency across a wide range of transfer tasks, while others only improve transfer performance with respect to specific environment changes. From our analysis, we make recommendations about which exploration algorithm characteristics are best suited to specific transfer situations.
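The abstract singles out intrinsic reward as an implementation detail shared by many exploration methods: the agent receives a bonus for reaching novel or rarely visited states, which is also what lets it re-explore when the environment changes. The sketch below is a minimal illustration of that idea, not the paper's experimental setup; the 1-D chain environment, the hyperparameters, and the count-based bonus (one of many intrinsic-reward forms the paper's tested algorithms instantiate) are all assumptions chosen for brevity. The goal is relocated mid-training to mimic an online transfer "novelty".

import random
from collections import defaultdict

class ChainMDP:
    """Tiny 1-D chain (an illustrative stand-in, not the paper's benchmark):
    states 0..n-1, actions 0 = left / 1 = right, reward 1.0 only at the goal."""
    def __init__(self, n=11, goal=10, start=5):
        self.n, self.goal, self.start = n, goal, start
        self.state = start

    def reset(self):
        self.state = self.start
        return self.state

    def step(self, action):
        self.state = max(0, min(self.n - 1, self.state + (1 if action else -1)))
        done = self.state == self.goal
        return self.state, (1.0 if done else 0.0), done

def train(env, episodes=400, beta=0.5, alpha=0.5, gamma=0.95, eps=0.05):
    q = defaultdict(float)     # Q-values keyed by (state, action)
    visits = defaultdict(int)  # state visitation counts for the bonus
    for ep in range(episodes):
        if ep == episodes // 2:
            env.goal = 0       # the "novelty": relocate the goal mid-training
        s, done, steps = env.reset(), False, 0
        while not done and steps < 100:
            if random.random() < eps:
                a = random.randrange(2)                      # epsilon-greedy
            else:
                a = max((0, 1), key=lambda x: q[(s, x)])     # greedy action
            s2, r_ext, done = env.step(a)
            visits[s2] += 1
            # Count-based intrinsic bonus: decays with visitation, so rarely
            # seen states stay attractive after the environment changes.
            r = r_ext + beta / (visits[s2] ** 0.5)
            target = r + (0.0 if done else gamma * max(q[(s2, 0)], q[(s2, 1)]))
            q[(s, a)] += alpha * (target - q[(s, a)])
            s, steps = s2, steps + 1
        if ep % 100 == 0 or ep == episodes // 2:
            print(f"episode {ep:3d}: {steps} steps to termination")
    return q

train(ChainMDP())

Setting beta to 0 typically makes recovery after the goal relocation much slower, since epsilon-greedy alone must stumble onto the new goal; differences of this kind, across eleven exploration algorithms and a range of novelty types, are what the paper quantifies.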
