Flexible Attention-Based Multi-Policy Fusion for Efficient Deep Reinforcement Learning (2210.03729v2)

Published 7 Oct 2022 in cs.LG, cs.AI, and cs.RO

Abstract: Reinforcement learning (RL) agents have long sought to approach the efficiency of human learning. Humans are great observers who can learn by aggregating external knowledge from various sources, including observations of others attempting a task under their own policies. Prior studies in RL have incorporated external knowledge policies to help agents improve sample efficiency. However, it remains non-trivial to perform arbitrary combinations and replacements of those policies, an essential feature for generalization and transferability. In this work, we present Knowledge-Grounded RL (KGRL), an RL paradigm that fuses multiple knowledge policies and aims for human-like efficiency and flexibility. We propose a new actor architecture for KGRL, the Knowledge-Inclusive Attention Network (KIAN), which allows free rearrangement of knowledge policies through embedding-based attentive action prediction. KIAN also addresses entropy imbalance, a problem arising in maximum-entropy KGRL that hinders an agent from efficiently exploring the environment, through a new design of policy distributions. Experimental results demonstrate that KIAN outperforms alternative methods that incorporate external knowledge policies and achieves efficient and flexible learning. Our implementation is available at https://github.com/Pascalson/KGRL.git
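The core mechanism the abstract describes, attending over a set of interchangeable policies via learned embeddings, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation (see the linked repository for that); the function and variable names here are hypothetical, and the sketch only shows the general idea of query-key attention producing weights over per-policy action proposals:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def fuse_policies(state_embedding, policy_embeddings, policy_actions):
    """Attention-weighted fusion of action proposals from several policies.

    state_embedding:   (d,)   query vector derived from the current state
    policy_embeddings: (k, d) one learned key per policy (inner + external)
    policy_actions:    (k, a) action proposal from each policy

    Returns the fused action and the attention weights over policies.
    Because each policy contributes only via its embedding, policies can
    be added, removed, or swapped without retraining the others.
    """
    d = state_embedding.shape[0]
    scores = policy_embeddings @ state_embedding / np.sqrt(d)  # scaled dot-product
    weights = softmax(scores)                                  # attention over policies
    fused_action = weights @ policy_actions                    # weighted combination
    return fused_action, weights

# Toy example: one inner policy plus two external knowledge policies.
rng = np.random.default_rng(0)
d, k, a = 8, 3, 2                       # embedding dim, #policies, action dim
query = rng.normal(size=d)
keys = rng.normal(size=(k, d))
proposals = rng.normal(size=(k, a))
action, w = fuse_policies(query, keys, proposals)
```

Swapping in a different knowledge policy only requires replacing one row of `keys` and `proposals`, which is the kind of free rearrangement the abstract attributes to embedding-based attention.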
