
Learning from Ambiguous Demonstrations with Self-Explanation Guided Reinforcement Learning (2110.05286v4)

Published 11 Oct 2021 in cs.LG

Abstract: Our work aims to efficiently leverage ambiguous demonstrations for training a reinforcement learning (RL) agent. An ambiguous demonstration can usually be interpreted in multiple ways, which severely hinders the RL agent from learning stably and efficiently. Since even an optimal demonstration may be ambiguous, previous works that combine RL with learning from demonstration (RLfD) may not work well. Inspired by how humans handle such situations, we propose self-explanation (the agent generates explanations for itself) to recognize valuable high-level relational features as an interpretation of why a successful trajectory is successful; these features then guide the agent's RL training. Our main contribution is the Self-Explanation for RL from Demonstrations (SERLfD) framework, which overcomes the limitations of traditional RLfD works. Our experimental results show that an RLfD model can be improved by our SERLfD framework in terms of both training stability and performance.
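The abstract's core idea can be made concrete with a small sketch. The Python below illustrates one plausible reading, assuming hand-crafted binary relational predicates and a logistic-regression self-explainer trained to separate transitions from successful and unsuccessful trajectories; the class name SelfExplainer, the potential-based shaping scheme, and all parameters are illustrative assumptions, not the paper's actual implementation.

    import numpy as np

    # Minimal sketch (assumed names and details): a self-explainer scores
    # binary relational predicates by how well they separate transitions
    # from successful trajectories (label 1) versus unsuccessful ones (0).
    class SelfExplainer:
        def __init__(self, n_predicates, lr=0.1):
            self.w = np.zeros(n_predicates)  # per-predicate "utility" weights
            self.b = 0.0
            self.lr = lr

        def _sigmoid(self, z):
            return 1.0 / (1.0 + np.exp(-z))

        def prob_success(self, predicates):
            # Predicted probability that a state with these predicate
            # values belongs to a successful trajectory.
            return self._sigmoid(predicates @ self.w + self.b)

        def update(self, predicates, labels):
            # One gradient-ascent step on the logistic log-likelihood.
            p = self.prob_success(predicates)
            self.w += self.lr * predicates.T @ (labels - p) / len(labels)
            self.b += self.lr * np.mean(labels - p)

        def shaping_reward(self, pred_s, pred_s_next):
            # Potential-based shaping (Ng et al., 1999): reward the
            # increase in predicted success probability between
            # consecutive states, which preserves the optimal policy.
            return self.prob_success(pred_s_next) - self.prob_success(pred_s)

    # Toy usage: two predicates, where predicate 0 (e.g. "holding(tool)")
    # is what actually distinguishes successful transitions.
    rng = np.random.default_rng(0)
    X = rng.integers(0, 2, size=(256, 2)).astype(float)
    y = X[:, 0]  # success label coincides with predicate 0
    explainer = SelfExplainer(n_predicates=2)
    for _ in range(200):
        explainer.update(X, y)
    print(explainer.w)  # weight on predicate 0 should dominate
    # Dense guidance: positive reward when predicate 0 becomes true.
    print(explainer.shaping_reward(np.array([0.0, 1.0]), np.array([1.0, 1.0])))

In the full framework, such a shaping term would plausibly be added to the environment reward at every step, so an off-policy learner such as SAC or TD3 receives dense guidance even when the demonstrations themselves are ambiguous.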
