
Learning Transparent Reward Models via Unsupervised Feature Selection (2410.18608v2)

Published 24 Oct 2024 in cs.RO

Abstract: In complex real-world tasks such as robotic manipulation and autonomous driving, collecting expert demonstrations is often more straightforward than specifying precise learning objectives and task descriptions. Learning from expert data can be achieved through behavioral cloning or by learning a reward function, i.e., inverse reinforcement learning. The latter allows for training with additional data outside the training distribution, guided by the inferred reward function. We propose a novel approach to construct compact and transparent reward models from automatically selected state features. These inferred rewards have an explicit form and enable the learning of policies that closely match expert behavior by training standard reinforcement learning algorithms from scratch. We validate our method's performance in various robotic environments with continuous and high-dimensional state spaces. Webpage: https://sites.google.com/view/transparent-reward
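To make the idea of a "transparent" reward concrete, below is a minimal Python sketch of a linear reward defined over a small, automatically selected subset of state features. The helper names (`select_features`, `fit_linear_reward`, `transparent_reward`), the variance-based selection criterion, and the expert-mean weighting are illustrative assumptions for this sketch only; they are not the paper's actual unsupervised feature-selection or reward-inference procedure.

```python
import numpy as np

# Minimal sketch: a "transparent" reward as a linear model over a small,
# automatically selected subset of state features. The selection criterion
# (variance across expert states) and the weighting (normalized expert
# feature means) are placeholder assumptions, not the paper's method.

def select_features(expert_states: np.ndarray, k: int = 4) -> np.ndarray:
    """Pick the k state dimensions with the highest variance in the expert data."""
    variances = expert_states.var(axis=0)
    return np.argsort(variances)[-k:]

def fit_linear_reward(expert_states: np.ndarray, feature_idx: np.ndarray) -> np.ndarray:
    """Fit an explicit weight vector from expert feature means (illustrative only)."""
    feats = expert_states[:, feature_idx]
    w = feats.mean(axis=0)
    return w / (np.linalg.norm(w) + 1e-8)

def transparent_reward(state: np.ndarray, feature_idx: np.ndarray, w: np.ndarray) -> float:
    """Explicit-form reward: a weighted sum of the selected state features."""
    return float(w @ state[feature_idx])

# Example usage with synthetic "expert" states of dimension 10.
rng = np.random.default_rng(0)
expert_states = rng.normal(size=(500, 10))
idx = select_features(expert_states, k=4)
w = fit_linear_reward(expert_states, idx)
print("selected features:", idx)
print("weights:", w)
print("reward for a new state:", transparent_reward(rng.normal(size=10), idx, w))
```

Because the resulting reward is an explicit function of a few named state features, it can be inspected directly and plugged into a standard off-the-shelf RL algorithm trained from scratch, which is the workflow the abstract describes.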

