Efficient Preference-Based Reinforcement Learning Using Learned Dynamics Models (2301.04741v2)

Published 11 Jan 2023 in cs.LG

Abstract: Preference-based reinforcement learning (PbRL) can enable robots to learn to perform tasks based on an individual's preferences without requiring a hand-crafted reward function. However, existing approaches either assume access to a high-fidelity simulator or analytic model or take a model-free approach that requires extensive, possibly unsafe online environment interactions. In this paper, we study the benefits and challenges of using a learned dynamics model when performing PbRL. In particular, we provide evidence that a learned dynamics model offers the following benefits when performing PbRL: (1) preference elicitation and policy optimization require significantly fewer environment interactions than model-free PbRL, (2) diverse preference queries can be synthesized safely and efficiently as a byproduct of standard model-based RL, and (3) reward pre-training based on suboptimal demonstrations can be performed without any environmental interaction. Our paper provides empirical evidence that learned dynamics models enable robots to learn customized policies based on user preferences in ways that are safer and more sample efficient than prior preference learning approaches. Supplementary materials and code are available at https://sites.google.com/berkeley.edu/mop-rl.
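
As a rough illustration of the recipe the abstract describes (not the authors' implementation), the sketch below assumes the Bradley-Terry preference model commonly used in deep PbRL (e.g., Christiano et al., 2017) and a hypothetical learned dynamics model interface `dynamics_model(obs, act) -> next_obs`. Preference queries are synthesized by rolling out the learned model rather than the real environment, which is the source of the safety and sample-efficiency benefits the abstract claims.

```python
# Minimal sketch (assumptions noted in comments; not the paper's code):
# learn a reward model from trajectory preferences with a Bradley-Terry
# loss, and generate candidate query trajectories by rolling out a
# learned dynamics model instead of the real environment.
import torch
import torch.nn as nn

class RewardNet(nn.Module):
    """Maps a (state, action) pair to a scalar reward estimate."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

def preference_loss(reward_net, traj_a, traj_b, label):
    """Bradley-Terry loss; label = 1 if trajectory A is preferred, else 0."""
    ra = reward_net(*traj_a).sum()   # predicted return of trajectory A
    rb = reward_net(*traj_b).sum()   # predicted return of trajectory B
    logits = torch.stack([ra, rb]).unsqueeze(0)          # shape (1, 2)
    target = torch.tensor([1 - int(label)])              # class 0 = A preferred
    return nn.functional.cross_entropy(logits, target)

def rollout_in_model(dynamics_model, policy, init_obs, horizon):
    """Synthesize a query trajectory with the learned dynamics model
    (assumed callable as dynamics_model(obs, act) -> next_obs), so no
    real environment interaction is needed to generate the query."""
    obs_seq, act_seq, obs = [], [], init_obs
    for _ in range(horizon):
        act = policy(obs)
        obs_seq.append(obs)
        act_seq.append(act)
        obs = dynamics_model(obs, act)
    return torch.stack(obs_seq), torch.stack(act_seq)
```

The same reward model could, in principle, also be pre-trained from suboptimal demonstrations treated as ranked trajectory pairs, again without any environment interaction, as the abstract's third point suggests.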

Authors (4)
  1. Yi Liu (543 papers)
  2. Gaurav Datta (5 papers)
  3. Ellen Novoseller (20 papers)
  4. Daniel S. Brown (46 papers)
Citations (16)