
Unified Learning from Demonstrations, Corrections, and Preferences during Physical Human-Robot Interaction (2207.03395v2)

Published 7 Jul 2022 in cs.RO

Abstract: Humans can leverage physical interaction to teach robot arms. This physical interaction takes multiple forms depending on the task, the user, and what the robot has learned so far. State-of-the-art approaches focus on learning from a single modality, or combine multiple interaction types by assuming that the robot has prior information about the human's intended task. By contrast, in this paper we introduce an algorithmic formalism that unites learning from demonstrations, corrections, and preferences. Our approach makes no assumptions about the tasks the human wants to teach the robot; instead, we learn a reward model from scratch by comparing the human's inputs to nearby alternatives. We first derive a loss function that trains an ensemble of reward models to match the human's demonstrations, corrections, and preferences. The type and order of feedback is up to the human teacher: we enable the robot to collect this feedback passively or actively. We then apply constrained optimization to convert our learned reward into a desired robot trajectory. Through simulations and a user study we demonstrate that our proposed approach more accurately learns manipulation tasks from physical human interaction than existing baselines, particularly when the robot is faced with new or unexpected objectives. Videos of our user study are available at: https://youtu.be/FSUJsTYvEKU
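The abstract describes fitting an ensemble of reward models by comparing each human input (demonstration, correction, or preference) against nearby alternative trajectories. Below is a minimal sketch of that idea, not the paper's implementation: trajectory features, network sizes, the perturbation scheme for generating alternatives, and the Bradley-Terry-style comparison loss are all assumptions introduced here for illustration.

```python
# Minimal sketch (assumptions, not the paper's code): train an ensemble of
# reward networks so that trajectories the human provided or preferred score
# higher than nearby alternatives. Ensemble disagreement could later be used
# for active queries; converting the learned reward into a robot trajectory
# (the paper's constrained-optimization step) is omitted here.
import torch
import torch.nn as nn

FEATURE_DIM = 8  # assumed dimension of trajectory features


class RewardNet(nn.Module):
    """Maps trajectory features to a scalar reward."""
    def __init__(self, feature_dim=FEATURE_DIM, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, features):
        return self.net(features).squeeze(-1)


def comparison_loss(model, preferred, rejected):
    """Luce-choice / Bradley-Terry loss: the human's trajectory should
    receive higher reward than each nearby alternative."""
    return -torch.log(
        torch.sigmoid(model(preferred) - model(rejected)) + 1e-8
    ).mean()


def train_ensemble(preferred, rejected, n_models=5, epochs=200, lr=1e-2):
    """Fit several independently initialized reward nets to the same
    comparison data."""
    ensemble = []
    for _ in range(n_models):
        model = RewardNet()
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        for _ in range(epochs):
            opt.zero_grad()
            loss = comparison_loss(model, preferred, rejected)
            loss.backward()
            opt.step()
        ensemble.append(model)
    return ensemble


if __name__ == "__main__":
    # Toy data: features of human-provided trajectories vs. perturbed ones.
    torch.manual_seed(0)
    preferred = torch.randn(32, FEATURE_DIM)
    rejected = preferred + 0.5 * torch.randn(32, FEATURE_DIM)
    ensemble = train_ensemble(preferred, rejected)
    scores = torch.stack([m(preferred) for m in ensemble])
    print("mean ensemble reward of preferred trajectories:", scores.mean().item())
```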

Authors (2)
  1. Shaunak A. Mehta (11 papers)
  2. Dylan P. Losey (55 papers)
Citations (16)