StROL: Stabilized and Robust Online Learning from Humans (2308.09863v2)

Published 19 Aug 2023 in cs.RO

Abstract: Robots often need to learn the human's reward function online, during the current interaction. This real-time learning requires fast but approximate learning rules: when the human's behavior is noisy or suboptimal, current approximations can result in unstable robot learning. Accordingly, in this paper we seek to enhance the robustness and convergence properties of gradient descent learning rules when inferring the human's reward parameters. We model the robot's learning algorithm as a dynamical system over the human preference parameters, where the human's true (but unknown) preferences are the equilibrium point. This enables us to perform Lyapunov stability analysis to derive the conditions under which the robot's learning dynamics converge. Our proposed algorithm (StROL) uses these conditions to learn robust-by-design learning rules: given the original learning dynamics, StROL outputs a modified learning rule that now converges to the human's true parameters under a larger set of human inputs. In practice, these autonomously generated learning rules can correctly infer what the human is trying to convey, even when the human is noisy, biased, and suboptimal. Across simulations and a user study we find that StROL results in a more accurate estimate and less regret than state-of-the-art approaches for online reward learning. See videos and code here: https://github.com/VT-Collab/StROL_RAL
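To make the dynamical-systems view concrete, the sketch below shows one way online reward learning can be written as a discrete-time update over the estimated preference parameters, with the human's true parameters as an equilibrium point. This is an illustrative, assumption-laden example rather than the StROL implementation: the feature dimension, the Boltzmann choice model, the candidate-option setup, and all parameter values are hypothetical, and the learned correction term that StROL would add to the base rule is only indicated in a comment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 3 reward features and a "true" preference vector theta_star.
D = 3
theta_star = np.array([1.0, -0.5, 0.3])

def boltzmann_choice(options, theta, beta=5.0):
    """Simulated noisy human: picks option i with probability proportional to exp(beta * theta^T phi_i)."""
    rewards = options @ theta
    p = np.exp(beta * (rewards - rewards.max()))
    p /= p.sum()
    return rng.choice(len(p), p=p)

def grad_log_likelihood(options, choice, theta, beta=5.0):
    """Gradient of log P(choice | theta) under the same Boltzmann model."""
    rewards = options @ theta
    p = np.exp(beta * (rewards - rewards.max()))
    p /= p.sum()
    return beta * (options[choice] - p @ options)

# Online learning viewed as a discrete-time dynamical system over theta_hat.
theta_hat = np.zeros(D)   # robot's running estimate of the human's preferences
alpha = 0.05              # learning rate
for t in range(500):
    options = rng.normal(size=(4, D))              # features of 4 candidate actions
    choice = boltzmann_choice(options, theta_star)  # noisy human input
    # Base (approximate) learning rule: gradient ascent on the log-likelihood
    # (equivalently, gradient descent on the negative log-likelihood).
    # StROL would add a learned correction term g(theta_hat, choice) here so that
    # the combined dynamics converge to theta_star for a larger set of human inputs.
    theta_hat = theta_hat + alpha * grad_log_likelihood(options, choice, theta_hat)
    if t % 100 == 0:
        # Lyapunov-style diagnostic: distance to the equilibrium theta_star.
        print(t, round(float(np.linalg.norm(theta_hat - theta_star)), 3))
```

Under this (assumed) choice model the expected gradient vanishes at theta_star, so the true preferences are an equilibrium of the averaged learning dynamics; StROL's contribution is to modify such a base rule so that convergence to that equilibrium is retained even when the human's inputs are noisy, biased, or suboptimal.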

Authors (4)
  1. Shaunak A. Mehta (11 papers)
  2. Forrest Meng (1 paper)
  3. Andrea Bajcsy (36 papers)
  4. Dylan P. Losey (55 papers)
Citations (1)

