Learning Human Preferences Over Robot Behavior as Soft Planning Constraints (2403.19795v1)

Published 28 Mar 2024 in cs.RO

Abstract: Preference learning has long been studied in Human-Robot Interaction (HRI) in order to adapt robot behavior to specific user needs and desires. Typically, human preferences are modeled as a scalar function; however, such a formulation confounds critical considerations on how the robot should behave for a given task, with desired -- but not required -- robot behavior. In this work, we distinguish between such required and desired robot behavior by leveraging a planning framework. Specifically, we propose a novel problem formulation for preference learning in HRI where various types of human preferences are encoded as soft planning constraints. Then, we explore a data-driven method to enable a robot to infer preferences by querying users, which we instantiate in rearrangement tasks in the Habitat 2.0 simulator. We show that the proposed approach is promising at inferring three types of preferences even under varying levels of noise in simulated user choices between potential robot behaviors. Our contributions open up doors to adaptable planning-based robot behavior in the future.
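The abstract describes encoding preferences as soft planning constraints and inferring them from (possibly noisy) user choices between candidate robot behaviors. As a rough illustration only, and not the paper's actual formulation, the sketch below assumes a linear cost over soft-constraint violation counts and a Boltzmann (noisily rational) choice model fitted by gradient ascent on the choice likelihood; all function names, parameters, and the three-constraint setup are hypothetical.

```python
# Minimal sketch (illustrative assumptions, not the authors' implementation):
# preferences are weights on soft planning constraints, inferred from noisy
# pairwise choices between candidate robot behaviors.
import numpy as np

rng = np.random.default_rng(0)

def soft_constraint_cost(plan_features, weights):
    """Penalty a plan incurs: plan_features[i] counts violations of soft
    constraint i, weights[i] is the (unknown) importance the user places on it."""
    return float(np.dot(weights, plan_features))

def simulate_user_choice(feat_a, feat_b, true_weights, rationality=5.0):
    """Noisily rational user: prefers the lower-cost plan with Boltzmann noise."""
    cost_a = soft_constraint_cost(feat_a, true_weights)
    cost_b = soft_constraint_cost(feat_b, true_weights)
    p_a = 1.0 / (1.0 + np.exp(-rationality * (cost_b - cost_a)))
    return 0 if rng.random() < p_a else 1  # 0 -> plan A chosen, 1 -> plan B

def infer_weights(queries, choices, n_constraints, lr=0.1, steps=2000, rationality=5.0):
    """Maximum-likelihood estimate of constraint weights from pairwise choices."""
    w = np.zeros(n_constraints)
    for _ in range(steps):
        grad = np.zeros(n_constraints)
        for (feat_a, feat_b), c in zip(queries, choices):
            diff = np.asarray(feat_b, dtype=float) - np.asarray(feat_a, dtype=float)
            p_a = 1.0 / (1.0 + np.exp(-rationality * np.dot(w, diff)))
            # Gradient of the log-likelihood of the observed choice c.
            grad += rationality * diff * ((1 - c) - p_a)
        w += lr * grad / len(queries)
    return w

if __name__ == "__main__":
    true_w = np.array([1.0, 0.2, 0.6])  # hidden user preferences over 3 constraints
    queries = [(rng.integers(0, 3, 3), rng.integers(0, 3, 3)) for _ in range(200)]
    choices = [simulate_user_choice(a, b, true_w) for a, b in queries]
    est_w = infer_weights(queries, choices, n_constraints=3)
    print("estimated weights (up to scale):", est_w / np.linalg.norm(est_w))
    print("true weights      (up to scale):", true_w / np.linalg.norm(true_w))
```

Because the rationality coefficient and the weight magnitudes trade off in this choice model, the weights are only recoverable up to a positive scale factor, which is why the example compares normalized vectors.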
