
Promptable Behaviors: Personalizing Multi-Objective Rewards from Human Preferences (2312.09337v1)

Published 14 Dec 2023 in cs.CV, cs.AI, and cs.RO

Abstract: Customizing robotic behaviors to be aligned with diverse human preferences is an underexplored challenge in the field of embodied AI. In this paper, we present Promptable Behaviors, a novel framework that facilitates efficient personalization of robotic agents to diverse human preferences in complex environments. We use multi-objective reinforcement learning to train a single policy adaptable to a broad spectrum of preferences. We introduce three distinct methods to infer human preferences by leveraging different types of interactions: (1) human demonstrations, (2) preference feedback on trajectory comparisons, and (3) language instructions. We evaluate the proposed method in personalized object-goal navigation and flee navigation tasks in ProcTHOR and RoboTHOR, demonstrating the ability to prompt agent behaviors to satisfy human preferences in various scenarios. Project page: https://promptable-behaviors.github.io
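The core mechanism described in the abstract, a single policy conditioned on a preference weight vector over multiple reward objectives, with the weights inferred from pairwise trajectory comparisons, can be sketched compactly. The sketch below is illustrative only: the objective names, the linear scalarization, and the Bradley-Terry fit are plausible stand-ins for the paper's actual reward design, not its implementation.

```python
import numpy as np

# Hypothetical objective names for an object-goal navigation task;
# the paper's actual objectives may differ.
OBJECTIVES = ["time_efficiency", "safety", "exploration"]

def scalarize(reward_vec: np.ndarray, weights: np.ndarray) -> float:
    """Linear scalarization: a preference weight vector turns the
    multi-objective reward vector into a single scalar reward."""
    return float(weights @ reward_vec)

def infer_weights_from_preferences(returns_a, returns_b, prefs,
                                   lr=0.1, steps=500):
    """Fit a preference weight vector from pairwise trajectory
    comparisons using a Bradley-Terry model:
        P(A preferred over B) = sigmoid(w . (R_A - R_B)).
    returns_a, returns_b: (N, k) per-objective returns of compared pairs.
    prefs: (N,) array with 1 if trajectory A was preferred, else 0."""
    k = returns_a.shape[1]
    w = np.zeros(k)
    for _ in range(steps):
        diff = returns_a - returns_b            # (N, k)
        p = 1.0 / (1.0 + np.exp(-diff @ w))     # predicted P(A > B)
        grad = diff.T @ (prefs - p)             # log-likelihood gradient
        w += lr * grad / len(prefs)             # gradient ascent step
    # Map onto the simplex via softmax so the weights are positive,
    # sum to one, and read as a preference profile over objectives.
    w = np.exp(w) / np.exp(w).sum()
    return w

# Toy usage: trajectories scoring high on the first objective are
# consistently preferred, so the inferred weight should favor it.
rng = np.random.default_rng(0)
ra, rb = rng.random((50, 3)), rng.random((50, 3))
prefs = (ra[:, 0] > rb[:, 0]).astype(float)
w = infer_weights_from_preferences(ra, rb, prefs)
print("inferred preference weights:", w)   # heaviest on objective 0
print("scalarized reward:", scalarize(np.array([1.0, 0.2, 0.5]), w))
```

The appeal of this decomposition is that the policy itself never has to be retrained: once it is conditioned on a weight vector, personalization reduces to estimating that vector from whichever interaction type is available, whether demonstrations, comparisons, or language.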
