Promptable Behaviors: Personalizing Multi-Objective Rewards from Human Preferences (2312.09337v1)
Abstract: Customizing robotic behaviors to be aligned with diverse human preferences is an underexplored challenge in the field of embodied AI. In this paper, we present Promptable Behaviors, a novel framework that facilitates efficient personalization of robotic agents to diverse human preferences in complex environments. We use multi-objective reinforcement learning to train a single policy adaptable to a broad spectrum of preferences. We introduce three distinct methods to infer human preferences by leveraging different types of interactions: (1) human demonstrations, (2) preference feedback on trajectory comparisons, and (3) language instructions. We evaluate the proposed method in personalized object-goal navigation and flee navigation tasks in ProcTHOR and RoboTHOR, demonstrating the ability to prompt agent behaviors to satisfy human preferences in various scenarios. Project page: https://promptable-behaviors.github.io
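The abstract describes two steps: training one multi-objective policy, then inferring a user's preference from interaction. The sketch below shows one way the second method (preference feedback on trajectory comparisons) could work. It makes two assumptions. First, the objectives are combined by linear scalarization with a preference weight vector w on the probability simplex, and w is the vector the policy would be prompted with. Second, a person's choice between two trajectories follows a Bradley-Terry model. Neither assumption is confirmed by the abstract, and every name below (`scalarized_return`, `infer_weights`, the toy trajectories) is hypothetical. The demonstration and language-instruction methods are not shown.

```python
import math
from typing import List, Sequence, Tuple

# Illustrative sketch only: linear scalarization and the Bradley-Terry choice
# model are assumptions, not the paper's confirmed method.
RewardVec = Sequence[float]                      # one reward per objective at one step
Trajectory = List[RewardVec]                     # per-step reward vectors
Comparison = Tuple[Trajectory, Trajectory, int]  # (traj_a, traj_b, 1 if A preferred else 0)


def scalarized_return(traj: Trajectory, weights: Sequence[float]) -> float:
    """Undiscounted return under linear scalarization: sum over t of w . r_t."""
    return sum(sum(w * r for w, r in zip(weights, step)) for step in traj)


def _softmax(z: Sequence[float]) -> List[float]:
    m = max(z)
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]


def _sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def infer_weights(comparisons: List[Comparison], num_objectives: int,
                  steps: int = 2000, lr: float = 0.1) -> List[float]:
    """Fit simplex weights from pairwise preferences (assumed Bradley-Terry model).

    P(A preferred over B) = sigmoid(w . (R(A) - R(B))), where R(.) is the
    per-objective return vector. Weights are parameterized as softmax(theta)
    so they stay nonnegative and sum to one; theta is updated by gradient
    descent on the negative log-likelihood of the observed labels.
    """
    # Per-objective return differences do not depend on w, so precompute them.
    diffs = []
    for traj_a, traj_b, label in comparisons:
        diff = [sum(s[k] for s in traj_a) - sum(s[k] for s in traj_b)
                for k in range(num_objectives)]
        diffs.append((diff, label))

    theta = [0.0] * num_objectives
    for _ in range(steps):
        w = _softmax(theta)
        grad_w = [0.0] * num_objectives
        for diff, label in diffs:
            p_a = _sigmoid(sum(wk * dk for wk, dk in zip(w, diff)))
            for k in range(num_objectives):
                grad_w[k] += (p_a - label) * diff[k]
        # Chain rule through softmax: dL/dtheta_j = w_j * (g_j - sum_k w_k g_k).
        avg = sum(wk * gk for wk, gk in zip(w, grad_w))
        theta = [t - lr * wj * (gj - avg) for t, wj, gj in zip(theta, w, grad_w)]
    return _softmax(theta)


if __name__ == "__main__":
    # Hypothetical user: objective 0 = speed, objective 1 = caution.
    fast = [[1.0, 0.0], [1.0, 0.0]]
    safe = [[0.0, 1.0], [0.0, 1.0]]
    balanced = [[0.5, 0.5], [0.5, 0.5]]
    prefs = [(fast, safe, 1), (fast, balanced, 1), (balanced, safe, 1)]
    w = infer_weights(prefs, num_objectives=2)
    print("inferred weights:", w)
    print("return of 'fast' under w:", scalarized_return(fast, w))
```

Because every comparison in the toy data favors the trajectory with more of objective 0, the fitted weight on objective 0 ends up well above 0.5. Parameterizing w through a softmax is one convenient way to keep it on the simplex; a projected-gradient step would be an equally valid choice.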