A Generalized Acquisition Function for Preference-based Reward Learning (2403.06003v1)
Abstract: Preference-based reward learning is a popular technique for teaching robots and autonomous systems how a human user wants them to perform a task. Prior work has shown that actively synthesizing preference queries to maximize information gain about the reward function parameters improves data efficiency. The information gain criterion focuses on precisely identifying all parameters of the reward function. This can be wasteful, since many parameter settings may yield the same reward, and many rewards may induce the same behavior in downstream tasks. Instead, we show that it is possible to optimize for learning the reward function only up to a behavioral equivalence class, such as rewards that induce the same ranking over behaviors, the same distribution over choices, or that are similar under other related definitions of reward similarity. We introduce a tractable framework that can capture such definitions of similarity. Our experiments in a synthetic environment, an assistive robotics environment with domain transfer, and a natural language processing problem with real datasets demonstrate the superior performance of our querying method over the state-of-the-art information gain method.
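The core idea of the abstract, scoring a candidate preference query by how much its answer tells us about the reward only up to a behavioral equivalence class, can be illustrated with a small sketch. The code below is not the authors' implementation: the Bradley-Terry answer model, the ranking-based equivalence map, and all function names are assumptions made purely for illustration.

```python
"""Illustrative sketch (not the paper's code): pick the preference query whose
answer is most informative about the reward, measured either over raw reward
parameters or only up to a user-chosen behavioral equivalence class."""
from collections import defaultdict

import numpy as np


def binary_entropy(p):
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))


def answer_probs(w_samples, phi_a, phi_b):
    # Bradley-Terry model: P(user prefers A over B) under each posterior sample w.
    return 1.0 / (1.0 + np.exp(-(w_samples @ (phi_a - phi_b))))


def query_score(w_samples, phi_a, phi_b, equiv=None):
    """Mutual information between the binary answer and w (equiv=None),
    or between the answer and equiv(w), the sample's equivalence class."""
    p = answer_probs(w_samples, phi_a, phi_b)          # one probability per sample
    marginal = binary_entropy(p.mean())                # H(answer)
    if equiv is None:                                  # classic info-gain criterion
        return marginal - binary_entropy(p).mean()
    groups = defaultdict(list)                         # sample indices per class
    for i, w in enumerate(w_samples):
        groups[equiv(w)].append(i)
    conditional = sum(len(idx) / len(w_samples) * binary_entropy(p[idx].mean())
                      for idx in groups.values())
    return marginal - conditional                      # ignores within-class uncertainty


def ranking_class(w, candidate_features):
    # Example equivalence: rewards that rank a fixed behavior set identically.
    return tuple(np.argsort(candidate_features @ w))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, n = 4, 2000
    w_samples = rng.normal(size=(n, d))                # stand-in for posterior samples
    candidates = rng.normal(size=(3, d))               # features of candidate behaviors
    equiv = lambda w: ranking_class(w, candidates)
    queries = [(rng.normal(size=d), rng.normal(size=d)) for _ in range(20)]
    best = max(queries, key=lambda q: query_score(w_samples, *q, equiv=equiv))
```

With `equiv=None` the score reduces to the standard information-gain acquisition over reward parameters; passing an equivalence map instead (here, the ranking a reward induces over a fixed set of candidate behaviors, but it could equally be an induced choice distribution) makes queries that only distinguish behaviorally identical rewards score as uninformative.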