A Survey of Reinforcement Learning from Human Feedback (2312.14925v2)

Published 22 Dec 2023 in cs.LG

Abstract: Reinforcement learning from human feedback (RLHF) is a variant of reinforcement learning (RL) that learns from human feedback instead of relying on an engineered reward function. Building on prior work on the related setting of preference-based reinforcement learning (PbRL), it stands at the intersection of artificial intelligence and human-computer interaction. This positioning offers a promising avenue to enhance the performance and adaptability of intelligent systems while also improving the alignment of their objectives with human values. The training of LLMs has impressively demonstrated this potential in recent years, where RLHF played a decisive role in directing the model's capabilities toward human objectives. This article provides a comprehensive overview of the fundamentals of RLHF, exploring the intricate dynamics between RL agents and human input. While recent focus has been on RLHF for LLMs, our survey adopts a broader perspective, examining the diverse applications and wide-ranging impact of the technique. We delve into the core principles that underpin RLHF, shedding light on the symbiotic relationship between algorithms and human feedback, and discuss the main research trends in the field. By synthesizing the current landscape of RLHF research, this article aims to provide researchers as well as practitioners with a comprehensive understanding of this rapidly growing field of research.

Summary of Reinforcement Learning from Human Feedback (RLHF)

Introduction

Reinforcement learning from human feedback (RLHF) is a variant of reinforcement learning (RL) in which the agent's objective is learned directly from human feedback rather than specified by a hand-engineered reward function. Situated at the intersection of AI and human-computer interaction, the approach aims to align agent objectives with human preferences and values. Its most prominent application to date is the training of LLMs, where RLHF steers model behavior toward human objectives.

Feedback Mechanisms

In RLHF, feedback types vary in their information content and complexity. Attributes determining a feedback type's classification include arity (unary, binary, n-ary), involvement (passive, active, co-generative), and intent (evaluative, instructive, descriptive, literal). While binary comparisons and rankings are common forms of feedback, other methods, such as critique, importance indicators, and corrections, offer additional mechanisms for preference expression. Interaction methods like emergency stops and feature traces also present alternative feedback modalities.
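
To make this taxonomy concrete, the sketch below encodes the attributes as a small data structure. It is only an illustrative schema, assuming Python as the host language; the class and field names are not notation from the survey.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Arity(Enum):
    UNARY = auto()    # feedback about a single item, e.g. a rating or critique
    BINARY = auto()   # a pairwise comparison between two items
    NARY = auto()     # a ranking over several items


class Involvement(Enum):
    PASSIVE = auto()        # the human observes and evaluates behavior
    ACTIVE = auto()         # the human intervenes or steers during execution
    CO_GENERATIVE = auto()  # the human co-produces behavior, e.g. corrections


class Intent(Enum):
    EVALUATIVE = auto()   # "how good was this?"
    INSTRUCTIVE = auto()  # "do this instead"
    DESCRIPTIVE = auto()  # "this aspect matters"
    LITERAL = auto()      # feedback meant to be taken at face value


@dataclass
class FeedbackRecord:
    """One unit of human feedback attached to agent behavior (illustrative schema)."""
    items: tuple            # the trajectory segment(s) the feedback refers to
    arity: Arity
    involvement: Involvement
    intent: Intent
    payload: object         # e.g. a preferred index, a rating, or a corrected action


# Example: "segment A is preferred over segment B".
pref = FeedbackRecord(
    items=("segment_A", "segment_B"),
    arity=Arity.BINARY,
    involvement=Involvement.PASSIVE,
    intent=Intent.EVALUATIVE,
    payload=0,  # index of the preferred segment
)
```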

Active Learning and Label Collection

Active learning techniques are critical for efficient RLHF, as they enable selective querying of human feedback. These methods prioritize queries based on factors such as uncertainty, query simplicity, trajectory quality, and human labeler reliability. Additionally, psychological considerations, including biases and the relationship between researcher goals and labeler responses, significantly impact the effectiveness of preference elicitation. Understanding human psychology aids in designing interactions that facilitate informative query responses.
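
A common instance of uncertainty-based querying keeps an ensemble of reward models and sends the human the candidate pairs on which the ensemble disagrees most. The sketch below illustrates this idea under toy assumptions (linear reward functions over observation vectors, hypothetical helper names); it is not a specific algorithm from the survey.

```python
import numpy as np

rng = np.random.default_rng(0)


def segment_return(reward_fn, segment):
    """Sum a learned per-step reward over a trajectory segment."""
    return sum(reward_fn(obs) for obs in segment)


def disagreement_score(ensemble, seg_a, seg_b):
    """Variance of the ensemble's preference probabilities for the pair (a, b).

    High variance means the reward models disagree about which segment is
    better, so the pair is informative to ask a human labeler about.
    """
    probs = []
    for reward_fn in ensemble:
        diff = segment_return(reward_fn, seg_a) - segment_return(reward_fn, seg_b)
        probs.append(1.0 / (1.0 + np.exp(-diff)))  # Bradley-Terry-style preference probability
    return float(np.var(probs))


def select_queries(ensemble, candidate_pairs, budget):
    """Pick the `budget` most disagreed-upon pairs to send to the labeler."""
    ranked = sorted(candidate_pairs,
                    key=lambda pair: disagreement_score(ensemble, *pair),
                    reverse=True)
    return ranked[:budget]


# Toy usage: an "ensemble" of three random linear reward functions over 4-dim observations.
ensemble = [(lambda w: (lambda obs: float(w @ obs)))(rng.normal(size=4)) for _ in range(3)]
segments = [[rng.normal(size=4) for _ in range(5)] for _ in range(10)]
pairs = [(segments[i], segments[j]) for i in range(10) for j in range(i + 1, 10)]
queries = select_queries(ensemble, pairs, budget=3)
print(f"selected {len(queries)} of {len(pairs)} candidate pairs for human labeling")
```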

Reward Model Training

Training a reward model in RLHF involves various components such as selecting an appropriate human feedback model, learning utilities based on feedback, and evaluating learned reward functions. Approaches range from empirical risk minimization to Bayesian methods, and incorporate features like human-specific rationality coefficients and alternative utility notions.
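
As one concrete instance of the empirical-risk-minimization route, the sketch below fits a linear reward model to pairwise preferences by minimizing the standard Bradley-Terry negative log-likelihood, with a rationality (inverse-temperature) coefficient beta. The linear feature model and all names are illustrative assumptions, not the survey's formulation.

```python
import numpy as np

rng = np.random.default_rng(1)


def preference_loss(theta, features_pref, features_rej, beta=1.0):
    """Negative log-likelihood of pairwise preferences under a Bradley-Terry model.

    The reward of a segment is linear in its (summed) features, r = theta @ phi, and
    the preferred segment wins with probability sigmoid(beta * (r_pref - r_rej)),
    where beta acts as a rationality (inverse-temperature) coefficient.
    """
    diff = beta * (features_pref @ theta - features_rej @ theta)
    return float(np.mean(np.log1p(np.exp(-diff))))  # mean of -log sigmoid(diff)


def fit_reward_model(features_pref, features_rej, lr=0.1, steps=500, beta=1.0):
    """Minimize the preference loss with plain gradient descent (empirical risk minimization)."""
    theta = np.zeros(features_pref.shape[1])
    for _ in range(steps):
        diff = beta * (features_pref @ theta - features_rej @ theta)
        # gradient of -log sigmoid(diff): -sigmoid(-diff) * beta * (phi_pref - phi_rej)
        coef = -beta / (1.0 + np.exp(diff))
        grad = (coef[:, None] * (features_pref - features_rej)).mean(axis=0)
        theta -= lr * grad
    return theta


# Toy data: a hidden "true" reward generates noisy preferences over feature vectors.
true_theta = np.array([1.0, -2.0, 0.5])
phi_a = rng.normal(size=(200, 3))
phi_b = rng.normal(size=(200, 3))
a_wins = (phi_a @ true_theta + 0.1 * rng.normal(size=200)) > (phi_b @ true_theta)
features_pref = np.where(a_wins[:, None], phi_a, phi_b)
features_rej = np.where(a_wins[:, None], phi_b, phi_a)

theta_hat = fit_reward_model(features_pref, features_rej)
print("preference loss after fitting:", round(preference_loss(theta_hat, features_pref, features_rej), 4))
```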

Increasing Feedback Efficiency

Improving feedback efficiency is crucial for RLHF. It can be achieved through techniques such as leveraging foundation models, meta- and transfer learning for reward model initialization, and self-supervised or semi-supervised training. Data augmentation and actively generating informative experiences further enhance learning efficiency.
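
Two of these ideas lend themselves to a short sketch: pseudo-labeling unqueried pairs on which the current reward model is already confident (a semi-supervised step) and temporal cropping as a simple data augmentation. The scalar reward_fn over segments and the thresholds are assumptions made for illustration.

```python
import numpy as np


def pseudo_label_pairs(reward_fn, unlabeled_pairs, threshold=0.9):
    """Label unqueried segment pairs on which the current reward model is confident.

    Returns (pair, label) tuples, with label 0 meaning the first segment is treated
    as preferred. Only pairs whose predicted preference probability is at least
    `threshold` (or at most 1 - threshold) are kept, mimicking a semi-supervised
    reward-learning step.
    """
    labeled = []
    for seg_a, seg_b in unlabeled_pairs:
        diff = reward_fn(seg_a) - reward_fn(seg_b)
        p_a = 1.0 / (1.0 + np.exp(-diff))
        if p_a >= threshold:
            labeled.append(((seg_a, seg_b), 0))
        elif p_a <= 1.0 - threshold:
            labeled.append(((seg_a, seg_b), 1))
    return labeled


def crop_augment(segment, rng, min_len=3):
    """Temporal cropping: a random contiguous sub-segment inherits the original preference label."""
    if len(segment) <= min_len:
        return segment
    start = int(rng.integers(0, len(segment) - min_len + 1))
    length = int(rng.integers(min_len, len(segment) - start + 1))
    return segment[start:start + length]


# Toy usage of the augmentation step.
rng = np.random.default_rng(3)
print(len(crop_augment(list(range(10)), rng)), "steps kept from a 10-step segment")
```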

Benchmarks and Evaluation

Evaluating RLHF approaches is challenging due to the involvement of human feedback and the absence of clear ground-truth task specifications. Benchmarks like B-Pref and MineRL BASALT offer standardized means to measure performance, addressing issues in reward learning evaluation. Libraries like imitation, APReL, and POLAR provide foundational tools for RLHF research, facilitating experimentation with various methods.
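
When a benchmark does expose a ground-truth reward, one simple (and admittedly coarse) check is the ordinal agreement between learned and true returns over a set of trajectories, as sketched below; the metric and names are illustrative rather than prescribed by any particular benchmark.

```python
import numpy as np


def pairwise_rank_agreement(learned_returns, true_returns):
    """Fraction of trajectory pairs ordered identically by the learned and the true reward.

    A value of 1.0 means perfect ordinal agreement; around 0.5 is chance level.
    This is only a coarse proxy for reward-learning evaluation, usable when a
    benchmark provides a ground-truth reward.
    """
    learned = np.asarray(learned_returns)
    true = np.asarray(true_returns)
    agree, total = 0, 0
    for i in range(len(learned)):
        for j in range(i + 1, len(learned)):
            total += 1
            if np.sign(learned[i] - learned[j]) == np.sign(true[i] - true[j]):
                agree += 1
    return agree / total


# Toy usage: a noisy learned reward should still order most trajectories correctly.
rng = np.random.default_rng(2)
true_returns = rng.normal(size=50)
learned_returns = true_returns + 0.3 * rng.normal(size=50)
print("rank agreement:", round(pairwise_rank_agreement(learned_returns, true_returns), 3))
```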

Discussion and Future Directions

The field of RLHF is growing rapidly, exploring new methods and addressing challenges such as offline preference-based reward learning and more complex objective functions. Benchmarks and frameworks that support research in this area continue to evolve, paving the way for methodologies that handle the complexity and variability of human feedback effectively. With continued advances in theory and practice, more robust algorithms and more efficient use of human feedback lie ahead.

Authors (4)
  1. Timo Kaufmann (5 papers)
  2. Paul Weng (39 papers)
  3. Viktor Bengs (23 papers)
  4. Eyke Hüllermeier (129 papers)
Citations (82)