Towards Socially and Morally Aware RL agent: Reward Design With LLM (2401.12459v2)
Abstract: When we design and deploy a Reinforcement Learning (RL) agent, the reward function motivates the agent to achieve an objective. An incorrect or incomplete specification of that objective can result in behavior that does not align with human values: the agent may fail to adhere to social and moral norms, which are ambiguous and context dependent, and produce undesired outcomes such as negative side effects and unsafe exploration. Previous work has manually defined reward functions to avoid negative side effects, relied on human oversight for safe exploration, or used foundation models as planning tools. This work studies how Large Language Models' (LLMs') understanding of morality and social norms can be leveraged to augment safe-exploration RL methods. It evaluates LLM outputs against human feedback and demonstrates the capability of LLMs to serve as direct reward signals.
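To make the idea of an LLM acting as a direct reward signal concrete, the sketch below (not the paper's implementation) queries an LLM about whether a described action violates social or moral norms and converts the answer into a scalar penalty that can be added to the environment reward. It assumes the OpenAI Python SDK (>= 1.0); the model name, prompt wording, and reward scale are illustrative assumptions, not choices made by the paper.

```python
# Minimal sketch: using an LLM's judgment of social/moral norms as an
# auxiliary reward signal during RL training.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def moral_reward(state_description: str, action_description: str) -> float:
    """Ask the LLM whether an action is socially/morally acceptable in context,
    and map its yes/no answer to a scalar penalty (0 if acceptable, -1 if not)."""
    prompt = (
        "You are judging an agent's behavior against everyday social and moral norms.\n"
        f"Situation: {state_description}\n"
        f"Proposed action: {action_description}\n"
        "Answer with a single word: YES if the action is acceptable, NO otherwise."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0,
    )
    answer = response.choices[0].message.content.strip().upper()
    return 0.0 if answer.startswith("YES") else -1.0  # penalize norm violations

# During training, the LLM signal would be combined with the task reward, e.g.:
# shaped_reward = env_reward + norm_weight * moral_reward(state_text, action_text)
```

The combination weight (here a hypothetical `norm_weight`) trades off task progress against norm compliance; how that trade-off is set is a design choice outside this sketch.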
- Be considerate: Avoiding negative side effects in reinforcement learning. In Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems, AAMAS ’22, pages 18–26, Richland, SC, 2022. International Foundation for Autonomous Agents and Multiagent Systems. ISBN 9781450392136.
- Concrete problems in AI safety. CoRR, abs/1606.06565, 2016. URL http://arxiv.org/abs/1606.06565.
- Social Norms. In E. N. Zalta and U. Nodelman, editors, The Stanford Encyclopedia of Philosophy. Metaphysics Research Lab, Stanford University, Winter 2023 edition, 2023.
- Cooperative inverse reinforcement learning, 2016.
- Mastering diverse domains through world models, 2023.
- Aligning AI with shared human values. In International Conference on Learning Representations, 2021a. URL https://openreview.net/forum?id=dNy_RKzJacY.
- What would Jiminy Cricket do? Towards agents that behave morally. CoRR, abs/2110.13136, 2021b. URL https://arxiv.org/abs/2110.13136.
- Reward design with language models. In The Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=10uNUgI5Kl.
- Trial without error: Towards safe reinforcement learning via human intervention. In Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, AAMAS ’18, pages 2067–2069, Richland, SC, 2018. International Foundation for Autonomous Agents and Multiagent Systems.
- Reinforcement learning: An introduction. MIT Press, 2018.
- Read and reap the rewards: Learning to play Atari with the help of instruction manuals. In Workshop on Reincarnating Reinforcement Learning at ICLR 2023, 2023. URL https://openreview.net/forum?id=I_GUngvVNz.