
Towards Socially and Morally Aware RL agent: Reward Design With LLM (2401.12459v2)

Published 23 Jan 2024 in cs.AI

Abstract: When we design and deploy a Reinforcement Learning (RL) agent, the reward function motivates the agent to achieve an objective. An incorrect or incomplete specification of the objective can result in behavior that does not align with human values - failing to adhere to social and moral norms that are ambiguous and context dependent, and causing undesired outcomes such as negative side effects and unsafe exploration. Previous work has manually defined reward functions to avoid negative side effects, used human oversight for safe exploration, or used foundation models as planning tools. This work studies the ability to leverage LLMs' understanding of morality and social norms in safe-exploration-augmented RL methods. It evaluates the LLM's results against human feedback and demonstrates the LLM's capability as a direct reward signal.
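
The core idea described in the abstract - treating an LLM's judgment of social and moral norms as a direct reward signal during exploration - can be sketched roughly as follows. This is a minimal Python illustration, not the paper's implementation: the prompt wording, the query_llm stub, the YES/NO scoring, and the reward weighting are all assumptions made for the example.

# Illustrative sketch (not the paper's code): query an LLM for a
# social/moral-norm judgment and fold it into the RL reward.

def query_llm(prompt: str) -> str:
    """Placeholder for an actual LLM API call; returns the model's text reply."""
    # Replace this stub with a call to whichever LLM backend is available.
    return "YES"

def norm_compliance_reward(state_description: str, action_description: str) -> float:
    """Ask the LLM whether the proposed action is socially and morally acceptable."""
    prompt = (
        "You are judging an agent's behavior.\n"
        f"Situation: {state_description}\n"
        f"Proposed action: {action_description}\n"
        "Is this action socially and morally acceptable? Answer YES or NO."
    )
    verdict = query_llm(prompt).strip().upper()
    # Map the verdict to a scalar: no penalty if acceptable, a penalty otherwise.
    return 0.0 if verdict.startswith("YES") else -1.0

def shaped_reward(env_reward: float, state_description: str,
                  action_description: str, weight: float = 1.0) -> float:
    """Combine the task reward with the LLM-derived norm signal."""
    return env_reward + weight * norm_compliance_reward(
        state_description, action_description
    )

In this kind of setup the weight on the norm signal controls how strongly norm violations are penalized relative to task reward; the specific value and the binary YES/NO format are illustrative choices only.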

