
Learning to Shape Rewards using a Game of Two Partners (2103.09159v5)

Published 16 Mar 2021 in cs.LG, cs.AI, and cs.GT

Abstract: Reward shaping (RS) is a powerful method in reinforcement learning (RL) for overcoming the problem of sparse or uninformative rewards. However, RS typically relies on manually engineered shaping-reward functions whose construction is time-consuming and error-prone; it also requires domain knowledge, which runs contrary to the goal of autonomous learning. We introduce the Reinforcement Learning Optimising Shaping Algorithm (ROSA), an automated reward-shaping framework in which the shaping-reward function is constructed in a Markov game between two agents. A reward-shaping agent (Shaper) uses switching controls to determine the states in which to add shaping rewards for more efficient learning, while the other agent (Controller) learns the optimal policy for the task using these shaped rewards. We prove that ROSA, which adopts existing RL algorithms, learns to construct a shaping-reward function that is beneficial to the task, thus ensuring efficient convergence to high-performance policies. We demonstrate ROSA's properties in three didactic experiments and show its superior performance against state-of-the-art RS algorithms in challenging sparse-reward environments.
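
The abstract describes ROSA's two-player structure: a Shaper that uses switching controls to decide where to inject shaping rewards, and a Controller that runs ordinary RL on the shaped signal. As a rough illustration only, here is a minimal tabular sketch of that loop; the chain environment, the two Q-learners, and every constant (`BONUS`, `SWITCH_COST`, the learning rates) are assumptions made for the example, not the paper's actual construction or its Markov-game objectives.

```python
import numpy as np

# Toy sparse-reward chain (illustrative, not from the paper):
# states 0..N-1, the only task reward is +1 at the goal state.
N_STATES, GOAL = 6, 5
rng = np.random.default_rng(0)

def env_step(s, a):
    """a = 0 moves left, a = 1 moves right."""
    s2 = min(max(s + (1 if a == 1 else -1), 0), N_STATES - 1)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

def eps_greedy(q_row, eps):
    if rng.random() < eps:
        return int(rng.integers(len(q_row)))
    # break ties randomly so an all-zero table does not lock onto one action
    return int(rng.choice(np.flatnonzero(q_row == q_row.max())))

ALPHA, GAMMA, EPS = 0.2, 0.95, 0.2
BONUS, SWITCH_COST = 0.1, 0.01          # magnitudes are arbitrary assumptions

q_ctrl = np.zeros((N_STATES, 2))        # Controller: task policy over env actions
q_shpr = np.zeros((N_STATES, 2))        # Shaper: per-state switch {0: off, 1: on}

for episode in range(500):
    s = 0
    for t in range(100):                      # episode cap keeps the sketch cheap
        switch = eps_greedy(q_shpr[s], EPS)   # Shaper's switching control in state s
        a = eps_greedy(q_ctrl[s], EPS)        # Controller acts on shaped estimates
        s2, r, done = env_step(s, a)
        shaped_r = r + (BONUS if switch else 0.0)

        # Controller does ordinary Q-learning on the *shaped* reward.
        q_ctrl[s, a] += ALPHA * (shaped_r + GAMMA * q_ctrl[s2].max() * (not done)
                                 - q_ctrl[s, a])
        # Shaper is paid in *task* reward minus a cost for switching on, so it
        # only keeps bonuses that help the underlying objective.
        shpr_r = r - SWITCH_COST * switch
        q_shpr[s, switch] += ALPHA * (shpr_r + GAMMA * q_shpr[s2].max() * (not done)
                                      - q_shpr[s, switch])
        s = s2
        if done:
            break

print("Controller greedy actions:", q_ctrl.argmax(axis=1))
print("Shaper switch pattern:   ", q_shpr.argmax(axis=1))
```

The `SWITCH_COST` term loosely echoes the switching-control idea, that activating a shaping reward should not be free, so the Shaper becomes selective about where it intervenes; the paper's actual cost structure and convergence guarantees are in the full text.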

Authors (12)
  1. David Mguni (23 papers)
  2. Taher Jafferjee (7 papers)
  3. Jianhong Wang (24 papers)
  4. Nicolas Perez-Nieves (6 papers)
  5. Tianpei Yang (25 papers)
  6. Matthew Taylor (12 papers)
  7. Wenbin Song (6 papers)
  8. Feifei Tong (4 papers)
  9. Hui Chen (298 papers)
  10. Jiangcheng Zhu (14 papers)
  11. Jun Wang (990 papers)
  12. Yaodong Yang (169 papers)
Citations (6)