
Automated Rewards via LLM-Generated Progress Functions (2410.09187v2)

Published 11 Oct 2024 in cs.LG, cs.AI, and cs.CL

Abstract: LLMs have the potential to automate reward engineering by leveraging their broad domain knowledge across various tasks. However, they often need many iterations of trial-and-error to generate effective reward functions. This process is costly because evaluating every sampled reward function requires completing the full policy optimization process for each function. In this paper, we introduce an LLM-driven reward generation framework that is able to produce state-of-the-art policies on the challenging Bi-DexHands benchmark with 20x fewer reward function samples than the prior state-of-the-art work. Our key insight is that we reduce the problem of generating task-specific rewards to the problem of coarsely estimating task progress. Our two-step solution leverages the task domain knowledge and the code synthesis abilities of LLMs to author progress functions that estimate task progress from a given state. Then, we use this notion of progress to discretize states, and generate count-based intrinsic rewards using the low-dimensional state space. We show that the combination of LLM-generated progress functions and count-based intrinsic rewards is essential for our performance gains, while alternatives such as generic hash-based counts or using progress directly as a reward function fall short.
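The abstract's two-step recipe (an LLM-authored progress function, followed by count-based intrinsic rewards over progress-discretized states) can be illustrated with a minimal sketch. The code below is not the authors' implementation: the `progress_fn` stand-in, the `object_height`/`target_height` state fields, the number of bins, and the 1/sqrt(N) visitation bonus are all illustrative assumptions chosen to show the general shape of the approach.

```python
import math
from collections import defaultdict

def progress_fn(state) -> float:
    """Hypothetical stand-in for an LLM-generated progress function.

    For a lifting task it might return normalized object height; the real
    functions are synthesized per task by the LLM from domain knowledge.
    """
    return min(max(state["object_height"] / state["target_height"], 0.0), 1.0)

class CountBasedIntrinsicReward:
    """Count-based exploration bonus over progress bins (illustrative only)."""

    def __init__(self, num_bins: int = 20, scale: float = 1.0):
        self.num_bins = num_bins
        self.scale = scale
        self.counts = defaultdict(int)  # visit counts per progress bin

    def __call__(self, state) -> float:
        # Discretize the scalar progress estimate into a small number of bins,
        # giving a low-dimensional state space suitable for counting.
        bin_idx = min(int(progress_fn(state) * self.num_bins), self.num_bins - 1)
        self.counts[bin_idx] += 1
        # 1/sqrt(N) visitation bonus: rarely visited progress levels receive a
        # larger intrinsic reward, encouraging the policy to push task progress.
        return self.scale / math.sqrt(self.counts[bin_idx])

# Usage: add the intrinsic bonus to the extrinsic reward during policy optimization.
intrinsic = CountBasedIntrinsicReward()
state = {"object_height": 0.12, "target_height": 0.30}
r_total = 0.0 + intrinsic(state)  # extrinsic reward + count-based bonus
```

In this reading, the progress function only needs to be a coarse monotone proxy for task completion; the counting over its discretized output, rather than the raw progress value used directly as a reward, is what the abstract identifies as essential to the performance gains.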

Authors (6)
  1. Vishnu Sarukkai (7 papers)
  2. Brennan Shacklett (6 papers)
  3. Zander Majercik (5 papers)
  4. Kush Bhatia (25 papers)
  5. Christopher Ré (194 papers)
  6. Kayvon Fatahalian (27 papers)
