Learning Reward for Robot Skills Using Large Language Models via Self-Alignment (2405.07162v3)

Published 12 May 2024 in cs.RO and cs.AI

Abstract: Learning reward functions remains the bottleneck to equipping a robot with a broad repertoire of skills. Large language models (LLMs) contain valuable task-related knowledge that can potentially aid in the learning of reward functions. However, the reward function an LLM proposes can be imprecise and thus ineffective, requiring further grounding with environment information. We propose a method to learn rewards more efficiently in the absence of humans. Our approach consists of two components: we first use the LLM to propose features and a parameterization of the reward, then update the parameters through an iterative self-alignment process. In particular, the process minimizes the ranking inconsistency between the LLM and the learned reward functions based on execution feedback. The method was validated on 9 tasks across 2 simulation environments. It demonstrates a consistent improvement in training efficacy and efficiency, while consuming significantly fewer GPT tokens than the alternative mutation-based method.
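
To make the self-alignment idea concrete, below is a minimal, hypothetical Python sketch of the loop the abstract describes: the reward is a weighted combination of LLM-proposed features, rollouts are scored by the current reward, and the weights are adjusted to reduce pairwise ranking disagreement with an LLM-provided ordering of those rollouts. The linear reward form, the local-search update, and all function names are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np


def linear_reward(features: np.ndarray, weights: np.ndarray) -> float:
    """Assumed reward form: weighted sum of LLM-proposed features."""
    return float(features @ weights)


def ranking_inconsistency(reward_scores, llm_ranking) -> int:
    """Count pairwise disagreements between the learned reward's ordering of
    trajectories and the ordering given by the LLM (lower is better)."""
    n = len(reward_scores)
    disagreements = 0
    for i in range(n):
        for j in range(i + 1, n):
            reward_prefers_i = reward_scores[i] > reward_scores[j]
            llm_prefers_i = llm_ranking.index(i) < llm_ranking.index(j)
            disagreements += int(reward_prefers_i != llm_prefers_i)
    return disagreements


def self_align(trajectory_features, llm_ranking, init_weights,
               iters=200, step=0.1, seed=0):
    """Randomized local search over reward weights that reduces ranking
    inconsistency. The paper's actual update rule may differ; this only
    illustrates grounding the LLM-proposed reward with execution feedback."""
    rng = np.random.default_rng(seed)
    weights = init_weights.copy()
    scores = [linear_reward(f, weights) for f in trajectory_features]
    best_loss = ranking_inconsistency(scores, llm_ranking)
    for _ in range(iters):
        candidate = weights + step * rng.normal(size=weights.shape)
        scores = [linear_reward(f, candidate) for f in trajectory_features]
        loss = ranking_inconsistency(scores, llm_ranking)
        if loss <= best_loss:
            weights, best_loss = candidate, loss
    return weights, best_loss


if __name__ == "__main__":
    # Toy data: 4 rollouts, each summarized by 3 LLM-proposed features.
    feats = np.array([[0.1, 0.5, 0.2],
                      [0.4, 0.1, 0.9],
                      [0.8, 0.7, 0.3],
                      [0.2, 0.2, 0.1]])
    llm_ranking = [2, 1, 0, 3]  # rollout indices, best to worst, per the LLM
    w, loss = self_align(feats, llm_ranking, init_weights=np.zeros(3))
    print("learned weights:", w, "remaining inconsistency:", loss)
```

In the full method, the rollouts come from executing the current policy in simulation, and the LLM's ranking is queried each iteration; the sketch fixes both to keep the example self-contained.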

Authors (3)
  1. Yuwei Zeng (5 papers)
  2. Yao Mu (58 papers)
  3. Lin Shao (44 papers)
Citations (5)
