Code as Reward: Empowering Reinforcement Learning with VLMs (2402.04764v1)

Published 7 Feb 2024 in cs.LG

Abstract: Pre-trained Vision-Language Models (VLMs) are able to understand visual concepts, describe and decompose complex tasks into sub-tasks, and provide feedback on task completion. In this paper, we aim to leverage these capabilities to support the training of reinforcement learning (RL) agents. In principle, VLMs are well suited for this purpose, as they can naturally analyze image-based observations and provide feedback (reward) on learning progress. However, inference in VLMs is computationally expensive, so querying them frequently to compute rewards would significantly slow down the training of an RL agent. To address this challenge, we propose a framework named Code as Reward (VLM-CaR). VLM-CaR produces dense reward functions from VLMs through code generation, thereby significantly reducing the computational burden of querying the VLM directly. We show that the dense rewards generated through our approach are very accurate across a diverse set of discrete and continuous environments, and can be more effective in training RL policies than the original sparse environment rewards.
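
The core idea of the abstract can be sketched as follows: the VLM is queried once, offline, to emit a reward function as executable code, and the RL training loop then calls that cheap function at every step instead of running VLM inference. Below is a minimal, hypothetical Python sketch of this pattern; the function name `compute_reward`, the hard-coded generated source string, and the toy 1-D environment are illustrative assumptions, not the paper's actual pipeline.

```python
import numpy as np

# Stand-in for code a VLM might generate after seeing the task description
# and a few example frames. In VLM-CaR this would be produced by prompting
# the VLM; here it is hard-coded purely for illustration.
generated_reward_source = """
def compute_reward(observation):
    # Dense shaping: reward grows as the agent (index 0) approaches
    # the goal position (index 1) in this toy 1-D observation.
    agent_pos, goal_pos = observation[0], observation[1]
    return -abs(agent_pos - goal_pos)
"""

# Compile the generated code once; afterwards every reward query is a
# plain Python call, with no further VLM inference.
namespace = {}
exec(generated_reward_source, namespace)
compute_reward = namespace["compute_reward"]

# Toy rollout: an agent random-walking toward a goal on a line.
rng = np.random.default_rng(0)
agent, goal = 0.0, 5.0
for step in range(10):
    agent += rng.choice([-1.0, 1.0])
    obs = np.array([agent, goal])
    reward = compute_reward(obs)
    print(f"step={step} agent={agent:+.1f} reward={reward:+.1f}")
```

The design point is that the VLM's cost is paid once at code-generation time, so per-step reward computation is ordinary Python. Omitted here is any verification of the generated code before use, which a real pipeline would want.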

Authors (6)
  1. David Venuto
  2. Sami Nur Islam
  3. Martin Klissarov
  4. Doina Precup
  5. Sherry Yang
  6. Ankit Anand
Citations (4)