
Language Reward Modulation for Pretraining Reinforcement Learning (2308.12270v1)

Published 23 Aug 2023 in cs.LG and cs.AI

Abstract: Using learned reward functions (LRFs) as a means to solve sparse-reward reinforcement learning (RL) tasks has yielded some steady progress in task complexity through the years. In this work, we question whether today's LRFs are best-suited as a direct replacement for task rewards. Instead, we propose leveraging the capabilities of LRFs as a pretraining signal for RL. Concretely, we propose $\textbf{LA}$nguage Reward $\textbf{M}$odulated $\textbf{P}$retraining (LAMP) which leverages the zero-shot capabilities of Vision-Language Models (VLMs) as a $\textit{pretraining}$ utility for RL as opposed to a downstream task reward. LAMP uses a frozen, pretrained VLM to scalably generate noisy, albeit shaped, exploration rewards by computing the contrastive alignment between a highly diverse collection of language instructions and the image observations of an agent in its pretraining environment. LAMP optimizes these rewards in conjunction with standard novelty-seeking exploration rewards with reinforcement learning to acquire a language-conditioned, pretrained policy. Our VLM pretraining approach, which is a departure from previous attempts to use LRFs, can warmstart sample-efficient learning on robot manipulation tasks in RLBench.
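The abstract's core mechanism, scoring an agent's image observations against a batch of language instructions via contrastive alignment, can be sketched as follows. This is a minimal illustration assuming a CLIP-like frozen VLM accessed through the `open_clip` library; the model choice, function names, and reward scaling are assumptions for illustration, not the paper's actual implementation.

```python
# Minimal sketch of a LAMP-style VLM exploration reward.
# Assumption: a CLIP-like model (via open_clip) stands in for the
# frozen, pretrained VLM; names here are illustrative only.
import torch
import open_clip

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="openai"
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model.eval()  # the VLM stays frozen throughout pretraining

@torch.no_grad()
def vlm_reward(image, instructions):
    """Score one image observation against K language instructions.

    Returns the contrastive (cosine) alignment between the image
    embedding and each instruction embedding, usable as a noisy but
    shaped exploration reward.
    """
    img = preprocess(image).unsqueeze(0)   # (1, 3, H, W), PIL image in
    txt = tokenizer(instructions)          # (K, seq_len)
    img_emb = model.encode_image(img)
    txt_emb = model.encode_text(txt)
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
    return (img_emb @ txt_emb.T).squeeze(0)  # (K,) scores in [-1, 1]
```

During pretraining this VLM signal is optimized in conjunction with standard novelty-seeking exploration rewards; the abstract does not specify the exact combination, but a modulated form such as `r = r_novelty * r_vlm` or a weighted sum is one plausible reading of "reward modulation."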

Authors (6)
  1. Ademi Adeniji
  2. Amber Xie
  3. Carmelo Sferrazza
  4. Younggyo Seo
  5. Stephen James
  6. Pieter Abbeel
Citations (22)

GitHub