PREDILECT: Preferences Delineated with Zero-Shot Language-based Reasoning in Reinforcement Learning (2402.15420v1)
Abstract: Preference-based reinforcement learning (RL) has emerged as a new field in robot learning, where humans play a pivotal role in shaping robot behavior by expressing preferences over different sequences of state-action pairs. However, formulating realistic policies for robots demands responses from humans to an extensive array of queries. In this work, we address the sample-efficiency challenge by expanding the information collected per query to contain both preferences and optional text prompting. To accomplish this, we leverage the zero-shot capabilities of an LLM to reason from the text provided by humans. To accommodate the additional query information, we reformulate the reward learning objectives to contain flexible highlights -- state-action pairs that carry relatively high information and relate to features extracted in a zero-shot fashion from a pretrained LLM. In both a simulated scenario and a user study, we demonstrate the effectiveness of our approach by analyzing the feedback and its implications. Additionally, the collected feedback is used to train a robot to follow socially compliant trajectories in a simulated social navigation environment. We provide video examples of the trained policies at https://sites.google.com/view/rl-predilect
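The abstract does not spell out the reformulated objective, so the following is a minimal sketch of what a preference-plus-highlight reward loss could look like. It assumes the standard Bradley-Terry cross-entropy loss over segment pairs (as in deep RL from human preferences, Christiano et al., 2017) combined with a hypothetical per-step term that pushes predicted reward up or down on LLM-flagged highlights. The names (`reward_net`, `highlight_loss`, `lam`), the sign convention, and the form of the highlight term are illustrative assumptions, not the authors' exact formulation.

```python
# Sketch of a preference-based reward loss extended with a highlight term.
# Assumes reward_net maps per-step features (..., feat_dim) to per-step rewards (...,).
import torch
import torch.nn.functional as F


def preference_loss(reward_net, seg_a, seg_b, pref_label):
    """Bradley-Terry cross-entropy over which segment is preferred.

    seg_a, seg_b: (batch, T, feat_dim) trajectory segments.
    pref_label:   (batch,) long tensor, 0 if seg_a preferred, 1 if seg_b.
    """
    ret_a = reward_net(seg_a).sum(dim=-1)            # summed predicted return of seg_a
    ret_b = reward_net(seg_b).sum(dim=-1)            # summed predicted return of seg_b
    logits = torch.stack([ret_a, ret_b], dim=-1)     # (batch, 2)
    return F.cross_entropy(logits, pref_label)


def highlight_loss(reward_net, highlighted_sa, signs):
    """Hypothetical highlight term: raise reward on positively flagged
    state-action pairs (+1) and lower it on negatively flagged ones (-1).

    highlighted_sa: (N, feat_dim) state-action features flagged via the LLM.
    signs:          (N,) tensor of +1 / -1 labels.
    """
    r = reward_net(highlighted_sa)                   # (N,) predicted rewards
    return -(signs * r).mean()


def total_loss(reward_net, seg_a, seg_b, pref_label, highlighted_sa, signs, lam=0.1):
    # lam trades off preference fit against highlight consistency (assumed hyperparameter).
    return (preference_loss(reward_net, seg_a, seg_b, pref_label)
            + lam * highlight_loss(reward_net, highlighted_sa, signs))
```

In this sketch the preference term supervises whole segments while the highlight term localizes credit to the specific state-action pairs the human's text (as parsed by the LLM) singles out; how the actual paper weights or normalizes these terms is not stated in the abstract.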
Authors: Simon Holk, Daniel Marta, Iolanda Leite