Learning a Universal Human Prior for Dexterous Manipulation from Human Preference (2304.04602v2)

Published 10 Apr 2023 in cs.RO, cs.HC, and cs.LG

Abstract: Generating human-like behavior on robots is a great challenge, especially in dexterous manipulation tasks with robotic hands. Scripting policies from scratch is intractable due to the high-dimensional control space, and training policies with reinforcement learning (RL) and manual reward engineering can also be hard and lead to unnatural motions. Leveraging recent progress on RL from Human Feedback, we propose a framework that learns a universal human prior using direct human preference feedback over videos, for efficiently tuning RL policies on 20 dual-hand robot manipulation tasks in simulation, without a single human demonstration. A task-agnostic reward model is trained by iteratively generating diverse policies and collecting human preferences over the trajectories; it is then applied to regularize the behavior of policies in the fine-tuning stage. Our method empirically demonstrates more human-like behaviors on robot hands across diverse tasks, including unseen tasks, indicating its generalization capability.
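The abstract describes the standard two-stage RLHF recipe: fit a reward model to pairwise human preferences over trajectories, then use it as a regularizer on the task reward during policy fine-tuning. A minimal sketch of that idea, assuming the common Bradley-Terry preference objective (the paper's exact loss, function names, and the weight `beta` are assumptions for illustration):

```python
import numpy as np

def preference_loss(r_a, r_b, pref):
    """Bradley-Terry negative log-likelihood for one trajectory pair.

    r_a, r_b: predicted reward-model scores summed over the two trajectories.
    pref: 1 if the human preferred trajectory A, 0 if they preferred B.
    """
    p_a = 1.0 / (1.0 + np.exp(r_b - r_a))  # sigmoid(r_a - r_b)
    return -(pref * np.log(p_a) + (1 - pref) * np.log(1 - p_a))

def regularized_reward(task_reward, human_prior_reward, beta=0.1):
    """Fine-tuning signal: task reward plus the learned human-prior score.

    beta is a hypothetical trade-off weight, not a value from the paper.
    """
    return task_reward + beta * human_prior_reward

# The loss shrinks as the model scores the preferred trajectory higher:
loss_agree = preference_loss(r_a=2.0, r_b=0.0, pref=1)
loss_disagree = preference_loss(r_a=0.0, r_b=2.0, pref=1)
```

Minimizing this loss over many labeled pairs pushes the reward model to score human-preferred (more natural-looking) trajectories higher; the regularized reward then biases any task's RL policy toward those behaviors without demonstrations.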

Authors (7)
  1. Zihan Ding (38 papers)
  2. Yuanpei Chen (28 papers)
  3. Allen Z. Ren (19 papers)
  4. Shixiang Shane Gu (34 papers)
  5. Qianxu Wang (5 papers)
  6. Hao Dong (175 papers)
  7. Chi Jin (90 papers)
Citations (6)