
Instruction-driven history-aware policies for robotic manipulations (2209.04899v3)

Published 11 Sep 2022 in cs.RO, cs.AI, cs.CL, cs.CV, and cs.LG

Abstract: In human environments, robots are expected to accomplish a variety of manipulation tasks given simple natural language instructions. Yet, robotic manipulation is extremely challenging as it requires fine-grained motor control, long-term memory as well as generalization to previously unseen tasks and environments. To address these challenges, we propose a unified transformer-based approach that takes into account multiple inputs. In particular, our transformer architecture integrates (i) natural language instructions and (ii) multi-view scene observations while (iii) keeping track of the full history of observations and actions. Such an approach enables learning dependencies between history and instructions and improves manipulation precision using multiple views. We evaluate our method on the challenging RLBench benchmark and on a real-world robot. Notably, our approach scales to 74 diverse RLBench tasks and outperforms the state of the art. We also address instruction-conditioned tasks and demonstrate excellent generalization to previously unseen variations.
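As a rough illustration of point (iii), the sketch below shows one way that instruction tokens and multi-view observation tokens from successive steps could be laid out in a single sequence and masked, so that each step sees the instruction plus current and past observations only. It is not the authors' implementation. The function names, the use of -1 as the time index for instruction tokens, and the choice that instruction tokens attend only to each other are all assumptions made for illustration.

```python
import numpy as np


def build_token_layout(num_instr_tokens, num_steps, num_views):
    """Return one time index per token in the joint sequence.

    Hypothetical layout: instruction tokens come first and get index -1,
    followed by num_views observation tokens for each step 0..num_steps-1.
    """
    instr = np.full(num_instr_tokens, -1, dtype=int)
    obs = np.repeat(np.arange(num_steps), num_views)
    return np.concatenate([instr, obs])


def history_attention_mask(step_of_token):
    """Boolean mask where mask[i, j] is True if token i may attend to token j.

    A token at step t may attend to every token whose step is <= t, so
    observation tokens see the instruction (index -1), all views of the
    current step, and all earlier steps, but nothing from the future.
    """
    query_step = step_of_token[:, None]
    key_step = step_of_token[None, :]
    return key_step <= query_step


# Example: 2 instruction tokens, 3 steps, 2 camera views per step.
layout = build_token_layout(num_instr_tokens=2, num_steps=3, num_views=2)
mask = history_attention_mask(layout)
```

A mask of this shape could be passed to a standard attention layer; the visual and language encoders that would produce the actual token embeddings are left out here.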

Authors (6)
  1. Pierre-Louis Guhur (6 papers)
  2. Shizhe Chen (52 papers)
  3. Ricardo Garcia (15 papers)
  4. Makarand Tapaswi (41 papers)
  5. Ivan Laptev (99 papers)
  6. Cordelia Schmid (206 papers)
Citations (89)
