Open-vocabulary Pick and Place via Patch-level Semantic Maps (2406.15677v1)

Published 21 Jun 2024 in cs.RO

Abstract: Controlling robots through natural language instructions in open-vocabulary scenarios is pivotal for enhancing human-robot collaboration and complex robot behavior synthesis. However, achieving this capability is challenging because it requires a system that can generalize from limited data to a wide range of tasks and environments. Existing methods rely on large, costly datasets and struggle to generalize. This paper introduces Grounded Equivariant Manipulation (GEM), a novel approach that leverages the generative capabilities of pre-trained vision-language models and geometric symmetries to enable few-shot and zero-shot learning for open-vocabulary robot manipulation tasks. Experiments in both simulation and the real world demonstrate GEM's high sample efficiency and superior generalization across diverse pick-and-place tasks, showcasing its ability to adapt to novel instructions and unseen objects with minimal data. GEM represents a significant step forward in language-conditioned robot control, bridging the gap between semantic understanding and action generation in robotic systems.

Authors (10)
  1. Mingxi Jia
  2. Haojie Huang
  3. Zhewen Zhang
  4. Chenghao Wang
  5. Linfeng Zhao
  6. Dian Wang
  7. Jason Xinyu Liu
  8. Robin Walters
  9. Robert Platt
  10. Stefanie Tellex