OVGNet: A Unified Visual-Linguistic Framework for Open-Vocabulary Robotic Grasping (2407.13175v1)

Published 18 Jul 2024 in cs.RO

Abstract: Recognizing and grasping novel-category objects remains a crucial yet challenging problem in real-world robotic applications. Despite its significance, limited research has been conducted in this specific domain. To address this, we propose a novel framework that seamlessly integrates open-vocabulary learning into the domain of robotic grasping, empowering robots with the capability to adeptly handle novel objects. Our contributions are threefold. Firstly, we present a large-scale benchmark dataset specifically tailored for evaluating the performance of open-vocabulary grasping tasks. Secondly, we propose a unified visual-linguistic framework that serves as a guide for robots in successfully grasping both base and novel objects. Thirdly, we introduce two alignment modules designed to enhance visual-linguistic perception in the robotic grasping process. Extensive experiments validate the efficacy and utility of our approach. Notably, our framework achieves average accuracies of 71.2% and 64.4% on base and novel categories, respectively, in our new dataset.

Authors (7)
  1. Li Meng (16 papers)
  2. Zhao Qi (1 paper)
  3. Lyu Shuchang (1 paper)
  4. Wang Chunlei (1 paper)
  5. Ma Yujing (1 paper)
  6. Cheng Guangliang (1 paper)
  7. Yang Chenguang (1 paper)
Citations (1)
