
Free-form Description Guided 3D Visual Graph Network for Object Grounding in Point Cloud (2103.16381v1)

Published 30 Mar 2021 in cs.CV

Abstract: 3D object grounding aims to locate the most relevant target object in a raw point cloud scene based on a free-form language description. Understanding complex and diverse descriptions and lifting them directly to a point cloud is a new and challenging topic due to the irregular and sparse nature of point clouds. There are three main challenges in 3D object grounding: finding the main focus in a complex and diverse description, understanding the point cloud scene, and locating the target object. In this paper, we address all three challenges. First, we propose a language scene graph module to capture the rich structure and long-distance phrase correlations. Second, we introduce a multi-level 3D proposal relation graph module to extract object-object and object-scene co-occurrence relationships and to strengthen the visual features of the initial proposals. Finally, we develop a description guided 3D visual graph module that encodes the global contexts of phrases and proposals through a node matching strategy. Extensive experiments on challenging benchmark datasets (ScanRefer and Nr3D) show that our algorithm outperforms existing state-of-the-art methods. Our code is available at https://github.com/PNXD/FFL-3DOG.
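The abstract frames grounding as matching phrase nodes from the description against 3D proposal nodes. As a rough illustration only, the sketch below shows a generic final matching step: given feature vectors for phrase nodes and proposals (assumed to be produced already by upstream language and point cloud encoders), it matches each phrase to its most similar proposal by cosine similarity and returns the proposal matched to the main-focus phrase. The function names `cosine`, `match_nodes`, and `ground_target`, the toy vectors, and the assumption that the main-focus phrase sits at index 0 are all hypothetical. This is not the FFL-3DOG implementation; the paper's graph modules and matching strategy are more involved and are described in the paper and the linked repository.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_nodes(phrase_feats: list[Vector], proposal_feats: list[Vector]) -> list[int]:
    """For each phrase node, return the index of the most similar proposal node.

    Hypothetical helper: a plain argmax over cosine similarity, not the paper's strategy.
    Raises ValueError if proposal_feats is empty.
    """
    return [
        max(range(len(proposal_feats)), key=lambda j: cosine(p, proposal_feats[j]))
        for p in phrase_feats
    ]


def ground_target(phrase_feats: list[Vector], proposal_feats: list[Vector], focus_index: int = 0) -> int:
    """Return the proposal matched to the main-focus phrase (assumed to be at focus_index)."""
    return match_nodes(phrase_feats, proposal_feats)[focus_index]


# Toy example: two phrase nodes and three proposal nodes with made-up 3-D features.
phrases = [[1.0, 0.0, 0.2], [0.0, 1.0, 0.0]]
proposals = [[0.0, 0.9, 0.1], [0.9, 0.1, 0.3], [0.1, 0.1, 1.0]]
print(match_nodes(phrases, proposals))    # [1, 0]
print(ground_target(phrases, proposals))  # 1
```

A per-phrase argmax is the simplest possible matching rule; it ignores the relations between phrases and between proposals that the abstract's graph modules are designed to capture.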

Authors (9)
  1. Mingtao Feng
  2. Zhen Li
  3. Qi Li
  4. Liang Zhang
  5. Guangming Zhu
  6. Hui Zhang
  7. Yaonan Wang
  8. Ajmal Mian
  9. Xiangdong Zhang
Citations (72)
