3D-GRES: Generalized 3D Referring Expression Segmentation (2407.20664v2)

Published 30 Jul 2024 in cs.CV

Abstract: 3D Referring Expression Segmentation (3D-RES) is dedicated to segmenting a specific instance within a 3D space based on a natural language description. However, current approaches are limited to segmenting a single target, restricting the versatility of the task. To overcome this limitation, we introduce Generalized 3D Referring Expression Segmentation (3D-GRES), which extends the capability to segment any number of instances based on natural language instructions. In addressing this broader task, we propose the Multi-Query Decoupled Interaction Network (MDIN), designed to break down multi-object segmentation tasks into simpler, individual segmentations. MDIN comprises two fundamental components: Text-driven Sparse Queries (TSQ) and Multi-object Decoupling Optimization (MDO). TSQ generates sparse point cloud features distributed over key targets as the initialization for queries. Meanwhile, MDO is tasked with assigning each target in multi-object scenarios to different queries while maintaining their semantic consistency. To adapt to this new task, we build a new dataset, namely Multi3DRes. Our comprehensive evaluations on this dataset demonstrate substantial enhancements over existing models, thus charting a new path for intricate multi-object 3D scene comprehension. The benchmark and code are available at https://github.com/sosppxo/MDIN.
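To make the task setting concrete, here is a minimal sketch of the generalized output described above: one expression may refer to any number of instances, and the prediction is the union of the per-instance point masks. This is an illustration only, not the MDIN implementation. The function names, the toy point indices, and the convention that two empty masks score an IoU of 1.0 are all assumptions introduced here.

```python
from typing import List, Set


def merge_instance_masks(instance_masks: List[Set[int]]) -> Set[int]:
    """Union per-instance point-index sets into a single scene-level mask."""
    merged: Set[int] = set()
    for mask in instance_masks:
        merged |= mask
    return merged


def mask_iou(pred: Set[int], target: Set[int]) -> float:
    """Intersection over union on point indices.

    Assumed convention: if both masks are empty (no target in the scene),
    the prediction counts as a perfect match.
    """
    union = pred | target
    if not union:
        return 1.0
    return len(pred & target) / len(union)


# Hypothetical toy scene: an expression such as "the two chairs by the window"
# resolves to two instances, each given as a set of point indices.
predicted_instances = [{0, 1, 2}, {5, 6}]
pred = merge_instance_masks(predicted_instances)

# Hypothetical ground truth covering both chairs, with one point missed.
target = {0, 1, 2, 5, 6, 7}
score = mask_iou(pred, target)

# An expression that matches nothing yields an empty prediction.
no_target_score = mask_iou(merge_instance_masks([]), set())
```

Representing masks as sets of point indices keeps the sketch dependency-free; a practical pipeline would more likely use boolean arrays over the point cloud.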

Authors (9)

Changli Wu
Yihang Liu
Jiayi Ji
Yiwei Ma
Haowei Wang
Gen Luo
Henghui Ding
Xiaoshuai Sun
Rongrong Ji

GitHub

sosppxo/3D-GRES · GitHub
