Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
102 tokens/sec
GPT-4o
59 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
6 tokens/sec
GPT-4.1 Pro
50 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Beyond Bare Queries: Open-Vocabulary Object Grounding with 3D Scene Graph (2406.07113v3)

Published 11 Jun 2024 in cs.CV and cs.AI

Abstract: Locating objects described in natural language presents a significant challenge for autonomous agents. Existing CLIP-based open-vocabulary methods successfully perform 3D object grounding with simple (bare) queries, but cannot cope with ambiguous descriptions that demand an understanding of object relations. To tackle this problem, we propose a modular approach called BBQ (Beyond Bare Queries), which constructs 3D scene graph representation with metric and semantic edges and utilizes a LLM as a human-to-agent interface through our deductive scene reasoning algorithm. BBQ employs robust DINO-powered associations to construct 3D object-centric map and an advanced raycasting algorithm with a 2D vision-LLM to describe them as graph nodes. On the Replica and ScanNet datasets, we have demonstrated that BBQ takes a leading place in open-vocabulary 3D semantic segmentation compared to other zero-shot methods. Also, we show that leveraging spatial relations is especially effective for scenes containing multiple entities of the same semantic class. On challenging Sr3D+, Nr3D and ScanRefer benchmarks, our deductive approach demonstrates a significant improvement, enabling objects grounding by complex queries compared to other state-of-the-art methods. The combination of our design choices and software implementation has resulted in significant data processing speed in experiments on the robot on-board computer. This promising performance enables the application of our approach in intelligent robotics projects. We made the code publicly available at https://linukc.github.io/BeyondBareQueries/.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (7)
  1. Sergey Linok (2 papers)
  2. Tatiana Zemskova (5 papers)
  3. Svetlana Ladanova (1 paper)
  4. Roman Titkov (2 papers)
  5. Dmitry Yudin (47 papers)
  6. Maxim Monastyrny (1 paper)
  7. Aleksei Valenkov (1 paper)
Citations (1)
Github Logo Streamline Icon: https://streamlinehq.com
X Twitter Logo Streamline Icon: https://streamlinehq.com