
Generating Context-Aware Natural Answers for Questions in 3D Scenes (2310.19516v1)

Published 30 Oct 2023 in cs.CV

Abstract: 3D question answering is a young field in 3D vision-language that is yet to be explored. Previous methods are limited to a pre-defined answer space and cannot generate answers naturally. In this work, we pivot the question answering task to a sequence generation task to generate free-form natural answers for questions in 3D scenes (Gen3DQA). To this end, we optimize our model directly on the language rewards to secure the global sentence semantics. Here, we also adapt a pragmatic language understanding reward to further improve the sentence quality. Our method sets a new SOTA on the ScanQA benchmark (CIDEr score 72.22/66.57 on the test sets).
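To make "optimizing directly on the language rewards" more concrete, the following is a minimal sketch of a REINFORCE-style loss with a baseline. The abstract does not name the exact algorithm, so the baseline formulation is an assumption. The function names (`toy_reward`, `sequence_log_prob`, `policy_gradient_loss`) are hypothetical, and the unigram-F1 reward is only a stand-in for a sentence-level metric such as CIDEr; it is not the reward used in the paper.

```python
import math


def sequence_log_prob(token_probs):
    """Sum of log-probabilities the model assigned to each sampled token."""
    return sum(math.log(p) for p in token_probs)


def toy_reward(candidate, reference):
    """Hypothetical sentence-level reward: unigram F1 against one reference.

    A stand-in for a metric such as CIDEr, chosen only to keep the
    example self-contained.
    """
    cand, ref = candidate.split(), reference.split()
    if not cand or not ref:
        return 0.0
    overlap = sum(min(cand.count(w), ref.count(w)) for w in set(cand))
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def policy_gradient_loss(sampled_probs, sampled_answer, baseline_answer, reference):
    """REINFORCE-style loss with a baseline (assumed, not confirmed by the abstract).

    The advantage is the reward of the sampled answer minus the reward of a
    baseline answer (for example, the greedy decode). Minimizing the loss
    raises the likelihood of answers that beat the baseline.
    """
    advantage = toy_reward(sampled_answer, reference) - toy_reward(baseline_answer, reference)
    return -advantage * sequence_log_prob(sampled_probs)


if __name__ == "__main__":
    loss = policy_gradient_loss(
        sampled_probs=[0.5, 0.5, 0.5],      # per-token probabilities of the sample
        sampled_answer="on the table",
        baseline_answer="the chair",
        reference="on the table",
    )
    print(f"loss = {loss:.4f}")
```

Because the reward is computed on the whole generated sentence rather than token by token, the gradient signal reflects global sentence quality, which is the motivation the abstract gives for training on language rewards.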

Authors (3)
  1. Mohammed Munzer Dwedari (2 papers)
  2. Dave Zhenyu Chen (12 papers)
  3. Matthias Niessner (18 papers)
Citations (1)