Scene Graph Reasoning for Visual Question Answering (2007.01072v1)

Published 2 Jul 2020 in cs.LG, cs.CV, and stat.ML

Abstract: Visual question answering is concerned with answering free-form questions about an image. Since it requires a deep linguistic understanding of the question and the ability to associate it with various objects that are present in the image, it is an ambitious task and requires techniques from both computer vision and natural language processing. We propose a novel method that approaches the task by performing context-driven, sequential reasoning based on the objects and their semantic and spatial relationships present in the scene. As a first step, we derive a scene graph which describes the objects in the image, as well as their attributes and their mutual relationships. A reinforcement agent then learns to autonomously navigate over the extracted scene graph to generate paths, which are then the basis for deriving answers. We conduct a first experimental study on the challenging GQA dataset with manually curated scene graphs, where our method almost reaches the level of human performance.
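
The two-step idea in the abstract (build a scene graph, then derive answers from paths over it) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the triples, the names `build_graph` and `enumerate_paths`, and the choice to model attributes as edges do not come from the paper. The sketch also replaces the learned reinforcement agent with exhaustive path enumeration, so it shows only the data structure and the notion of a reasoning path, not the authors' method.

```python
from collections import defaultdict

# Hypothetical toy scene graph as (subject, relation, object) triples.
# Attributes are modelled as edges too (an assumption of this sketch).
TRIPLES = [
    ("man", "wears", "helmet"),
    ("man", "rides", "bike"),
    ("helmet", "has_color", "red"),
    ("bike", "parked_near", "tree"),
]


def build_graph(triples):
    """Build an adjacency list: node -> list of (relation, neighbour)."""
    graph = defaultdict(list)
    for subj, rel, obj in triples:
        graph[subj].append((rel, obj))
    return graph


def enumerate_paths(graph, start, max_hops):
    """Return every path of 1..max_hops edges that starts at `start`.

    A learned agent would choose one edge per step; this stand-in
    simply expands all outgoing edges breadth-first.
    """
    paths = []
    frontier = [(start, [])]
    for _ in range(max_hops):
        next_frontier = []
        for node, path in frontier:
            for rel, nxt in graph.get(node, []):
                new_path = path + [(node, rel, nxt)]
                paths.append(new_path)
                next_frontier.append((nxt, new_path))
        frontier = next_frontier
    return paths


graph = build_graph(TRIPLES)
paths = enumerate_paths(graph, "man", max_hops=2)

# For a question such as "What color is the helmet the man wears?",
# keep the paths whose relation sequence matches and read off the end node.
answers = [
    path[-1][2]
    for path in paths
    if [rel for _, rel, _ in path] == ["wears", "has_color"]
]
print(answers)
```

In this sketch the answer is the final node of a matching path. A learned policy would instead have to pick which edge to follow at each step, conditioned on the question.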

Authors (5)
  1. Marcel Hildebrandt (12 papers)
  2. Hang Li (277 papers)
  3. Rajat Koner (14 papers)
  4. Volker Tresp (158 papers)
  5. Stephan Günnemann (169 papers)
Citations (56)