
Visual Question Answering Using Semantic Information from Image Descriptions (2004.10966v2)

Published 23 Apr 2020 in cs.CL, cs.AI, and cs.CV

Abstract: In this work, we propose a deep neural architecture whose attention mechanism combines region-based image features, the natural language question, and semantic knowledge extracted from the regions of an image to produce open-ended answers in a visual question answering (VQA) task. Combining region-based visual features with region-based textual information about the image helps a model respond to questions more accurately, and potentially with less training data. We evaluate the proposed architecture on a VQA task against a strong baseline and show that our method achieves excellent results on this task.

Authors (3)
  1. Tasmia Tasrin (4 papers)
  2. Md Sultan Al Nahian (10 papers)
  3. Brent Harrison (30 papers)