Understanding the Role of Scene Graphs in Visual Question Answering (2101.05479v2)
Abstract: Visual Question Answering (VQA) is of tremendous interest to the research community, with important applications such as aiding visually impaired users and image-based search. In this work, we explore the use of scene graphs for solving the VQA task. We conduct experiments on the GQA dataset, which presents a challenging set of questions requiring counting, compositionality, and advanced reasoning capabilities, and provides scene graphs for a large number of images. We adapt image + question architectures for use with scene graphs, evaluate various scene graph generation techniques for unseen images, propose a training curriculum to leverage human-annotated and auto-generated scene graphs, and build late fusion architectures to learn from multiple image representations. We present a multi-faceted study into the use of scene graphs for VQA, making this work the first of its kind.
- Vinay Damodaran
- Sharanya Chakravarthy
- Akshay Kumar
- Anjana Umapathy
- Teruko Mitamura
- Yuta Nakashima
- Noa Garcia
- Chenhui Chu
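To make the late-fusion idea from the abstract concrete, below is a minimal PyTorch sketch, not the authors' actual architecture: two answer-scoring branches, one over visual features and one over scene-graph embeddings, combined by a learned mixing weight over their logits. All module names, dimensions, and the answer-vocabulary size are illustrative assumptions.

```python
# Minimal late-fusion sketch for VQA (illustrative; not the paper's exact model).
# Assumption: upstream encoders already produce fixed-size feature vectors for
# the image, the scene graph, and the question; late fusion combines the
# per-branch answer logits rather than the raw inputs.
import torch
import torch.nn as nn


class AnswerBranch(nn.Module):
    """Scores answers from one image representation fused with the question."""

    def __init__(self, img_dim, q_dim, hidden_dim, num_answers):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(img_dim + q_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_answers),
        )

    def forward(self, img_feat, q_feat):
        # Simple concatenation fusion of image representation and question.
        return self.mlp(torch.cat([img_feat, q_feat], dim=-1))


class LateFusionVQA(nn.Module):
    """Combines answer logits from a visual branch and a scene-graph branch."""

    def __init__(self, visual_dim, graph_dim, q_dim, hidden_dim, num_answers):
        super().__init__()
        self.visual_branch = AnswerBranch(visual_dim, q_dim, hidden_dim, num_answers)
        self.graph_branch = AnswerBranch(graph_dim, q_dim, hidden_dim, num_answers)
        # Learned scalar mixing weight; sigmoid keeps it in [0, 1].
        self.alpha = nn.Parameter(torch.zeros(1))

    def forward(self, visual_feat, graph_feat, q_feat):
        v_logits = self.visual_branch(visual_feat, q_feat)
        g_logits = self.graph_branch(graph_feat, q_feat)
        w = torch.sigmoid(self.alpha)
        return w * v_logits + (1 - w) * g_logits


# Usage with random tensors standing in for real encoder outputs.
model = LateFusionVQA(visual_dim=2048, graph_dim=300, q_dim=512,
                      hidden_dim=1024, num_answers=1000)
logits = model(torch.randn(4, 2048), torch.randn(4, 300), torch.randn(4, 512))
print(logits.shape)  # torch.Size([4, 1000])
```

Fusing at the logit level lets each branch be trained or swapped independently, which fits the abstract's setting where scene graphs may be human-annotated or auto-generated for unseen images.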