
SGEITL: Scene Graph Enhanced Image-Text Learning for Visual Commonsense Reasoning (2112.08587v1)

Published 16 Dec 2021 in cs.CV, cs.AI, cs.CL, cs.LG, and cs.MM

Abstract: Answering complex questions about images is an ambitious goal for machine intelligence, requiring a joint understanding of images, text, and commonsense knowledge, as well as strong reasoning ability. Recently, multimodal Transformers have made great progress on the task of Visual Commonsense Reasoning (VCR) by jointly understanding visual objects and text tokens through layers of cross-modality attention. However, these approaches do not exploit the rich structure of the scene and the interactions between objects, which are essential for answering complex commonsense questions. We propose a Scene Graph Enhanced Image-Text Learning (SGEITL) framework to incorporate visual scene graphs into commonsense reasoning. To exploit the scene graph structure, at the model level we propose a multi-hop graph transformer that regularizes attention interaction among hops. For pre-training, a scene-graph-aware pre-training method is proposed to leverage the structural knowledge extracted from the visual scene graph. Moreover, we introduce a method to train and generate domain-relevant visual scene graphs using textual annotations in a weakly-supervised manner. Extensive experiments on VCR and other tasks show a significant performance boost over state-of-the-art methods and demonstrate the efficacy of each proposed component.
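The abstract's central mechanism, a multi-hop graph transformer that regularizes attention among hops, can be illustrated by masking attention scores so each node only attends to scene-graph neighbors within a hop limit. The sketch below shows this general idea, not the paper's exact formulation; the function names and the `max_hops` parameter are illustrative assumptions.

```python
import numpy as np

def hop_distances(adj):
    """All-pairs hop counts on a scene-graph adjacency matrix, via BFS."""
    n = adj.shape[0]
    dist = np.full((n, n), np.inf)
    for s in range(n):
        dist[s, s] = 0
        frontier, d = [s], 0
        while frontier:
            d += 1
            nxt = []
            for u in frontier:
                for v in np.nonzero(adj[u])[0]:
                    if dist[s, v] == np.inf:
                        dist[s, v] = d
                        nxt.append(v)
            frontier = nxt
    return dist

def hop_masked_attention(q, k, v, adj, max_hops=2):
    """Scaled dot-product attention restricted to nodes within max_hops.

    q, k, v: (n_nodes, dim) feature matrices; adj: (n_nodes, n_nodes) 0/1.
    Illustrative sketch only -- not SGEITL's exact attention formulation.
    """
    dim = q.shape[-1]
    scores = q @ k.T / np.sqrt(dim)
    # Block attention between node pairs farther apart than the hop limit.
    scores[hop_distances(adj) > max_hops] = -1e9
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

Stacking such layers with increasing hop limits would let information propagate along scene-graph structure rather than through unconstrained all-to-all attention, which is the intuition behind regularizing attention interaction among hops.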

Authors (8)
  1. Zhecan Wang (18 papers)
  2. Haoxuan You (33 papers)
  3. Liunian Harold Li (19 papers)
  4. Alireza Zareian (16 papers)
  5. Suji Park (18 papers)
  6. Yiqing Liang (8 papers)
  7. Kai-Wei Chang (292 papers)
  8. Shih-Fu Chang (131 papers)
Citations (26)