Interpretable Visual Reasoning via Induced Symbolic Space (2011.11603v2)

Published 23 Nov 2020 in cs.CV, cs.AI, cs.CL, and cs.LG

Abstract: We study the problem of concept induction in visual reasoning, i.e., identifying concepts and their hierarchical relationships from question-answer pairs associated with images, and achieve an interpretable model by working in the induced symbolic concept space. To this end, we first design a new framework named the object-centric compositional attention model (OCCAM) to perform the visual reasoning task with object-level visual features. We then propose a method to induce concepts of objects and relations using clues from the attention patterns between objects' visual features and question words. Finally, we achieve a higher level of interpretability by applying OCCAM to objects represented in the induced symbolic concept space. Our model design makes this an easy adaptation: we first predict the concepts of objects and relations, then project the predicted concepts back into the visual feature space so that the compositional reasoning module can operate unchanged. Experiments on the CLEVR and GQA datasets demonstrate that 1) OCCAM achieves a new state of the art without human-annotated functional programs, and 2) the induced concepts are both accurate and sufficient, as OCCAM achieves on-par performance whether objects are represented in visual features or in the induced symbolic concept space.
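
The sketch below is a minimal PyTorch illustration of two mechanisms the abstract describes: cross-attention between object-level visual features and question-word embeddings (the attention patterns that serve as clues for concept induction), and projecting per-object concept predictions back into the visual feature space so a compositional reasoning module can consume them like ordinary visual features. All module names, dimensions, and the soft-projection design are assumptions for illustration, not the authors' implementation.

```python
# Hedged sketch (not the OCCAM code): word-to-object attention and
# concept prediction with projection back to the visual feature space.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ObjectWordAttention(nn.Module):
    """Cross-attention from question words to object features (assumed design)."""

    def __init__(self, obj_dim: int, word_dim: int, hidden: int = 256):
        super().__init__()
        self.q = nn.Linear(word_dim, hidden)  # queries from question words
        self.k = nn.Linear(obj_dim, hidden)   # keys from object features
        self.v = nn.Linear(obj_dim, hidden)   # values from object features

    def forward(self, obj_feats, word_embs):
        # obj_feats: (B, N_obj, obj_dim); word_embs: (B, N_word, word_dim)
        q, k, v = self.q(word_embs), self.k(obj_feats), self.v(obj_feats)
        attn = torch.softmax(q @ k.transpose(1, 2) / k.size(-1) ** 0.5, dim=-1)
        # attn: (B, N_word, N_obj) -- word-to-object attention patterns of the
        # kind the paper mines as clues for inducing object/relation concepts.
        return attn @ v, attn


class ConceptProjector(nn.Module):
    """Predict concept logits per object, then map concepts back to visual space."""

    def __init__(self, obj_dim: int, n_concepts: int):
        super().__init__()
        self.classify = nn.Linear(obj_dim, n_concepts)  # object -> concept logits
        self.embed = nn.Embedding(n_concepts, obj_dim)  # concept -> feature space

    def forward(self, obj_feats):
        logits = self.classify(obj_feats)               # (B, N_obj, n_concepts)
        probs = F.softmax(logits, dim=-1)
        # Soft projection: expected concept embedding per object, shaped like an
        # ordinary visual feature so downstream reasoning runs unchanged.
        return probs @ self.embed.weight, logits


if __name__ == "__main__":
    B, N_obj, N_word, obj_dim, word_dim = 2, 10, 8, 512, 300
    objs = torch.randn(B, N_obj, obj_dim)
    words = torch.randn(B, N_word, word_dim)
    ctx, attn = ObjectWordAttention(obj_dim, word_dim)(objs, words)
    proj, logits = ConceptProjector(obj_dim, n_concepts=32)(objs)
    print(ctx.shape, attn.shape, proj.shape)
```
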

Authors (7)
  1. Zhonghao Wang (20 papers)
  2. Kai Wang (624 papers)
  3. Mo Yu (117 papers)
  4. Jinjun Xiong (118 papers)
  5. Wen-mei Hwu (62 papers)
  6. Mark Hasegawa-Johnson (62 papers)
  7. Humphrey Shi (97 papers)
Citations (18)
