Reasoning Visual Dialogs with Structural and Partial Observations (1904.05548v2)

Published 11 Apr 2019 in cs.CV, cs.AI, cs.CL, and cs.LG

Abstract: We propose a novel model to address the task of Visual Dialog, which exhibits complex dialog structures. To obtain a reasonable answer based on the current question and the dialog history, the underlying semantic dependencies between dialog entities are essential. In this paper, we explicitly formalize this task as inference in a graphical model with partially observed nodes and unknown graph structures (relations in dialog). The given dialog entities are viewed as the observed nodes. The answer to a given question is represented by a node with a missing value. We first introduce an Expectation Maximization algorithm to infer both the underlying dialog structures and the missing node values (desired answers). Based on this, we propose a differentiable graph neural network (GNN) solution that approximates this process. Experimental results on the VisDial and VisDial-Q datasets show that our model outperforms comparative methods. We also observe that our method can infer the underlying dialog structure for better dialog reasoning.
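The alternation described in the abstract (infer the dialog structure, then fill in the missing node) can be illustrated with a toy sketch. This is not the authors' model: the function names, the dot-product similarity, the softmax edge weights, and the mean initialization are all assumptions chosen only to show the general shape of an EM-style loop over a graph with one unobserved node.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def infer_missing_node(observed, dim, steps=5):
    """Toy EM-style loop (hypothetical, for illustration only).

    observed: embedding vectors of the known dialog entities.
    Returns (estimate of the missing node, soft edge weights to each
    observed node).
    """
    n = len(observed)
    # Assumption: start the missing node at the mean of the observed nodes.
    hidden = [sum(v[i] for v in observed) / n for i in range(dim)]
    weights = [1.0 / n] * n
    for _ in range(steps):
        # Structure step: soft edge weights from similarity to the hidden node.
        weights = softmax([dot(hidden, v) for v in observed])
        # Value step: update the hidden node as a weighted aggregate
        # of the observed nodes (a simple form of message passing).
        hidden = [
            sum(w * v[i] for w, v in zip(weights, observed))
            for i in range(dim)
        ]
    return hidden, weights


if __name__ == "__main__":
    nodes = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    estimate, edges = infer_missing_node(nodes, dim=2)
    print(estimate, edges)
```

In this sketch the edge weights play the role of the inferred dialog structure and the hidden vector plays the role of the answer node; a learned GNN would replace both hand-written update rules with trained, differentiable functions.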

Authors (4)
  1. Zilong Zheng (63 papers)
  2. Wenguan Wang (103 papers)
  3. Siyuan Qi (34 papers)
  4. Song-Chun Zhu (216 papers)
Citations (117)
