
Supplementing Missing Visions via Dialog for Scene Graph Generations (2204.11143v2)

Published 23 Apr 2022 in cs.CV

Abstract: Most current AI systems rely on the premise that the input visual data are sufficient to achieve competitive performance in various computer vision tasks. However, the classic task setup rarely considers the challenging, yet common practical situations where the complete visual data may be inaccessible due to various reasons (e.g., restricted view range and occlusions). To this end, we investigate a computer vision task setting with incomplete visual input data. Specifically, we exploit the Scene Graph Generation (SGG) task with various levels of visual data missingness as input. While insufficient visual input intuitively leads to a performance drop, we propose to supplement the missing visions via natural language dialog interactions to better accomplish the task objective. We design a model-agnostic Supplementary Interactive Dialog (SI-Dial) framework that can be jointly learned with most existing models, endowing current AI systems with the ability to conduct question-answer interactions in natural language. We demonstrate the feasibility of such a task setting with missing visual input and the effectiveness of our proposed dialog module as a supplementary information source through extensive experiments and analysis, achieving promising performance improvements over multiple baselines.
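The core idea of the abstract can be illustrated with a minimal sketch: mask the feature vectors of regions whose visual data is unavailable, then ask questions about those regions and fold the answer embeddings back into the feature set. All names below (`mask_features`, `DialogAgent`, the replacement-based fusion) are illustrative assumptions, not the paper's released implementation; in SI-Dial the questioner and answerer are learned jointly with the SGG model, whereas here the answerer is a toy lookup table.

```python
# Hypothetical sketch of supplementing missing visual features via QA dialog.
# Names and the simple "replace the masked slot" fusion are assumptions for
# illustration; they are not taken from the paper's code.

from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DialogAgent:
    """Toy QA oracle: maps a question about a masked region to an answer embedding."""
    knowledge: Dict[str, List[float]] = field(default_factory=dict)

    def answer(self, question: str) -> List[float]:
        # In the paper this would be a learned answerer; here, a lookup table.
        return self.knowledge.get(question, [0.0, 0.0])


def mask_features(features: List[List[float]], missing: List[int]) -> List[List[float]]:
    """Zero out feature vectors for regions whose visual data is inaccessible."""
    return [[0.0] * len(f) if i in missing else list(f)
            for i, f in enumerate(features)]


def supplement_with_dialog(features: List[List[float]],
                           missing: List[int],
                           agent: DialogAgent,
                           rounds: int = 1) -> List[List[float]]:
    """For each missing region, run QA rounds and splice the answer embedding in."""
    out = [list(f) for f in features]
    for i in missing:
        for _ in range(rounds):
            question = f"what is in region {i}?"
            answer_embedding = agent.answer(question)
            # Fuse the answer into the masked slot (simplest choice: replacement).
            out[i] = list(answer_embedding)
    return out


# Usage: region 1 is missing; the dialog restores an informative embedding.
feats = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
masked = mask_features(feats, missing=[1])
agent = DialogAgent(knowledge={"what is in region 1?": [0.9, 0.1]})
restored = supplement_with_dialog(masked, [1], agent)
```

A downstream SGG model would then consume `restored` in place of the masked features; the framework is model-agnostic in the sense that this supplementation step sits in front of whatever feature-consuming model follows.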

Authors (5)
  1. Zhenghao Zhao
  2. Ye Zhu
  3. Xiaoguang Zhu
  4. Yuzhang Shang
  5. Yan Yan
Citations (1)
