
Question-Guided Hybrid Convolution for Visual Question Answering (1808.02632v1)

Published 8 Aug 2018 in cs.CV, cs.AI, cs.CL, and cs.MM

Abstract: In this paper, we propose a novel Question-Guided Hybrid Convolution (QGHC) network for Visual Question Answering (VQA). Most state-of-the-art VQA methods fuse high-level textual and visual features from the neural network and abandon visual spatial information when learning multi-modal features. To address this problem, question-guided kernels generated from the input question are convolved with visual features to capture the textual-visual relationship at an early stage. Question-guided convolution tightly couples textual and visual information, but it also introduces more parameters when learning kernels. We apply group convolution, which consists of question-independent kernels and question-dependent kernels, to reduce the parameter size and alleviate over-fitting. The hybrid convolution can generate discriminative multi-modal features with fewer parameters. The proposed approach is also complementary to existing bilinear-pooling fusion and attention-based VQA methods; integrating with them further boosts performance. Extensive experiments on public VQA datasets validate the effectiveness of QGHC.
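
To make the abstract's mechanism concrete, below is a minimal PyTorch-style sketch of one question-guided hybrid convolution block, written from the abstract alone. The class name `QGHCBlock`, the helper `kernel_fc`, and all shapes (`channels`, `groups`, `qd_groups`, `q_dim`) are illustrative assumptions, not the paper's actual implementation; the paper's precise kernel-prediction network and group split may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QGHCBlock(nn.Module):
    """Sketch of one question-guided hybrid convolution block.

    Visual feature channels are split into groups: some groups use
    ordinary learned (question-independent) kernels, while the remaining
    groups use kernels predicted from the question embedding
    (question-dependent). Grouping keeps the predicted-parameter count
    small, which is the abstract's stated remedy for over-fitting.
    All hyperparameters below are illustrative, not from the paper.
    """

    def __init__(self, channels=512, groups=8, qd_groups=4, q_dim=1024, k=3):
        super().__init__()
        assert channels % groups == 0 and 0 < qd_groups < groups
        self.qd_groups = qd_groups            # question-dependent groups
        self.qi_groups = groups - qd_groups   # question-independent groups
        self.gc = channels // groups          # channels per group
        self.k = k

        # Question-independent path: a standard grouped convolution.
        qi_c = self.qi_groups * self.gc
        self.qi_conv = nn.Conv2d(qi_c, qi_c, k, padding=k // 2,
                                 groups=self.qi_groups)

        # Predict question-dependent kernels from the question embedding.
        n_qd_params = self.qd_groups * self.gc * self.gc * k * k
        self.kernel_fc = nn.Linear(q_dim, n_qd_params)

    def forward(self, v, q):
        # v: (B, C, H, W) visual features; q: (B, q_dim) question embedding.
        B, _, H, W = v.shape
        qi_c = self.qi_groups * self.gc
        v_qi, v_qd = v[:, :qi_c], v[:, qi_c:]

        # Question-independent path.
        out_qi = self.qi_conv(v_qi)

        # Question-dependent path: one kernel set per sample, applied as a
        # single grouped convolution by folding the batch into the groups.
        w = self.kernel_fc(q).view(B * self.qd_groups * self.gc,
                                   self.gc, self.k, self.k)
        v_qd = v_qd.reshape(1, -1, H, W)
        out_qd = F.conv2d(v_qd, w, padding=self.k // 2,
                          groups=B * self.qd_groups)
        out_qd = out_qd.view(B, -1, H, W)

        # Concatenate both paths to form the hybrid multi-modal feature.
        return F.relu(torch.cat([out_qi, out_qd], dim=1))
```

In this reading, the fused feature stays spatial (a `(B, C, H, W)` map), so it can still feed the attention or bilinear-pooling modules the abstract says QGHC is complementary to.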

Authors (7)
  1. Peng Gao (402 papers)
  2. Pan Lu (42 papers)
  3. Hongsheng Li (340 papers)
  4. Shuang Li (203 papers)
  5. Yikang Li (64 papers)
  6. Steven Hoi (38 papers)
  7. Xiaogang Wang (230 papers)
Citations (68)
