
Learning Sparse Mixture of Experts for Visual Question Answering (1909.09192v1)

Published 19 Sep 2019 in cs.LG, cs.CL, cs.CV, and stat.ML

Abstract: There has been rapid progress in the task of Visual Question Answering with improved model architectures. Unfortunately, these models are usually computationally intensive due to their sheer size, which poses a serious challenge for deployment. We aim to tackle this issue for the specific task of Visual Question Answering (VQA). A Convolutional Neural Network (CNN) is an integral part of the visual processing pipeline of a VQA model (assuming the CNN is trained along with the entire VQA model). In this project, we propose an efficient and modular neural architecture for the VQA task with a focus on the CNN module. Our experiments demonstrate that a sparsely activated CNN-based VQA model achieves comparable performance to a standard CNN-based VQA model architecture.

Authors (3)
  1. Vardaan Pahuja (14 papers)
  2. Jie Fu (229 papers)
  3. Christopher J. Pal (13 papers)
Citations (2)
