Papers
Topics
Authors
Recent
Search
2000 character limit reached

Improving Generalization in Visual Reasoning via Self-Ensemble

Published 28 Oct 2024 in cs.CV | (2410.20883v2)

Abstract: The cognitive faculty of visual reasoning necessitates the integration of multimodal perceptual processing and commonsense and external knowledge of the world. In recent years, a plethora of large vision-LLMs (LVLMs) have been proposed, demonstrating outstanding power and exceptional proficiency in commonsense reasoning across diverse domains and tasks. Nevertheless, training such LVLMs requires a lot of costly resources. Recent approaches, instead of training LVLMs from scratch on various large datasets, focus on exploring ways to take advantage of the capabilities of many different LVLMs, such as ensemble methods. In this work, we propose self-ensemble, a novel method that improves the generalization and visual reasoning of the model without updating any parameters, a training-free method. Our key insight is that we realized that LVLM itself can ensemble without the need for any other LVLMs, which helps to unlock their internal capabilities. Extensive experiments on various benchmarks demonstrate the effectiveness of our method in achieving state-of-the-art (SOTA) performance on SketchyVQA, Outside Knowledge VQA, and out-of-distribution VQA tasks.

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.