Where to Look: A Unified Attention Model for Visual Recognition with Reinforcement Learning (2111.07169v1)

Published 13 Nov 2021 in cs.CV

Abstract: The idea of using recurrent neural networks for visual attention has gained popularity in the computer vision community. Although the recurrent attention model (RAM) leverages glimpses with larger patch sizes to increase its scope, this may result in high variance and instability. For example, a Gaussian policy with high variance is needed to explore objects of interest in a large image, which may cause randomized search and unstable learning. In this paper, we propose to unify top-down and bottom-up attention for recurrent visual attention. Our model exploits image pyramids and Q-learning to select regions of interest in the top-down attention mechanism, which in turn guides the policy search in the bottom-up approach. In addition, we add two further constraints on the bottom-up recurrent neural network for better exploration. We train our model in an end-to-end reinforcement learning framework and evaluate it on visual classification tasks. The experimental results show that our method outperforms the convolutional neural network (CNN) baseline and bottom-up recurrent attention models on visual classification tasks.
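
The sketch below illustrates, in broad strokes, how a single glimpse step might combine a top-down Q-value over coarse pyramid regions with a bottom-up Gaussian location policy, as the abstract describes. It is a minimal assumption-laden sketch, not the authors' implementation: all class names, layer sizes, and the offset scaling are hypothetical, and the training loop (REINFORCE for the location policy, Q-learning for the region head, plus the extra constraints) is omitted.

```python
# Hypothetical sketch of one glimpse step: a top-down Q-head scores coarse
# pyramid regions and a bottom-up Gaussian policy refines the glimpse location.
# Names and dimensions are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn


class UnifiedAttentionStep(nn.Module):
    def __init__(self, glimpse_dim=128, hidden_dim=256, num_regions=4):
        super().__init__()
        # Encodes a cropped 16x16 RGB glimpse patch into a feature vector.
        self.glimpse_net = nn.Sequential(
            nn.Flatten(), nn.Linear(3 * 16 * 16, glimpse_dim), nn.ReLU())
        self.rnn = nn.GRUCell(glimpse_dim, hidden_dim)    # bottom-up recurrent core
        self.q_head = nn.Linear(hidden_dim, num_regions)  # top-down: Q-value per coarse region
        self.loc_head = nn.Linear(hidden_dim, 2)          # bottom-up: mean of Gaussian location policy
        self.log_std = nn.Parameter(torch.zeros(2))       # learned, kept small to limit variance

    def forward(self, glimpse, h, region_centers):
        # glimpse: (B, 3, 16, 16); h: (B, hidden_dim); region_centers: (num_regions, 2) in [-1, 1]
        h = self.rnn(self.glimpse_net(glimpse), h)
        q = self.q_head(h)                        # score coarse pyramid regions (top-down)
        region = q.argmax(dim=-1)                 # greedy region choice
        coarse_loc = region_centers[region]       # anchor taken from the chosen region
        offset = torch.tanh(self.loc_head(h)) * 0.25   # small bottom-up offset around the anchor
        dist = torch.distributions.Normal(coarse_loc + offset, self.log_std.exp())
        loc = dist.sample()                       # next glimpse location
        return loc, dist.log_prob(loc).sum(-1), q, h


# Shape-level usage example with random tensors (no training shown).
step = UnifiedAttentionStep()
B = 8
glimpse = torch.randn(B, 3, 16, 16)
h = torch.zeros(B, 256)
centers = torch.tensor([[-0.5, -0.5], [-0.5, 0.5], [0.5, -0.5], [0.5, 0.5]])
loc, logp, q, h = step(glimpse, h, centers)
```

The intended effect, per the abstract, is that the coarse region choice narrows where the Gaussian policy has to search, so its variance can stay low and learning remains stable.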

Citations (3)