Learning to predict where to look in interactive environments using deep recurrent Q-learning (1612.05753v2)

Published 17 Dec 2016 in cs.CV and cs.LG

Abstract: Bottom-Up (BU) saliency models do not perform well in complex interactive environments where humans are actively engaged in tasks (e.g., sandwich making and playing video games). In this paper, we leverage Reinforcement Learning (RL) to highlight task-relevant locations of input frames. We propose a soft attention mechanism combined with the Deep Q-Network (DQN) model to teach an RL agent how to play a game and where to look by focusing on the most pertinent parts of its visual input. Our evaluations on several Atari 2600 games show that the soft attention-based model predicts fixation locations significantly better than bottom-up models such as the Itti-Koch saliency model and the Graph-Based Visual Saliency (GBVS) model.
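The abstract describes a pattern in which a convolutional frame encoder, a soft (softmax-weighted) attention layer over the encoder's spatial grid, and a recurrent Q-value head are trained end to end, so the learned attention map doubles as a prediction of where to look. The following is a minimal PyTorch sketch of that pattern; the layer sizes, the attention parameterization, and the class name SoftAttentionDRQN are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftAttentionDRQN(nn.Module):
    """Illustrative soft-attention recurrent Q-network (a sketch, not
    the paper's exact model). A CNN encodes each frame into an L x D
    grid of region features; an attention layer conditioned on the
    LSTM state produces a softmax weighting over the L regions; the
    attention-weighted context vector drives an LSTM whose hidden
    state is mapped to per-action Q-values."""

    def __init__(self, num_actions, feat_dim=64, hidden_dim=256):
        super().__init__()
        # Frame encoder: 84x84 grayscale input -> 7x7 grid of feat_dim features.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, feat_dim, kernel_size=3, stride=1), nn.ReLU(),
        )
        self.attn = nn.Linear(feat_dim + hidden_dim, 1)  # scores each region
        self.lstm = nn.LSTMCell(feat_dim, hidden_dim)
        self.q_head = nn.Linear(hidden_dim, num_actions)

    def forward(self, frame, state):
        h, c = state
        feats = self.conv(frame)                       # (B, D, H', W')
        B, D, Hp, Wp = feats.shape
        regions = feats.flatten(2).transpose(1, 2)     # (B, L, D), L = H'*W'
        # Condition attention scores on the previous recurrent state.
        h_exp = h.unsqueeze(1).expand(-1, regions.size(1), -1)
        scores = self.attn(torch.cat([regions, h_exp], dim=-1)).squeeze(-1)
        alpha = F.softmax(scores, dim=-1)              # soft attention weights
        context = (alpha.unsqueeze(-1) * regions).sum(dim=1)  # (B, D)
        h, c = self.lstm(context, (h, c))
        # Return Q-values, new recurrent state, and the 2D attention map.
        return self.q_head(h), (h, c), alpha.view(B, Hp, Wp)
```

Because `alpha` is a softmax over the spatial feature grid, it can be upsampled to the input resolution and compared against recorded human fixation maps, which is how an attention model of this kind can be evaluated against bottom-up saliency baselines such as GBVS.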

Authors (5)
  1. Sajad Mousavi (26 papers)
  2. Michael Schukat (9 papers)
  3. Enda Howley (12 papers)
  4. Ali Borji (89 papers)
  5. Nasser Mozayani (8 papers)
Citations (29)