Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
110 tokens/sec
GPT-4o
56 tokens/sec
Gemini 2.5 Pro Pro
44 tokens/sec
o3 Pro
6 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Recurrent Attention Models with Object-centric Capsule Representation for Multi-object Recognition (2110.04954v1)

Published 11 Oct 2021 in cs.CV, cs.LG, and q-bio.NC

Abstract: The visual system processes a scene using a sequence of selective glimpses, each driven by spatial and object-based attention. These glimpses reflect what is relevant to the ongoing task and are selected through recurrent processing and recognition of the objects in the scene. In contrast, most models treat attention selection and recognition as separate stages in a feedforward process. Here we show that using capsule networks to create an object-centric hidden representation in an encoder-decoder model with iterative glimpse attention yields effective integration of attention and recognition. We evaluate our model on three multi-object recognition tasks; highly overlapping digits, digits among distracting clutter and house numbers, and show that it learns to effectively move its glimpse window, recognize and reconstruct the objects, all with only the classification as supervision. Our work takes a step toward a general architecture for how to integrate recurrent object-centric representation into the planning of attentional glimpses.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (3)
  1. Hossein Adeli (7 papers)
  2. Seoyoung Ahn (10 papers)
  3. Gregory Zelinsky (11 papers)
Citations (3)

Summary

We haven't generated a summary for this paper yet.