
Foveated Retinotopy Improves Classification and Localization in CNNs

Published 23 Feb 2024 in cs.CV and q-bio.NC | arXiv:2402.15480v3

Abstract: From a falcon detecting prey to humans recognizing faces, many species exhibit extraordinary abilities in rapid visual localization and classification. These are made possible by a specialized retinal region called the fovea, which provides high acuity at the center of vision while maintaining lower resolution in the periphery. This distinctive spatial organization, preserved along the early visual pathway through retinotopic mapping, is fundamental to biological vision, yet remains largely unexplored in machine learning. Our study investigates how incorporating foveated retinotopy may benefit deep convolutional neural networks (CNNs) in image classification tasks. By implementing a foveated retinotopic transformation in the input layer of standard ResNet models and re-training them, we maintain comparable classification accuracy while enhancing the network's robustness to scale and rotational perturbations. Although this architectural modification introduces increased sensitivity to fixation point shifts, we demonstrate how this apparent limitation becomes advantageous: variations in classification probabilities across different gaze positions serve as effective indicators for object localization. Our findings suggest that foveated retinotopic mapping encodes implicit knowledge about visual object geometry, offering an efficient solution to the visual search problem - a capability crucial for many living species.


Summary

  • The paper demonstrates that integrating retinotopic mapping into CNNs improves robustness against geometric distortions such as rotations and zooms.
  • It leverages a logarithmic-polar transform to simulate foveated vision, converting complex image transformations into simpler translations.
  • Empirical tests on VGG16 and ResNet101 reveal that the approach yields comparable or improved classification and localization performance.

Enhancing Convolutional Neural Networks with Retinotopic Mapping

Introduction to Retinotopic Mapping in CNNs

This study presents an approach that integrates retinotopic mapping into Convolutional Neural Networks (CNNs) to improve image classification and localization. The input stage of standard CNNs is modified with a retinotopic transform, a feature inspired by the foveated vision common to many animal species. Widely used CNN architectures are then retrained on images transformed via logarithmic-polar mapping, in order to examine the impact on the networks' ability to handle image rotations and scale changes.

Core Findings and Experimental Insights

Robustness to Geometric Perturbations

A striking outcome of the study is that CNNs equipped with retinotopic mapping exhibit enhanced robustness to geometric transformations of the input, specifically zooms and rotations. This contrasts with the well-documented vulnerability of standard CNNs to adversarial perturbations: vanilla networks were shown to falter significantly under rotation-based attacks, underlining the importance of evaluating models against realistic perturbations. The result highlights the potential of retinotopic mapping for building more resilient image recognition systems.

Advantages of Biologically Inspired Mapping

The research underscores the computational benefits of biologically inspired visual processing. By mimicking the retinotopic organization of the human visual system, in which photoreceptor density falls off with eccentricity on a roughly logarithmic scale, the transformed input grants CNNs an inherent invariance to scale and rotation. This follows from a key property of the log-polar transform: zooms and rotations in visual space become simple translations in the transformed space, so the translation invariance of convolutional layers is leveraged in a novel way.
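This property follows directly from the change of coordinates: writing a pixel position as (x, y) = (e^ρ cos θ, e^ρ sin θ), a zoom by factor s maps ρ → ρ + log s and a rotation by φ maps θ → θ + φ, so both become translations of the (ρ, θ) grid. A minimal NumPy sketch of such a log-polar resampling (nearest-neighbour, grayscale; an illustrative stand-in, not the paper's exact transform) makes this concrete:

```python
import numpy as np

def log_polar(img, n_rho=32, n_theta=64, r_min=1.0):
    """Resample a square grayscale image onto a log-polar grid.

    Rows index log-radius (the fovea occupies the top rows), columns
    index angle. Nearest-neighbour sampling keeps the sketch short;
    this is not the paper's exact implementation.
    """
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0   # fixation at the image centre
    r_max = min(cy, cx)
    rhos = np.exp(np.linspace(np.log(r_min), np.log(r_max), n_rho))
    thetas = np.linspace(0.0, 2.0 * np.pi, n_theta, endpoint=False)
    tt, rr = np.meshgrid(thetas, rhos)      # (n_rho, n_theta) sample grids
    ys = np.clip(np.round(cy + rr * np.sin(tt)).astype(int), 0, h - 1)
    xs = np.clip(np.round(cx + rr * np.cos(tt)).astype(int), 0, w - 1)
    return img[ys, xs]

# A 90-degree rotation of the input becomes (almost exactly) a circular
# shift by a quarter of the angular axis in log-polar space.
rng = np.random.default_rng(0)
img = rng.random((65, 65))                  # odd size keeps the centre on a pixel
lp = log_polar(img)
lp_rot = log_polar(np.rot90(img))
shift_match = np.mean(lp_rot == np.roll(lp, -16, axis=1))
```

Because the rotation reappears as a shift along the angle axis, the translation equivariance of subsequent convolutional layers can absorb it; a zoom would likewise reappear as a shift along the log-radius axis.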

Empirical Validation and Theoretical Implications

The experimental validation involved retraining VGG16 and ResNet101 on the ImageNet dataset with the retinotopic transformation applied to the input. Remarkably, these retinotopically mapped networks achieved comparable, and along some dimensions improved, performance relative to their conventionally trained counterparts. This result is non-trivial: despite the significant distortion and information compression inherent in retinotopic mapping, the networks still extract the features needed for accurate classification. The study also examined object localization, finding that the variation of classification probabilities across fixation points serves as an effective cue for locating objects, mirroring an aspect of human visual attention not captured by standard CNN designs.
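The localization idea can be sketched in a few lines: scan candidate fixation points over the image and record, for each fixation, the classifier's confidence for the target class; because the foveated input weights the fixated region most heavily, the score peaks when the fixation lands on the object. In the toy sketch below the CNN is replaced by a stand-in scoring function (mean intensity in a small foveal window) purely to make the scanning loop concrete; `foveal_score`, `localize`, and all parameters are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def foveal_score(img, fy, fx, radius=4):
    """Toy stand-in for a classifier's target-class probability: the
    average intensity in the high-acuity window around the fixation.
    In the paper's setting this would be the CNN's class probability
    for the foveated, retinotopically mapped input."""
    h, w = img.shape
    y0, y1 = max(0, fy - radius), min(h, fy + radius + 1)
    x0, x1 = max(0, fx - radius), min(w, fx + radius + 1)
    return img[y0:y1, x0:x1].mean()

def localize(img, step=4):
    """Scan fixation points on a coarse grid and return the fixation
    that maximizes the (toy) classification score."""
    best, best_pos = -np.inf, None
    for fy in range(0, img.shape[0], step):
        for fx in range(0, img.shape[1], step):
            s = foveal_score(img, fy, fx)
            if s > best:
                best, best_pos = s, (fy, fx)
    return best_pos

img = np.zeros((64, 64))
img[36:44, 16:24] = 1.0        # a bright "object" centred near (40, 20)
best = localize(img)
```

The grid fixation nearest the object maximizes the score, so `best` recovers the object's location; with a real CNN the same scan over gaze positions turns sensitivity to fixation shifts into a localization signal.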

Future Directions in AI and Robust Vision Systems

The integration of retinotopic mapping opens up new avenues for research in geometric deep learning and AI vision systems. It suggests a pathway towards developing models that not only match but potentially surpass human visual recognition capabilities under varied and challenging conditions. The study's insights into the naturally robust preattentive processing mechanisms present an exciting frontier for improving AI systems' efficiency and resilience.

Moreover, this approach could inspire future models that incorporate dynamic foveation mechanisms, mimicking saccadic eye movements to iteratively focus on regions of interest within an image. Such models could vastly improve the computational efficiency and accuracy of tasks like visual search, further bridging the gap between artificial and biological vision systems.

In conclusion, the exploration of retinotopic mapping within CNNs marks a significant step towards harnessing biologically inspired principles to bolster AI robustness and capability. It not only demonstrates the practical advantages of such an approach in current architectures but also sets the stage for profound advancements in how machines perceive and understand the visual world.
