
Masked Image Modeling as a Framework for Self-Supervised Learning across Eye Movements (2404.08526v2)

Published 12 Apr 2024 in cs.CV

Abstract: To make sense of their surroundings, intelligent systems must transform complex sensory inputs into structured codes that are reduced to task-relevant information such as object category. Biological agents achieve this in a largely autonomous manner, presumably via self-supervised learning. Whereas previous attempts to model the underlying mechanisms were largely discriminative in nature, there is ample evidence that the brain employs a generative model of the world. Here, we propose that eye movements, in combination with the focused nature of primate vision, constitute a generative, self-supervised task of predicting and revealing visual information. We construct a proof-of-principle model starting from the framework of masked image modeling (MIM), a common approach in deep representation learning. To do so, we analyze how core components of MIM such as masking technique and data augmentation influence the formation of category-specific representations. This allows us not only to better understand the principles behind MIM, but also to reassemble a MIM variant more in line with the focused nature of biological perception. We find that MIM disentangles neurons in latent space without explicit regularization, a property that has been suggested to structure visual representations in primates. Together with previous findings of invariance learning, this highlights an interesting connection of MIM to latent regularization approaches for self-supervised learning. The source code is available at https://github.com/RobinWeiler/FocusMIM
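To make the MIM framework the abstract builds on concrete, here is a minimal sketch of its two core operations: hiding a random subset of image patches and scoring a reconstruction only on the hidden patches (as in MAE-style masked image modeling). This is an illustrative toy, not the paper's FocusMIM implementation; the function names, patch size, and mask ratio are assumptions chosen for the example.

```python
import numpy as np

def mask_patches(image, patch_size=8, mask_ratio=0.75, rng=None):
    """Split a 2D image into non-overlapping patches and zero out a
    random subset, returning the masked image and the boolean mask."""
    rng = np.random.default_rng(rng)
    H, W = image.shape
    ph, pw = H // patch_size, W // patch_size
    n_patches = ph * pw
    n_masked = int(round(mask_ratio * n_patches))
    # Choose which patches to hide; these become reconstruction targets.
    masked_idx = rng.choice(n_patches, size=n_masked, replace=False)
    mask = np.zeros(n_patches, dtype=bool)
    mask[masked_idx] = True
    masked = image.copy()
    for idx in np.flatnonzero(mask):
        r, c = divmod(idx, pw)
        masked[r*patch_size:(r+1)*patch_size,
               c*patch_size:(c+1)*patch_size] = 0.0
    return masked, mask

def reconstruction_loss(pred, target, mask, patch_size=8):
    """Mean squared error computed only over the masked patches,
    so the model is trained to predict hidden visual information."""
    H, W = target.shape
    pw = W // patch_size
    losses = []
    for idx in np.flatnonzero(mask):
        r, c = divmod(idx, pw)
        sl = (slice(r*patch_size, (r+1)*patch_size),
              slice(c*patch_size, (c+1)*patch_size))
        losses.append(np.mean((pred[sl] - target[sl])**2))
    return float(np.mean(losses))
```

In the paper's analogy, the visible patches play the role of the fovea's high-resolution input, and predicting the masked remainder stands in for anticipating what a saccade will reveal; the paper's "focused" variant changes which patches are hidden, not the reconstruction objective itself.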

