PhD Thesis: Exploring the role of (self-)attention in cognitive and computer vision architecture (2306.14650v2)

Published 26 Jun 2023 in cs.AI, cs.CV, cs.LG, and cs.SC

Abstract: We investigate the role of attention and memory in complex reasoning tasks. We analyze Transformer-based self-attention as a model of attention and extend it with memory. By studying the Synthetic Visual Reasoning Test (SVRT), we refine the taxonomy of reasoning tasks. Incorporating self-attention into a ResNet50 backbone, we enhance its feature maps with feature-based and spatial attention, enabling challenging visual reasoning tasks to be solved efficiently. Our findings contribute to understanding the attentional needs of SVRT tasks. Additionally, we propose GAMR, a cognitive architecture combining attention and memory, inspired by active vision theory. GAMR outperforms other architectures in sample efficiency, robustness, and compositionality, and shows zero-shot generalization on new reasoning tasks.
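
The attention-augmented ResNet50 described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the thesis's actual implementation: it assumes a CBAM-style module (channel or "feature-based" gating followed by spatial gating) applied to the final ResNet50 feature maps, and the module names, hyperparameters, and two-way output head (e.g., a same/different decision, as in SVRT) are assumptions made for the example.

```python
# Illustrative sketch (not the thesis code): ResNet50 feature maps modulated by
# feature-based (channel) and spatial attention before classification.
import torch
import torch.nn as nn
from torchvision.models import resnet50


class FeatureSpatialAttention(nn.Module):
    """Channel ("feature-based") attention followed by spatial attention."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Channel attention: pool spatial dims, then gate each feature map.
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Spatial attention: gate each location from pooled channel statistics.
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x * self.channel_gate(x)  # feature-based attention
        pooled = torch.cat(
            [x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1
        )
        return x * self.spatial_gate(pooled)  # spatial attention


class AttentionResNet50(nn.Module):
    """ResNet50 trunk whose final feature maps are re-weighted by attention."""

    def __init__(self, num_classes: int = 2):
        super().__init__()
        backbone = resnet50(weights=None)  # torchvision >= 0.13 API, no pretraining
        # Keep everything up to the last convolutional stage (2048-channel maps).
        self.features = nn.Sequential(*list(backbone.children())[:-2])
        self.attention = FeatureSpatialAttention(channels=2048)
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(2048, num_classes)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.attention(self.features(x)))


if __name__ == "__main__":
    model = AttentionResNet50(num_classes=2)      # e.g. a same/different decision
    logits = model(torch.randn(4, 3, 128, 128))   # a batch of SVRT-like images
    print(logits.shape)                           # torch.Size([4, 2])
```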

Authors (1)
  1. Mohit Vaishnav (6 papers)
