Covariance properties under natural image transformations for the generalized Gaussian derivative model for visual receptive fields (2303.09803v4)
Abstract: This paper presents a theory for how geometric image transformations can be handled by a first layer of linear receptive fields, in terms of true covariance properties, which in turn enable geometric invariance properties at higher levels in the visual hierarchy. Specifically, we develop this theory for a generalized Gaussian derivative model for visual receptive fields, which is derived in an axiomatic manner from first principles that reflect symmetry properties of the environment, complemented by structural assumptions that guarantee internally consistent treatment of image structures over multiple spatio-temporal scales. We show that this model obeys true covariance properties under spatial scaling transformations, spatial affine transformations, Galilean transformations and temporal scaling transformations. This implies that a vision system based on image and video measurements from receptive fields according to this model can, to first order of approximation, handle the image and video deformations between multiple views of objects delimited by smooth surfaces, as well as between multiple views of spatio-temporal events, under varying relative motions between the observer and the objects and events in the world. We conclude by describing implications of the presented theory for biological vision, regarding connections between the variabilities of the shapes of biological visual receptive fields and the variabilities of spatial and spatio-temporal image structures under natural image transformations.
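The spatial scaling covariance mentioned in the abstract can be illustrated with a minimal numerical sketch in one spatial dimension. It assumes the standard scale-space formulation: if a signal is rescaled as f'(x') = f(x) with x' = S x, then Gaussian receptive field responses match when the scale parameter (the variance s) is transformed as s' = S^2 s, and first-order derivative responses match when multiplied by sqrt(s) (scale normalization). The function names, the test signal and all parameter values below are illustrative assumptions and are not taken from the paper.

```python
import math


def gaussian(x, s):
    """1-D Gaussian kernel with variance s."""
    return math.exp(-x * x / (2.0 * s)) / math.sqrt(2.0 * math.pi * s)


def gaussian_dx(x, s):
    """First-order derivative of the Gaussian kernel with respect to x."""
    return -x / s * gaussian(x, s)


def response(f, x0, s, normalized_derivative=False, half_width=8.0, n=4001):
    """Receptive field response at position x0 and scale s.

    Approximates the convolution integral of k(u; s) * f(x0 - u) over u
    by a Riemann sum over [-half_width * sigma, half_width * sigma].
    The kernel k is either the Gaussian or its scale-normalized
    first-order derivative sqrt(s) * g_x.
    """
    sigma = math.sqrt(s)
    h = 2.0 * half_width * sigma / (n - 1)
    total = 0.0
    for i in range(n):
        u = -half_width * sigma + i * h
        if normalized_derivative:
            k = sigma * gaussian_dx(u, s)
        else:
            k = gaussian(u, s)
        total += k * f(x0 - u) * h
    return total


def signal(x):
    """Arbitrary smooth test signal (an assumption for illustration)."""
    return math.sin(x) + 0.5 * math.cos(2.3 * x + 0.4)


def rescaled(f, S):
    """Return the spatially rescaled signal f'(x') = f(x'/S)."""
    return lambda x: f(x / S)


if __name__ == "__main__":
    S, s, x0 = 2.0, 1.5, 0.7  # scaling factor, scale (variance), position
    f_prime = rescaled(signal, S)
    for deriv in (False, True):
        original = response(signal, x0, s, deriv)
        transformed = response(f_prime, S * x0, S * S * s, deriv)
        print(deriv, original, transformed)
```

In this sketch the two printed responses in each row should agree up to floating-point rounding, because the matched position S * x0 and the matched scale S^2 * s compensate for the rescaling of the signal. Without the sqrt(s) factor, the derivative responses would instead differ by a factor of S.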