Synaptic Weight Distributions Depend on the Geometry of Plasticity (2305.19394v2)
Abstract: A growing literature in computational neuroscience leverages gradient descent, and learning algorithms that approximate it, to study synaptic plasticity in the brain. However, the vast majority of this work ignores a critical underlying assumption: the choice of distance for synaptic changes, i.e., the geometry of synaptic plasticity. Gradient descent assumes that the distance is Euclidean, but many other distances are possible, and there is no reason that biology necessarily uses Euclidean geometry. Here, using the theoretical tools provided by mirror descent, we show that the distribution of synaptic weights will depend on the geometry of synaptic plasticity. We use these results to show that the experimentally observed log-normal weight distributions found in several brain areas are not consistent with standard gradient descent (i.e., a Euclidean geometry), but rather with non-Euclidean distances. Finally, we show that it should be possible to experimentally test for different synaptic geometries by comparing synaptic weight distributions before and after learning. Overall, our work shows that the current paradigm in theoretical work on synaptic plasticity, which assumes a Euclidean synaptic geometry, may be misguided, and that it should be possible to experimentally determine the true geometry of synaptic plasticity in the brain.
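As a rough illustration of the abstract's central claim (a toy sketch of the general intuition, not the paper's actual analysis or simulations), mirror descent ties the geometry of plasticity to a potential function: with the Euclidean potential, updates are additive in the weights (ordinary gradient descent), whereas with an entropic/log potential they are multiplicative (exponentiated gradient). Accumulating many small, noisy updates then produces an approximately normal weight distribution in the first case and an approximately log-normal one in the second. The NumPy snippet below simulates hypothetical zero-mean update signals under both geometries; all parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_synapses, n_updates = 10_000, 1_000

# Start every synapse at the same positive weight under both geometries.
w_euclid = np.full(n_synapses, 0.5)    # Euclidean geometry: additive updates
w_entropic = np.full(n_synapses, 0.5)  # entropic/log geometry: multiplicative updates

for _ in range(n_updates):
    # Stand-in for small, noisy, zero-mean gradient-like signals at each synapse.
    g = rng.normal(loc=0.0, scale=0.01, size=n_synapses)
    w_euclid = w_euclid - g               # gradient descent: w <- w - g
    w_entropic = w_entropic * np.exp(-g)  # exponentiated gradient: log w <- log w - g

def skew(x):
    """Sample skewness (third standardized moment)."""
    z = (x - x.mean()) / x.std()
    return (z ** 3).mean()

# Additive updates give a symmetric, approximately normal distribution that can
# cross zero; multiplicative updates keep weights positive and become log-normal.
print(f"Euclidean (additive):      skewness {skew(w_euclid):+.2f}, "
      f"{(w_euclid < 0).mean():.1%} of weights negative")
print(f"Entropic (multiplicative): skewness {skew(w_entropic):+.2f}, all weights positive")
print(f"log of multiplicative w:   skewness {skew(np.log(w_entropic)):+.2f}  (~0 => log-normal)")
```

Under this toy setup, only the multiplicative (non-Euclidean) geometry reproduces the heavy-tailed, log-normal-like shape reported for cortical synaptic weights, which is the comparison the abstract describes.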