Wilsonian Renormalization of Neural Network Gaussian Processes (2405.06008v2)
Abstract: Separating relevant and irrelevant information is key to any modeling process or scientific inquiry. Theoretical physics offers a powerful tool for achieving this in the form of the renormalization group (RG). Here we demonstrate a practical approach to performing Wilsonian RG in the context of Gaussian Process (GP) Regression. We systematically integrate out the unlearnable modes of the GP kernel, thereby obtaining an RG flow of the GP in which the data sets the IR scale. In simple cases, this results in a universal flow of the ridge parameter, which becomes input-dependent in the richer scenario in which non-Gaussianities are included. In addition to being analytically tractable, this approach goes beyond structural analogies between RG and neural networks by providing a natural connection between RG flow and learnable vs. unlearnable modes. Studying such flows may improve our understanding of feature learning in deep neural networks, and enable us to identify potential universality classes in these models.
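The mode-splitting picture in the abstract can be illustrated with a small numerical sketch (plain NumPy, not code from the paper). It sets up a toy GP regression problem, splits the Gram-matrix spectrum into large-eigenvalue "learnable" modes and small-eigenvalue "unlearnable" modes at an arbitrary cutoff, and absorbs the discarded modes into a shifted, effective ridge parameter. The RBF kernel, the cutoff `1e-2 * evals.max()`, and the identification `sigma2_eff = sigma2 + sum(discarded)/n` are illustrative assumptions, not the flow derived in the paper.

```python
# Toy sketch: integrating out small-eigenvalue ("unlearnable") kernel modes
# and absorbing them into a shifted ridge parameter. All modeling choices
# below are illustrative assumptions, not results quoted from the paper.
import numpy as np

def rbf_kernel(X1, X2, length_scale=1.0):
    """Squared-exponential kernel matrix between two sets of inputs."""
    d2 = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2 * X1 @ X2.T
    return np.exp(-0.5 * d2 / length_scale**2)

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(60, 1))                 # training inputs
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(60)  # noisy targets
Xs = np.linspace(-3, 3, 200)[:, None]                # test inputs
sigma2 = 0.1                                         # bare ridge / noise level

K = rbf_kernel(X, X)
Ks = rbf_kernel(Xs, X)

# Full GP posterior mean.
mean_full = Ks @ np.linalg.solve(K + sigma2 * np.eye(len(X)), y)

# Spectral split of the Gram matrix: keep large-eigenvalue modes,
# discard the small-eigenvalue ones (crude IR cutoff set by the data).
evals, evecs = np.linalg.eigh(K)
keep = evals > 1e-2 * evals.max()
K_learn = (evecs[:, keep] * evals[keep]) @ evecs[:, keep].T
Ks_learn = Ks @ evecs[:, keep] @ evecs[:, keep].T    # project cross-kernel too

# "Integrating out" the discarded modes ~ shifting the ridge (schematic flow):
# their trace is spread over the diagonal as extra effective noise.
sigma2_eff = sigma2 + evals[~keep].sum() / len(X)
mean_trunc = Ks_learn @ np.linalg.solve(K_learn + sigma2_eff * np.eye(len(X)), y)

print("max |full - truncated| prediction gap:",
      np.max(np.abs(mean_full - mean_trunc)))
```

In this toy setting the truncated predictor with the shifted ridge should track the full posterior mean closely, which is the qualitative point: below the data-set scale, the discarded modes matter only through an effective noise (ridge) parameter.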