
Bayesian RG Flow in Neural Network Field Theories (2405.17538v2)

Published 27 May 2024 in hep-th, cond-mat.dis-nn, and cs.LG

Abstract: The Neural Network Field Theory correspondence (NNFT) is a mapping from neural network (NN) architectures into the space of statistical field theories (SFTs). The Bayesian renormalization group (BRG) is an information-theoretic coarse graining scheme that generalizes the principles of the exact renormalization group (ERG) to arbitrarily parameterized probability distributions, including those of NNs. In BRG, coarse graining is performed in parameter space with respect to an information-theoretic distinguishability scale set by the Fisher information metric. In this paper, we unify NNFT and BRG to form a powerful new framework for exploring the space of NNs and SFTs, which we coin BRG-NNFT. With BRG-NNFT, NN training dynamics can be interpreted as inducing a flow in the space of SFTs from the information-theoretic 'IR' $\rightarrow$ 'UV'. Conversely, applying an information-shell coarse graining to the trained network's parameters induces a flow in the space of SFTs from the information-theoretic 'UV' $\rightarrow$ 'IR'. When the information-theoretic cutoff scale coincides with a standard momentum scale, BRG is equivalent to ERG. We demonstrate the BRG-NNFT correspondence on two analytically tractable examples. First, we construct BRG flows for trained, infinite-width NNs of arbitrary depth with generic activation functions. As a special case, we then restrict to architectures with a single infinitely wide layer, scalar outputs, and generalized cos-net activations. In this case, we show that BRG coarse graining corresponds exactly to the momentum-shell ERG flow of a free scalar SFT. Our analytic results are corroborated by a numerical experiment in which an ensemble of asymptotically wide NNs is trained and subsequently renormalized using an information-shell BRG scheme.
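The abstract's central operation, an information-shell coarse graining of network parameters with respect to the Fisher information metric, can be illustrated with a small numerical sketch. The snippet below is not the authors' implementation: the single-hidden-layer cos-net, the restriction of the Fisher metric to the readout weights, the Gaussian-likelihood approximation, and the cutoff value `Lambda` are illustrative assumptions chosen only to keep the example self-contained.

```python
# Minimal sketch (assumptions noted above, not the paper's code) of an
# "information-shell" coarse graining for a single-hidden-layer cos-net.
import numpy as np

rng = np.random.default_rng(0)
d_in, width, n_data = 1, 256, 64                     # assumed sizes
X = rng.uniform(-1.0, 1.0, size=(n_data, d_in))

# cos-net: f(x) = sum_j a_j cos(w_j . x + b_j) / sqrt(width)
W = rng.normal(size=(width, d_in))
b = rng.uniform(0.0, 2 * np.pi, size=width)
a = rng.normal(size=width)

features = np.cos(X @ W.T + b) / np.sqrt(width)      # (n_data, width) cosine features

def output(a_vec):
    """Network output on the dataset as a function of the readout weights."""
    return features @ a_vec

# For a Gaussian likelihood with unit variance, the Fisher metric over the
# readout weights reduces to J^T J, with J the Jacobian of outputs w.r.t. them.
fisher = features.T @ features / n_data              # (width, width)

# Information-shell coarse graining: discard parameter directions whose Fisher
# eigenvalue lies below a distinguishability cutoff Lambda (assumed value).
eigval, eigvec = np.linalg.eigh(fisher)
Lambda = 1e-3
keep = eigval > Lambda
a_coarse = eigvec[:, keep] @ (eigvec[:, keep].T @ a)  # project onto the kept shell

print(f"kept {keep.sum()} of {width} parameter directions above Lambda={Lambda}")
print("max output change after coarse graining:",
      np.max(np.abs(output(a) - output(a_coarse))))
```

Directions with Fisher eigenvalues below the cutoff are the hardest to distinguish given the data, so projecting them out changes the network function only weakly; this is the sense in which the coarse graining flows from the information-theoretic 'UV' to the 'IR'.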

