TASI Lectures on Physics for Machine Learning (2408.00082v1)
Abstract: These notes are based on lectures I gave at TASI 2024 on Physics for Machine Learning. The focus is on neural network theory, organized according to network expressivity, statistics, and dynamics. I present classic results such as the universal approximation theorem and neural network / Gaussian process correspondence, and also more recent results such as the neural tangent kernel, feature learning with the maximal update parameterization, and Kolmogorov-Arnold networks. The exposition on neural network theory emphasizes a field theoretic perspective familiar to theoretical physicists. I elaborate on connections between the two, including a neural network approach to field theory.
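To make one of these classic results concrete, the neural network / Gaussian process correspondence states that, at random i.i.d. initialization, the distribution of outputs of a wide network tends to a Gaussian as the width grows, with non-Gaussian corrections suppressed by inverse powers of the width. The snippet below is a minimal numerical sketch of that statement and is not taken from the lectures; the one-hidden-layer tanh architecture, the chosen widths and sample counts, and the helper name `random_network_output` are illustrative assumptions.

```python
# Minimal sketch (illustrative, not from the lectures) of the neural network /
# Gaussian process correspondence: for a randomly initialized one-hidden-layer
# network, the output at a fixed input becomes Gaussian as the width grows.
import numpy as np

rng = np.random.default_rng(0)

def random_network_output(x, width, n_samples):
    """Evaluate n_samples independently initialized tanh networks at input x.

    All parameters are i.i.d. standard normal; the 1/sqrt(width) readout
    scaling keeps the output variance O(1) as width -> infinity.
    """
    d = x.shape[0]
    W = rng.normal(size=(n_samples, width, d))   # input-to-hidden weights
    b = rng.normal(size=(n_samples, width))      # hidden biases
    v = rng.normal(size=(n_samples, width))      # readout weights
    pre = np.einsum("swd,d->sw", W, x) + b       # hidden preactivations
    return (v * np.tanh(pre)).sum(axis=1) / np.sqrt(width)

x = np.array([0.3, -1.2, 0.7])
for width in (2, 10, 500):
    f = random_network_output(x, width, n_samples=10_000)
    # Excess kurtosis of a Gaussian is zero, so its decay with width is a
    # simple diagnostic of the approach to the Gaussian process limit.
    kurt = np.mean((f - f.mean()) ** 4) / f.var() ** 2 - 3.0
    print(f"width={width:4d}  mean={f.mean():+.3f}  var={f.var():.3f}  "
          f"excess kurtosis={kurt:+.3f}")
```

Run as written, the mean and variance stay roughly constant across widths while the excess kurtosis shrinks, which is the finite-width non-Gaussianity that the field-theoretic treatments cited below organize as an expansion in inverse width.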
- Simon & Schuster, New York, first Simon & Schuster hardcover edition, 2014.
- D. Silver, J. Schrittwieser, et al., “Mastering the game of Go without human knowledge,” Nature 550 no. 7676, (Oct, 2017) 354–359. https://doi.org/10.1038/nature24270.
- D. Silver, T. Hubert, et al., “Mastering chess and shogi by self-play with a general reinforcement learning algorithm,” arXiv preprint arXiv:1712.01815 (2017) .
- B. LeMoine, “How the artificial intelligence program alphazero mastered its games,” The New Yorker (Jan, 2023) . https://www.newyorker.com/science/elements/how-the-artificial-intelligence-program-alphazero-mastered-its-games.
- J. Sohl-Dickstein, E. Weiss, N. Maheswaranathan, and S. Ganguli, “Deep unsupervised learning using nonequilibrium thermodynamics,” in International Conference on Machine Learning, pp. 2256–2265, PMLR. 2015.
- Y. Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon, and B. Poole, “Score-based generative modeling through stochastic differential equations,” arXiv preprint arXiv:2011.13456 (2020) .
- A. Ananthaswamy, “The physics principle that inspired modern ai art,” Quanta Magazine (Jan, 2023) . https://www.quantamagazine.org/the-physics-principle-that-inspired-modern-ai-art-20230105/.
- T. Brown, B. Mann, et al., “Language models are few-shot learners,” in Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, and H. Lin, eds., vol. 33, pp. 1877–1901. Curran Associates, Inc., 2020. https://proceedings.neurips.cc/paper_files/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf.
- K. Roose, “The brilliance and weirdness of chatgpt,” The New York Times (Dec, 2022) . https://www.nytimes.com/2022/12/05/technology/chatgpt-ai-twitter.html.
- A. Bhatia, “Watch an a.i. learn to write by reading nothing but shakespeare,” The New York Times (Apr, 2023) . https://www.nytimes.com/interactive/2023/04/26/upshot/gpt-from-scratch.html.
- S. Bubeck, V. Chandrasekaran, et al., “Sparks of artificial general intelligence: Early experiments with gpt-4,” arXiv preprint arXiv:2303.12712 (2023) .
- J. Jumper, R. Evans, et al., “Highly accurate protein structure prediction with alphafold,” Nature 596 no. 7873, (Aug, 2021) 583–589. https://doi.org/10.1038/s41586-021-03819-2.
- G. Carleo and M. Troyer, “Solving the quantum many-body problem with artificial neural networks,” Science 355 no. 6325, (2017) 602–606.
- J. Carrasquilla and R. G. Melko, “Machine learning phases of matter,” Nature Physics 13 no. 5, (2017) 431–434.
- L. B. Anderson, M. Gerdes, J. Gray, S. Krippendorf, N. Raghuram, and F. Ruehle, “Moduli-dependent Calabi-Yau and SU(3)-structure metrics from Machine Learning,” JHEP 05 (2021) 013, arXiv:2012.04656 [hep-th].
- M. R. Douglas, S. Lakshminarasimhan, and Y. Qi, “Numerical Calabi-Yau metrics from holomorphic networks,” arXiv:2012.04797 [hep-th].
- V. Jejjala, D. K. Mayorga Pena, and C. Mishra, “Neural network approximations for Calabi-Yau metrics,” JHEP 08 (2022) 105, arXiv:2012.15821 [hep-th].
- S. Gukov, J. Halverson, and F. Ruehle, “Rigor with machine learning from field theory to the poincaré conjecture,” Nature Reviews Physics (2024) 1–10.
- “IAIFI Summer Workshop 2023,” IAIFI - Institute for Artificial Intelligence and Fundamental Interactions. 2023. https://iaifi.org/events/summer_workshop_2023.html.
- “Machine learning and the physical sciences,” in Proceedings of the Neural Information Processing Systems (NeurIPS) Workshop on Machine Learning and the Physical Sciences. Neural Information Processing Systems Foundation, Inc., 2023. https://ml4physicalsciences.github.io/2023/.
- G. Carleo, I. Cirac, K. Cranmer, L. Daudet, M. Schuld, N. Tishby, L. Vogt-Maranto, and L. Zdeborová, “Machine learning and the physical sciences,” Rev. Mod. Phys. 91 no. 4, (2019) 045002, arXiv:1903.10563 [physics.comp-ph].
- “TASI 2024: Frontiers in Particle Theory.” 2024. https://www.colorado.edu/physics/events/summer-intensive-programs/theoretical-advanced-study-institute-elementary-particle-physics-current. Summer Intensive Programs, University of Colorado Boulder.
- H. Robbins and S. Monro, “A Stochastic Approximation Method,” The Annals of Mathematical Statistics 22 no. 3, (1951) 400 – 407. https://doi.org/10.1214/aoms/1177729586.
- F. Rosenblatt, “The perceptron: a probabilistic model for information storage and organization in the brain,” Psychological Review 65 no. 6, (1958) 386–408. https://api.semanticscholar.org/CorpusID:12781225.
- D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization.” 2017. https://arxiv.org/abs/1412.6980.
- G. B. De Luca and E. Silverstein, “Born-infeld (bi) for ai: Energy-conserving descent (ecd) for optimization,” in International Conference on Machine Learning, pp. 4918–4936, PMLR. 2022.
- G. B. De Luca, A. Gatti, and E. Silverstein, “Improving energy conserving descent for machine learning: Theory and practice,” arXiv preprint arXiv:2306.00352 (2023) .
- G. Cybenko, “Approximation by superpositions of a sigmoidal function,” Mathematics of Control, Signals and Systems 2 no. 4, (1989) 303–314.
- K. Hornik, M. Stinchcombe, and H. White, “Approximation capabilities of multilayer feedforward networks,” Neural Networks 4 no. 2, (1991) 251–257.
- A. N. Kolmogorov, “On the representation of continuous functions of several variables by superpositions of continuous functions of one variable and addition,” Doklady Akademii Nauk SSSR 114 (1957) 953–956.
- V. I. Arnold, “On functions of three variables,” Doklady Akademii Nauk SSSR 114 (1957) 679–681.
- Z. Liu, Y. Wang, S. Vaidya, F. Ruehle, J. Halverson, M. Soljačić, T. Y. Hou, and M. Tegmark, “KAN: Kolmogorov-Arnold Networks,” arXiv:2404.19756 [cs.LG].
- R. M. Neal, “Bayesian learning for neural networks,” Lecture Notes in Statistics 118 (1996) .
- C. K. Williams, “Computing with infinite networks,” Advances in neural information processing systems (1997) .
- A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems, F. Pereira, C. Burges, L. Bottou, and K. Weinberger, eds., vol. 25. Curran Associates, Inc., 2012. https://proceedings.neurips.cc/paper_files/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf.
- G. Yang, “Wide feedforward or recurrent neural networks of any architecture are gaussian processes,” Advances in Neural Information Processing Systems 32 (2019) .
- M. Demirtas, J. Halverson, A. Maiti, M. D. Schwartz, and K. Stoner, “Neural network field theories: non-Gaussianity, actions, and locality,” Mach. Learn. Sci. Tech. 5 no. 1, (2024) 015002, arXiv:2307.03223 [hep-th].
- S. Yaida, “Non-gaussian processes and neural networks at finite widths,” in Mathematical and Scientific Machine Learning, pp. 165–192, PMLR. 2020.
- T. Cohen and M. Welling, “Group equivariant convolutional networks,” in Proceedings of The 33rd International Conference on Machine Learning, M. F. Balcan and K. Q. Weinberger, eds., vol. 48 of Proceedings of Machine Learning Research, pp. 2990–2999. PMLR, New York, New York, USA, 20–22 jun, 2016. https://proceedings.mlr.press/v48/cohenc16.html.
- M. Winkels and T. S. Cohen, “3d g-cnns for pulmonary nodule detection,” arXiv preprint arXiv:1804.04656 (2018) .
- J. Kaplan, S. McCandlish, T. Henighan, T. B. Brown, B. Chess, R. Child, S. Gray, A. Radford, J. Wu, and D. Amodei, “Scaling laws for neural language models,” arXiv preprint arXiv:2001.08361 (2020) .
- S. Batzner, A. Musaelian, L. Sun, M. Geiger, J. P. Mailoa, M. Kornbluth, N. Molinari, T. E. Smidt, and B. Kozinsky, “E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials,” Nature Communications 13 no. 1, (May, 2022) 2453. https://doi.org/10.1038/s41467-022-29939-5.
- N. Frey, R. Soklaski, S. Axelrod, S. Samsi, R. Gomez-Bombarelli, C. Coley, et al., “Neural scaling of deep chemical models,” ChemRxiv (2022).
- D. Boyda, G. Kanwar, S. Racanière, D. J. Rezende, M. S. Albergo, K. Cranmer, D. C. Hackett, and P. E. Shanahan, “Sampling using SU(N) gauge equivariant flows,” Physical Review D 103 no. 7, (2021) 074504.
- A. Maiti, K. Stoner, and J. Halverson, “Symmetry-via-Duality: Invariant Neural Network Densities from Parameter-Space Correlators,” arXiv:2106.00694 [cs.LG].
- J. Halverson, A. Maiti, and K. Stoner, “Neural Networks and Quantum Field Theory,” Mach. Learn. Sci. Tech. 2 no. 3, (2021) 035002, arXiv:2008.08601 [cs.LG].
- J. Halverson, “Building Quantum Field Theories Out of Neurons,” arXiv:2112.04527 [hep-th].
- A. Jacot, F. Gabriel, and C. Hongler, “Neural tangent kernel: Convergence and generalization in neural networks,” in Advances in Neural Information Processing Systems, S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, eds., vol. 31. Curran Associates, Inc., 2018. https://proceedings.neurips.cc/paper_files/paper/2018/file/5a4be1fa34e62bb8a6ec6b91d2462f5a-Paper.pdf.
- J. Lee, L. Xiao, S. Schoenholz, Y. Bahri, R. Novak, J. Sohl-Dickstein, and J. Pennington, “Wide neural networks of any depth evolve as linear models under gradient descent,” Advances in neural information processing systems 32 (2019) .
- C. Pehlevan and B. Bordelon, “Lecture notes on infinite-width limits of neural networks.” August, 2024. https://mlschool.princeton.edu/events/2023/pehlevan. Princeton Machine Learning Theory Summer School, August 6 - 15, 2024.
- G. Yang and E. J. Hu, “Feature learning in infinite-width neural networks,” arXiv preprint arXiv:2011.14522 (2020) .
- B. Bordelon and C. Pehlevan, “Self-consistent dynamical field theory of kernel evolution in wide neural networks,” Advances in Neural Information Processing Systems 35 (2022) 32240–32256.
- D. A. Roberts, S. Yaida, and B. Hanin, The Principles of Deep Learning Theory: An Effective Theory Approach to Understanding Neural Networks. Cambridge University Press, Cambridge, MA, USA, 2022.
- S. Yaida, “Meta-principled family of hyperparameter scaling strategies,” arXiv preprint arXiv:2210.04909 (2022) .
- S. S. Schoenholz, J. Gilmer, S. Ganguli, and J. Sohl-Dickstein, “Deep information propagation,” arXiv preprint arXiv:1611.01232 (2016) .
- M. Zhdanov, D. Ruhe, M. Weiler, A. Lucic, J. Brandstetter, and P. Forré, “Clifford-steerable convolutional neural networks,” arXiv preprint arXiv:2402.14730 (2024) .
- K. Osterwalder and R. Schrader, “Axioms for Euclidean Green’s functions,” Commun. Math. Phys. 31 (1973) 83–112.
- D. Simmons-Duffin, “The Conformal Bootstrap,” in Theoretical Advanced Study Institute in Elementary Particle Physics: New Frontiers in Fields and Strings, pp. 1–74. 2017. arXiv:1602.07982 [hep-th].
- A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” Advances in neural information processing systems 30 (2017) .
- M. M. Bronstein, J. Bruna, T. Cohen, and P. Veličković, “Geometric deep learning: Grids, groups, graphs, geodesics, and gauges,” arXiv preprint arXiv:2104.13478 (2021) .
- I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, Cambridge, MA, USA, 2016. http://www.deeplearningbook.org.
- P. Ginsparg, “Applied conformal field theory,” arXiv preprint hep-th/9108028 (1988) .