TASI Lectures on Physics for Machine Learning (2408.00082v1)

Published 31 Jul 2024 in hep-th, cs.LG, and hep-ph

Abstract: These notes are based on lectures I gave at TASI 2024 on Physics for Machine Learning. The focus is on neural network theory, organized according to network expressivity, statistics, and dynamics. I present classic results such as the universal approximation theorem and neural network / Gaussian process correspondence, and also more recent results such as the neural tangent kernel, feature learning with the maximal update parameterization, and Kolmogorov-Arnold networks. The exposition on neural network theory emphasizes a field theoretic perspective familiar to theoretical physicists. I elaborate on connections between the two, including a neural network approach to field theory.

Summary

  • The paper frames ensembles of neural networks as field theories, establishing an NN-FT correspondence that bridges machine learning and physics.
  • It reviews results such as the Universal Approximation Theorem, Kolmogorov-Arnold networks, the neural network / Gaussian process (NNGP) correspondence, and the neural tangent kernel (NTK) to analyze network expressivity, statistics, and dynamics.
  • The lectures discuss parameterizations, notably the maximal update parameterization, that preserve feature learning at large width.

Overview of "TASI Lectures on Physics for Machine Learning"

The lecture notes "TASI Lectures on Physics for Machine Learning" by Jim Halverson address the intersection of neural networks and theoretical physics. The notes are structured around three core themes: the expressivity, statistics, and dynamics of neural networks, all examined through a field-theoretic lens. The lectures aim not only to analyze neural networks from a theoretical physics perspective but also to show how these concepts can inform the understanding of field theory.

Expressivity of Neural Networks

The discussion of expressivity centers on the capacity of neural networks to approximate arbitrary functions. The Universal Approximation Theorem (UAT) is presented as a cornerstone result: a network with a single hidden layer can approximate any continuous function on a compact domain to arbitrary accuracy, although the theorem gives no guarantee that such an approximator can be found easily in practice.
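
As a toy numerical illustration (a sketch, not code from the lectures; the target function, width, and activation below are arbitrary choices), the UAT ansatz f(x) ≈ Σ_i a_i σ(w_i x + b_i) can be probed by fixing random hidden-layer weights and fitting only the output weights:

```python
import numpy as np

# Minimal sketch of the UAT ansatz f(x) ~ sum_i a_i * tanh(w_i * x + b_i):
# random hidden weights, output weights fit by least squares.
rng = np.random.default_rng(0)
N = 200                                  # hidden width (assumption for the demo)
x = np.linspace(-np.pi, np.pi, 500)      # compact domain
target = np.sin(x)                       # a continuous target function

w = rng.normal(0.0, 2.0, size=N)         # hidden weights
b = rng.uniform(-np.pi, np.pi, size=N)   # hidden biases
phi = np.tanh(np.outer(x, w) + b)        # hidden-layer features, shape (500, N)

a, *_ = np.linalg.lstsq(phi, target, rcond=None)  # solve for output weights
print("max |f_N(x) - sin(x)| =", np.max(np.abs(phi @ a - target)))
```

Increasing N typically drives the error down, in line with the theorem's guarantee of existence, though not of trainability.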

The notes further introduce the Kolmogorov-Arnold representation theorem as an alternative perspective on function representation, one that inspired the development of Kolmogorov-Arnold networks (KANs). These offer a new architectural approach that places learnable activation functions on network edges rather than fixed activations on nodes, showing promise for symbolic representation and enhanced interpretability.
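
For reference, the Kolmogorov-Arnold representation theorem is usually stated as follows (standard notation, which may differ from the conventions in the notes):

```latex
f(x_1,\dots,x_n) \;=\; \sum_{q=0}^{2n} \Phi_q\!\left(\sum_{p=1}^{n} \phi_{q,p}(x_p)\right),
```

where the Φ_q and φ_{q,p} are continuous functions of a single variable. KANs turn this structure into a trainable, deep architecture by stacking layers of learnable univariate functions (typically splines) placed on edges.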

Statistics of Neural Networks

In examining ensembles of neural networks, the lectures explain how, in the infinite-width limit, networks at initialization become Gaussian processes, the neural network / Gaussian process (NNGP) correspondence, as a consequence of the Central Limit Theorem (CLT). Non-Gaussianities therefore arise from finite-width corrections or from statistical dependence among parameters, and they play the role of interactions in field theory.
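
A quick Monte Carlo check of this statement (a sketch under assumed architecture and scalings, not code from the notes): sample an ensemble of random single-hidden-layer networks with 1/√N output scaling and watch a non-Gaussian statistic, the excess kurtosis of f(x), shrink as the width N grows.

```python
import numpy as np

# NNGP sketch: the output of a random one-hidden-layer network at a fixed
# input approaches a Gaussian as the width N grows (CLT); the excess
# kurtosis, a connected 4-point statistic, should shrink roughly like 1/N.
rng = np.random.default_rng(1)
x = np.array([0.7, -0.3])          # a fixed input (arbitrary)
samples = 100_000                  # networks per ensemble

for N in (2, 8, 32, 128):
    W = rng.normal(size=(samples, N, x.size))                  # hidden weights
    a = rng.normal(scale=1.0 / np.sqrt(N), size=(samples, N))  # 1/sqrt(N) output scaling
    f = np.sum(a * np.tanh(W @ x), axis=1)                     # ensemble of outputs f(x)
    excess_kurtosis = np.mean(f**4) / np.mean(f**2) ** 2 - 3.0
    print(f"N = {N:4d}   excess kurtosis = {excess_kurtosis:+.3f}")
```

At the largest widths the estimate is dominated by Monte Carlo noise, which is the point: it is consistent with zero.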

The notes also explore imposing symmetries on neural networks, showing that suitable choices of architecture and parameter distribution yield statistically invariant ensembles, akin to global symmetries in field theory. Concrete examples, such as Euclidean-invariant networks, demonstrate practical implementations of these ideas.
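
As a sketch of the idea (the architecture and numbers here are assumptions, not taken from the notes): with i.i.d. Gaussian first-layer weights, the induced distribution over functions depends on inputs only through rotation invariants, so the empirical two-point function should be unchanged when all inputs are rotated together.

```python
import numpy as np

# Invariance sketch: for i.i.d. Gaussian first-layer weights, the ensemble
# two-point function E[f(x1) f(x2)] depends only on O(d)-invariants of the
# inputs, so rotating both inputs should leave it unchanged (up to MC noise).
rng = np.random.default_rng(2)
d, N, samples = 2, 64, 50_000

def two_point(x1, x2):
    W = rng.normal(size=(samples, N, d))
    a = rng.normal(scale=1.0 / np.sqrt(N), size=(samples, N))
    f1 = np.sum(a * np.tanh(W @ x1), axis=1)
    f2 = np.sum(a * np.tanh(W @ x2), axis=1)
    return np.mean(f1 * f2)

theta = 0.9
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # a rotation in O(2)
x1, x2 = np.array([1.0, 0.0]), np.array([0.3, 0.8])
print(two_point(x1, x2), two_point(R @ x1, R @ x2))  # should agree up to noise
```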

Dynamics of Neural Networks

The dynamics section gives prominence to the Neural Tangent Kernel (NTK), which governs how a network's outputs evolve under gradient descent. In the infinite-width limit the kernel freezes at its value at initialization, so training reduces to linear, analytically solvable dynamics. While this "lazy training" regime explains learning in some scenarios, it also highlights a limitation: hidden-layer features do not evolve, so there is no meaningful feature learning.
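
In commonly used notation (conventions may differ slightly from the notes), the NTK and the associated gradient-flow dynamics for a mean-squared-error loss on training data (x_i, y_i) read:

```latex
\Theta_t(x, x') \;=\; \sum_{\mu} \frac{\partial f_{\theta_t}(x)}{\partial \theta_\mu}\,
                                 \frac{\partial f_{\theta_t}(x')}{\partial \theta_\mu},
\qquad
\frac{d f_{\theta_t}(x)}{dt} \;=\; -\,\eta \sum_{i} \Theta_t(x, x_i)\,\bigl(f_{\theta_t}(x_i) - y_i\bigr).
```

In the infinite-width, NTK-parameterized limit, Θ_t remains fixed at its initialization value Θ_0, so training is equivalent to kernel regression with a kernel that never learns from data.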

To address this, the lecture notes discuss feature learning via a scaling analysis that ensures network features evolve nontrivially during training. The maximal update parameterization (μP) exemplifies a parameterization designed to retain feature learning at large width, providing a route beyond the NTK's frozen-kernel dynamics.
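
One common way to organize such scaling analyses is the abc-parameterization of Yang and Hu, quoted here as a reference point (the notes' conventions may differ): layer weights and the learning rate carry explicit powers of the width n,

```latex
W^{\ell} = n^{-a_\ell}\, w^{\ell}, \qquad
w^{\ell}_{ij} \sim \mathcal{N}\!\bigl(0,\; n^{-2 b_\ell}\bigr), \qquad
\eta = \eta_0\, n^{-c},
```

and different choices of (a_ℓ, b_ℓ, c) select different infinite-width limits: the NTK parameterization gives frozen-kernel (lazy) dynamics, while μP is the maximal choice for which preactivations stay O(1) and hidden-layer features move at O(1) as n → ∞.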

Neural Networks and Field Theory

The culmination of these lectures is the NN-FT correspondence, which treats an ensemble of neural networks as a field theory: the network defines the field, and the distribution over parameters defines the statistical ensemble, with correlation functions computed as parameter-space expectations. The notes propose that, viewed through this lens, neural networks can be used to engineer new field theories, including ones with non-standard interactions, potentially offering new angles on quantum field theory.
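
In schematic form (the notation here is illustrative of the general NN-FT setup rather than copied from the notes), the theory is specified by the network f_θ and the parameter distribution P(θ), with correlation functions and generating functional given by expectations over parameters:

```latex
G^{(n)}(x_1, \dots, x_n) \;=\; \mathbb{E}_{\theta \sim P(\theta)}\bigl[\, f_\theta(x_1) \cdots f_\theta(x_n) \,\bigr],
\qquad
Z[J] \;=\; \mathbb{E}_{\theta \sim P(\theta)}\Bigl[\, e^{\int d^d x\, J(x)\, f_\theta(x)} \,\Bigr].
```

In the infinite-width NNGP limit this defines a free (Gaussian) theory, while finite width or correlated parameters generate interactions.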

Implications and Future Directions

The lectures suggest that an improved theoretical understanding of neural networks could drive developments reminiscent of the personal computer revolution, in which large models are reduced to manageable scales without sacrificing capability. Furthermore, integrating principles of expressivity, statistics, dynamics, and architectural design may yield better learning algorithms.

The ongoing challenge is to construct theoretical frameworks that bridge the divide between rigorous mathematical results and empirical success in machine learning. Such a union promises advances not only in artificial intelligence but also in the foundational understanding of field theory, benefiting both domains. Open directions include symmetries, computational shortcuts, and the foundational principles of learned representations.
