Thermodynamic Overfitting and Generalization: Energetic Limits on Predictive Complexity (2402.16995v1)
Abstract: Efficiently harvesting thermodynamic resources requires a precise understanding of their structure. This becomes explicit through the lens of information engines -- thermodynamic engines that use information as fuel. Maximizing the work harvested using available information is a form of physically instantiated machine learning that drives information engines to develop complex predictive memory for storing an environment's temporal correlations. We show that an information engine's complex predictive memory brings both energetic benefits and risks. While increasing memory facilitates the detection of hidden patterns in an environment, it also opens the possibility of thermodynamic overfitting, in which the engine dissipates additional energy during testing. To address overfitting, we introduce thermodynamic regularizers that impose a cost on engine complexity during training, arising from the physical constraints on the information engine. We demonstrate that regularized thermodynamic machine learning generalizes effectively. In particular, the physical constraints from which the regularizers are derived improve the performance of the learned predictive models. This suggests that the laws of physics jointly create the conditions for emergent complexity and predictive intelligence.
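To make the abstract's trade-off concrete, here is a minimal schematic of thermodynamically regularized learning. The notation is illustrative rather than the paper's own: $\langle W_{\mathrm{train}}(\theta)\rangle$ stands for the average work an engine with predictive model $\theta$ harvests from training data, $C(\theta)$ for a measure of the engine's memory complexity, $\lambda$ for a regularization strength set by the engine's physical constraints, and $P_{\mathrm{env}}$, $P_{\theta}$ for the environment's and the model's process distributions.

```latex
% Schematic only; placeholder notation, not the paper's exact objective.
% Training: select the model that maximizes harvested work minus a
% physically motivated complexity penalty.
\[
  \theta^{\star} = \arg\max_{\theta}
    \bigl[ \langle W_{\mathrm{train}}(\theta) \rangle - \lambda\, C(\theta) \bigr]
\]
% Testing: the scale of excess dissipation is set by the mismatch between
% the learned model and the environment (D_KL in bits, hence the
% k_B T ln 2 conversion factor).
\[
  \langle W_{\mathrm{diss}}^{\mathrm{test}} \rangle
    \sim k_{B} T \ln 2 \; D_{\mathrm{KL}}\!\bigl( P_{\mathrm{env}} \,\|\, P_{\theta} \bigr)
\]
```

The second relation expresses thermodynamic overfitting in information-theoretic terms: when the learned model $P_{\theta}$ diverges from the environment's true statistics $P_{\mathrm{env}}$, the mismatch is paid for as dissipated work during testing, which is why penalizing complexity during training can improve generalization.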