Dynamic mean field programming (2206.05200v2)

Published 10 Jun 2022 in stat.ML, cond-mat.dis-nn, and cs.LG

Abstract: A dynamic mean field theory is developed for finite state and action Bayesian reinforcement learning in the large state space limit. In an analogy with statistical physics, the Bellman equation is studied as a disordered dynamical system; the Markov decision process transition probabilities are interpreted as couplings and the value functions as deterministic spins that evolve dynamically. Thus, the mean rewards and transition probabilities are considered to be quenched random variables. The theory reveals that, under certain assumptions, the state-action values are statistically independent across state-action pairs in the asymptotic state space limit, and provides the form of the distribution exactly. The results hold in the finite and discounted infinite horizon settings, for both value iteration and policy evaluation. The state-action value statistics can be computed from a set of mean field equations, which we call dynamic mean field programming (DMFP). For policy evaluation the equations are exact. For value iteration, approximate equations are obtained by appealing to extreme value theory or bounds. The result provides analytic insight into the statistical structure of tabular reinforcement learning, for example revealing the conditions under which reinforcement learning is equivalent to a set of independent multi-armed bandit problems.
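The setup described in the abstract can be illustrated with a small Monte Carlo sketch: draw the "quenched disorder" (mean rewards and transition probabilities) once for a random tabular MDP, then run value iteration on the Bellman optimality equation. This is not the paper's DMFP mean field equations — just a minimal simulation of the disordered dynamical system it analyzes, under assumed Gaussian rewards and Dirichlet transitions.

```python
import numpy as np

rng = np.random.default_rng(0)

S, A = 200, 4      # moderately large state space, few actions
gamma = 0.9        # discount factor

# Quenched random variables, drawn once per MDP realization:
# Gaussian mean rewards and Dirichlet transition kernels (illustrative choices).
R = rng.normal(0.0, 1.0, size=(S, A))
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a] is a distribution over next states

# Value iteration on Q(s,a) = R(s,a) + gamma * sum_{s'} P(s'|s,a) * max_{a'} Q(s',a').
Q = np.zeros((S, A))
for _ in range(1000):
    V = Q.max(axis=1)                # greedy state values
    Q_new = R + gamma * (P @ V)      # (S, A, S) @ (S,) -> (S, A)
    if np.max(np.abs(Q_new - Q)) < 1e-10:
        Q = Q_new
        break
    Q = Q_new

# The DMFP claim concerns the statistics of Q across state-action pairs
# over the disorder ensemble; here we just inspect one realization.
print(Q.mean(), Q.std())
```

Repeating this over many independent draws of `(R, P)` would give the empirical state-action value distribution that the mean field equations characterize analytically in the large-`S` limit.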
