Universal representation by Boltzmann machines with Regularised Axons (2310.14395v2)
Abstract: It is widely known that Boltzmann machines can represent arbitrary probability distributions over the values of their visible neurons, given enough hidden ones. However, sampling from -- and thus training -- these models can be numerically hard. Recently we proposed a regularisation of the connections of Boltzmann machines in order to control the model's energy landscape, paving the way for efficient sampling and training. Here we formally prove that such regularised Boltzmann machines preserve the ability to represent arbitrary distributions. This holds in conjunction with control over the number of local minima of the energy, thus enabling easy \emph{guided} sampling and training. Furthermore, we explicitly show that regularised Boltzmann machines can store exponentially many arbitrarily correlated visible patterns with perfect retrieval, and we connect them to Dense Associative Memory networks.
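The abstract's setup can be sketched in a few lines: a Boltzmann machine over spin-valued visible and hidden units with the standard bilinear energy, plus a constraint on the connection weights. The norm-rescaling regulariser below is a hypothetical stand-in used purely to illustrate the idea of constraining connections; it is not the specific regularisation proposed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden = 4, 6

# Random couplings between visible and hidden units, zero biases.
W = rng.normal(size=(n_visible, n_hidden))
a = np.zeros(n_visible)   # visible biases
b = np.zeros(n_hidden)    # hidden biases

def energy(v, h, W, a, b):
    """Standard Boltzmann-machine energy E(v, h) = -v.W.h - a.v - b.h."""
    return -v @ W @ h - a @ v - b @ h

def regularise(W, c=1.0):
    """Illustrative constraint: rescale each hidden unit's incoming
    weight vector ("axon") to a fixed norm c. This is an assumption
    for illustration only, not the paper's regularisation scheme."""
    norms = np.linalg.norm(W, axis=0, keepdims=True)
    return c * W / norms

Wr = regularise(W)

# Spin-valued states (+1/-1), as in Hopfield-style networks.
v = rng.choice([-1.0, 1.0], size=n_visible)
h = rng.choice([-1.0, 1.0], size=n_hidden)

print(energy(v, h, Wr, a, b))                      # a finite scalar
print(np.linalg.norm(Wr, axis=0))                  # all columns have norm c = 1
```

Constraining the weight norms in this spirit bounds the scale of the couplings, which is one simple way to keep the energy landscape under control while leaving the representational form of the model unchanged.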