Fast training and sampling of Restricted Boltzmann Machines (2405.15376v2)
Abstract: Restricted Boltzmann Machines (RBMs) are effective tools for modeling complex systems and deriving insights from data. However, training these models on highly structured data presents significant challenges due to the slow mixing of Markov Chain Monte Carlo processes. In this study, we build upon recent theoretical advancements in RBM training to significantly reduce the computational cost of training on highly clustered datasets, and of evaluating and sampling RBMs in general. The learning process is analogous to thermodynamic continuous phase transitions observed in ferromagnetic models, where new modes of the probability measure emerge continuously. Such continuous transitions are associated with critical slowing down, which degrades the accuracy of gradient estimates, particularly during the initial stages of training with clustered data. To mitigate this issue, we propose a pre-training phase that encodes the principal components of the data into a low-rank RBM through a convex optimization process. This approach enables efficient static Monte Carlo sampling and accurate computation of the partition function. We exploit the continuous and smooth nature of the parameter annealing trajectory to obtain reliable and computationally efficient log-likelihood estimates, enabling online assessment during training, and we propose a novel sampling strategy named parallel trajectory tempering (PTT) that outperforms previously optimized MCMC methods. Our results show that this training strategy enables RBMs to effectively address highly structured datasets that conventional methods struggle with. We also provide evidence that, in controlled scenarios, our log-likelihood estimation is more accurate than traditional, more computationally intensive approaches. The PTT algorithm significantly accelerates MCMC mixing compared to existing methods.
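The PTT idea described above reuses models saved along the smooth training trajectory as a ladder of distributions, analogous to the temperature ladder in parallel tempering: chains attached to different checkpoints occasionally exchange their configurations with a Metropolis acceptance test on the visible free energies. The sketch below is only an illustration of that swap move under assumptions not stated in the abstract (a binary-binary RBM with weights `W`, visible bias `a`, hidden bias `b`; the function and variable names are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

def rbm_free_energy(v, W, a, b):
    # Visible free energy of a binary-binary RBM:
    # F(v) = -a.v - sum_j log(1 + exp(b_j + (W v)_j)), so p(v) ~ exp(-F(v)).
    return -v @ a - np.sum(np.logaddexp(0.0, b + W @ v))

def swap_move(v1, v2, params1, params2):
    # Metropolis exchange of configurations between two RBM checkpoints:
    # accept with probability min(1, exp(-dF)), where dF is the total
    # free-energy change of placing each configuration in the other model.
    dF = (rbm_free_energy(v2, *params1) + rbm_free_energy(v1, *params2)
          - rbm_free_energy(v1, *params1) - rbm_free_energy(v2, *params2))
    if np.log(rng.random()) < -dF:
        return v2, v1  # swap accepted: configurations change models
    return v1, v2      # swap rejected

# Toy demo: two "checkpoints" (weak early model, stronger late model)
# with 6 visible and 4 hidden units, and one chain attached to each.
nv, nh = 6, 4
ckpt_early = (0.1 * rng.standard_normal((nh, nv)), np.zeros(nv), np.zeros(nh))
ckpt_late = (1.0 * rng.standard_normal((nh, nv)), np.zeros(nv), np.zeros(nh))
v_a = rng.integers(0, 2, nv).astype(float)
v_b = rng.integers(0, 2, nv).astype(float)
v_a2, v_b2 = swap_move(v_a, v_b, ckpt_early, ckpt_late)
```

In a full PTT run this swap would alternate with ordinary Gibbs updates within each checkpoint, so that slow chains of the final model can escape modes by relaying configurations through earlier, smoother models along the training trajectory.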