Mean-Field Assisted Deep Boltzmann Learning with Probabilistic Computers (2401.01996v1)
Abstract: Despite their appeal as physics-inspired, energy-based generative models, general Boltzmann Machines (BM) are considered intractable to train. This belief led to simplified models of BMs with restricted intralayer connections or layer-by-layer training of deep BMs. Recent developments in domain-specific hardware -- specifically probabilistic computers (p-computer) with probabilistic bits (p-bit) -- may change established wisdom on the tractability of deep BMs. In this paper, we show that deep and unrestricted BMs can be trained using p-computers generating hundreds of billions of Markov Chain Monte Carlo (MCMC) samples per second, on sparse networks developed originally for use in D-Wave's annealers. To maximize learning efficiency on the p-computer, we introduce two families of Mean-Field Theory assisted learning algorithms, or xMFTs (x = Naive and Hierarchical). The xMFTs are used to estimate the averages and correlations during the positive phase of the contrastive divergence (CD) algorithm, and our custom-designed p-computer is used to estimate the averages and correlations in the negative phase. A custom Field-Programmable Gate Array (FPGA) emulation of the p-computer architecture sustains up to 45 billion flips per second, allowing the implementation of CD-$n$ where $n$ can be on the order of millions, unlike RBMs where $n$ is typically 1 or 2. Experiments on the full MNIST dataset with the combined algorithm show that the positive phase can be efficiently computed by xMFTs without much degradation when the negative phase is computed by the p-computer. Our algorithm can be used in other scalable Ising machines, and its variants can be used to train BMs previously thought to be intractable.
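The split described in the abstract can be sketched in a few lines: the positive-phase statistics are approximated by a naive mean-field fixed point, while the negative-phase statistics come from MCMC (Gibbs/p-bit style) sweeps over the model. The sketch below is a minimal, hypothetical illustration in NumPy, not the paper's implementation; all function names, the damping parameter, and the factorized correlation approximation `m_i * m_j` are assumptions consistent with standard naive mean-field theory.

```python
import numpy as np

rng = np.random.default_rng(0)

def naive_mft_magnetizations(J, h, beta=1.0, n_iter=200, damping=0.5):
    """Naive mean-field fixed point for +/-1 spins:
    m_i = tanh(beta * (h_i + sum_j J_ij m_j)).
    Assumes J is symmetric with zero diagonal; damping aids convergence."""
    m = np.zeros(len(h))
    for _ in range(n_iter):
        m_new = np.tanh(beta * (h + J @ m))
        m = damping * m + (1.0 - damping) * m_new
    return m

def gibbs_sweep(s, J, h, beta=1.0):
    """One sequential Gibbs sweep (the update a p-bit implements in hardware):
    each spin flips to +1 with probability sigmoid(2*beta*local_field)."""
    for i in range(len(s)):
        local_field = h[i] + J[i] @ s  # zero diagonal => no self-coupling
        p_up = 1.0 / (1.0 + np.exp(-2.0 * beta * local_field))
        s[i] = 1.0 if rng.random() < p_up else -1.0
    return s

def cd_gradient(m_pos, model_samples):
    """CD-style weight gradient: positive-phase correlations factorized by
    naive MFT (<s_i s_j> ~ m_i m_j) minus negative-phase correlations
    estimated from sampler states (rows of model_samples)."""
    pos_corr = np.outer(m_pos, m_pos)
    neg_corr = model_samples.T @ model_samples / len(model_samples)
    return pos_corr - neg_corr
```

In this division of labor, the cheap deterministic fixed-point iteration replaces clamped-phase sampling, while the expensive free-running (negative) phase is delegated to the fast sampler, mirroring the xMFT/p-computer split in the abstract.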
- A learning algorithm for Boltzmann machines, 1985.
- Deep Boltzmann machines. In David van Dyk and Max Welling, editors, Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics, volume 5 of Proceedings of Machine Learning Research, pages 448–455, Hilton Clearwater Beach Resort, Clearwater Beach, Florida USA, 16–18 Apr 2009. PMLR.
- Machine learning and the physical sciences. Reviews of Modern Physics, 91(4):045002, 2019.
- Stochastic p-bits for invertible logic. Physical Review X, 7(3):031014, 2017a.
- A full-stack view of probabilistic computing with p-bits: devices, architectures and algorithms. IEEE Journal on Exploratory Solid-State Computational Devices and Circuits, 2023.
- Integer factorization using stochastic magnetic tunnel junctions. Nature, 2019.
- Energy-efficient superparamagnetic Ising machine and its application to traveling salesman problems. arXiv preprint arXiv:2306.11572, 2023.
- Massively parallel probabilistic computing with sparse Ising machines. Nature Electronics, 5(7):460–468, 2022.
- Autonomous probabilistic coprocessing with petaflips per second. IEEE Access, 8:157238–157252, 2020.
- 45nm low power CMOS logic compatible embedded STT MRAM utilizing a reverse-connection 1T/1MTJ cell. In 2009 IEEE International Electron Devices Meeting (IEDM), pages 1–4. IEEE, 2009.
- CMOS+ stochastic nanomagnets: heterogeneous computers for probabilistic inference and learning. arXiv preprint arXiv:2304.05949, 2023.
- Application of quantum annealing to training of deep neural networks. arXiv preprint arXiv:1510.06356, 2015.
- Accelerating deep learning with memcomputing. Neural Networks, 110:1–7, 2019.
- Training Restricted Boltzmann Machines With a D-Wave Quantum Annealer. Frontiers in Physics, 9:589626, 2021.
- Noise-injected analog Ising machines enable ultrafast statistical sampling and machine learning. Nature Communications, 13(1):5847, 2022.
- Geoffrey E Hinton. Training products of experts by minimizing contrastive divergence. Neural computation, 14(8):1771–1800, 2002.
- An empirical evaluation of deep architectures on problems with many factors of variation. In Proceedings of the 24th international conference on Machine learning, pages 473–480, 2007.
- Training Deep Boltzmann Networks with Sparse Ising Machines. arXiv preprint arXiv:2303.10728, 2023.
- Probabilistic Graphical Models: Principles and Techniques - Adaptive Computation and Machine Learning. The MIT Press, 2009. ISBN 0262013193.
- Mean-field theory, pages 144–212. Cambridge University Press, 1995. doi: 10.1017/CBO9780511813467.005.
- Mehran Kardar. Statistical Physics of Particles. Cambridge University Press, 2007. doi: 10.1017/CBO9780511815898.
- Dalton A R Sakthivadivel. Magnetisation and Mean Field Theory in the Ising Model. SciPost Phys. Lect. Notes, page 35, 2022. doi: 10.21468/SciPostPhysLectNotes.35.
- A mean field theory learning algorithm for neural networks. Complex Systems, 1:995–1019, 1987.
- Hilbert Kappen and Francisco de Borja Rodríguez Ortiz. Boltzmann machine learning using mean field theory and linear response correction. In M. Jordan, M. Kearns, and S. Solla, editors, Advances in Neural Information Processing Systems, volume 10. MIT Press, 1997.
- Mean field approach to learning in Boltzmann Machines. Pattern Recognition Letters, 18(11):1317–1322, 1997. ISSN 0167-8655. doi: 10.1016/S0167-8655(97)00096-2.
- Efficient learning in Boltzmann machines using linear response theory. Neural Computation, 10(5):1137–1156, 1998.
- Toshiyuki Tanaka. Mean-field theory of Boltzmann machine learning. Phys. Rev. E, 58:2302–2310, Aug 1998. doi: 10.1103/PhysRevE.58.2302.
- A New Learning Algorithm for Mean Field Boltzmann Machines. In José R. Dorronsoro, editor, Artificial Neural Networks — ICANN 2002, pages 351–357, Berlin, Heidelberg, 2002. Springer Berlin Heidelberg. ISBN 978-3-540-46084-8.
- Haiping Huang. Variational mean-field theory for training restricted Boltzmann machines with binary synapses. Phys. Rev. E, 102:030301, Sep 2020. doi: 10.1103/PhysRevE.102.030301.
- Information, physics, and computation. Oxford University Press, 2009.
- Terrence J Sejnowski. Higher-order Boltzmann machines. In AIP Conference Proceedings, volume 151, pages 398–403. American Institute of Physics, 1986.
- Yann LeCun. The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/, 1998.
- Pegasus: The second connectivity graph for large-scale quantum annealing hardware. arXiv preprint arXiv:1901.07636, 2019.
- Geoffrey E. Hinton. A Practical Guide to Training Restricted Boltzmann Machines, pages 599–619. Springer Berlin Heidelberg, Berlin, Heidelberg, 2012.
- Tijmen Tieleman. Training restricted Boltzmann machines using approximations to the likelihood gradient. In Proceedings of the 25th international conference on Machine learning, pages 1064–1071, 2008.