Dense Hopfield Networks in the Teacher-Student Setting (2401.04191v2)
Abstract: Dense Hopfield networks are known for their feature-to-prototype transition and adversarial robustness. However, previous theoretical studies have mostly focused on their storage capacity. We bridge this gap by studying the phase diagram of p-body Hopfield networks in the teacher-student setting of an unsupervised learning problem, uncovering ferromagnetic phases reminiscent of the prototype and feature learning regimes. On the Nishimori line, we find the critical size of the training set necessary for efficient pattern retrieval. Interestingly, we find that the paramagnetic-to-ferromagnetic transition of the teacher-student setting coincides with the paramagnetic-to-spin-glass transition of the direct model, i.e. the model with random patterns. Away from the Nishimori line, we investigate the learning performance in relation to the inference temperature and dataset noise. Moreover, we show that using a larger p for the student than for the teacher gives the student an extensive tolerance to noise. We then derive a closed-form expression measuring the adversarial robustness of such a student at zero temperature, corroborating the positive correlation between number of parameters and robustness observed in large neural networks. We also use our model to clarify why the prototype phase of modern Hopfield networks is adversarially robust.
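As a concrete illustration of the p-body (dense) Hopfield model the abstract refers to, the sketch below stores a few random binary patterns under the energy E(σ) = -N Σ_μ (ξ^μ·σ/N)^p and retrieves one of them by zero-temperature single-spin-flip dynamics. This is a minimal toy demonstration with arbitrarily chosen sizes (N, M, p) and noise level, not the paper's actual model or code.

```python
import numpy as np

rng = np.random.default_rng(0)

N, M, p = 100, 5, 3                      # neurons, stored patterns, interaction order (toy values)
xi = rng.choice([-1, 1], size=(M, N))    # random +/-1 patterns (here playing the teacher's role)

def energy(sigma):
    # Dense p-body Hopfield energy: E = -N * sum_mu (xi_mu . sigma / N)^p
    overlaps = xi @ sigma / N
    return -N * np.sum(overlaps ** p)

def retrieve(sigma, sweeps=20):
    # Zero-temperature asynchronous dynamics: flip a spin only if it lowers the energy
    sigma = sigma.copy()
    for _ in range(sweeps):
        for i in rng.permutation(N):
            trial = sigma.copy()
            trial[i] = -trial[i]
            if energy(trial) < energy(sigma):
                sigma = trial
    return sigma

# Start from a noisy copy of pattern 0 (~15% of spins flipped) and measure the retrieval overlap
noisy = xi[0] * rng.choice([1, -1], size=N, p=[0.85, 0.15])
recovered = retrieve(noisy)
overlap = abs(recovered @ xi[0]) / N
```

With so few patterns at p = 3 the energy landscape is dominated by the planted pattern, so the dynamics drives the overlap close to 1; at larger loads or higher noise the spin-glass and paramagnetic phenomenology discussed in the abstract takes over.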