
Regularization, early-stopping and dreaming: a Hopfield-like setup to address generalization and overfitting (2308.01421v2)

Published 1 Aug 2023 in cs.LG and cond-mat.dis-nn

Abstract: In this work we approach attractor neural networks from a machine learning perspective: we look for optimal network parameters by applying gradient descent to a regularized loss function. Within this framework, the optimal neuron-interaction matrices turn out to be a class of matrices corresponding to Hebbian kernels revised by a reiterated unlearning protocol. Remarkably, the extent of such unlearning is proved to be related to the regularization hyperparameter of the loss function and to the training time. Thus, we can design strategies to avoid overfitting that are formulated in terms of regularization and early-stopping tuning. The generalization capabilities of these attractor networks are also investigated: analytical results are obtained for random synthetic datasets; the emerging picture is then corroborated by numerical experiments that highlight the existence of several regimes (i.e., overfitting, failure and success) as the dataset parameters are varied.
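To make the training scheme described in the abstract concrete, below is a minimal sketch, not the paper's exact formulation: a Hopfield-like coupling matrix is initialized with the Hebbian kernel and then updated by gradient descent on an assumed quadratic pattern-reconstruction loss with a ridge penalty. The loss form, the names `lambda_reg`, `lr`, and `steps`, and all numerical values are illustrative assumptions; in the paper's framework the regularization strength and the number of descent steps (early stopping) are what control the effective amount of unlearning.

```python
import numpy as np

# Minimal sketch of the setup outlined in the abstract, under assumed choices:
# Hebbian initialization of the couplings, gradient descent on a quadratic
# pattern-reconstruction loss with an L2 (ridge) penalty, and a fixed number
# of steps standing in for early stopping. None of the specific values below
# come from the paper.

rng = np.random.default_rng(0)
N, P = 100, 10                                        # neurons, stored patterns
xi = rng.choice([-1, 1], size=(P, N)).astype(float)   # random binary patterns

J = xi.T @ xi / N                                     # Hebbian kernel as starting point
np.fill_diagonal(J, 0.0)

lambda_reg = 0.05    # regularization hyperparameter (assumed value)
lr = 0.01            # learning rate (assumed value)
steps = 500          # training time / early-stopping point (assumed value)

for _ in range(steps):
    residual = xi - xi @ J.T                          # one-step reconstruction error on the patterns
    grad = -residual.T @ xi / (P * N) + lambda_reg * J
    J -= lr * grad
    np.fill_diagonal(J, 0.0)                          # keep zero self-couplings

# Retrieval check: a corrupted cue should relax toward the stored pattern.
cue = xi[0] * rng.choice([1.0, -1.0], size=N, p=[0.9, 0.1])
for _ in range(20):
    cue = np.sign(J @ cue)
overlap = float(cue @ xi[0]) / N
print(f"overlap with the stored pattern: {overlap:.2f}")
```

How `lambda_reg` and `steps` trade off against overfitting in this toy setup is only meant to mirror, qualitatively, the regularization and early-stopping strategies discussed in the abstract; the paper derives the precise correspondence between these hyperparameters and the extent of unlearning analytically.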
