The Copycat Perceptron: Smashing Barriers Through Collective Learning (2308.03743v3)
Abstract: We characterize the equilibrium properties of a model of $y$ coupled binary perceptrons in the teacher-student scenario, subject to a suitable cost function, with an explicit ferromagnetic coupling proportional to the Hamming distance between the students' weights. In contrast to recent works, we analyze a more general setting in which thermal noise is present and affects each student's generalization performance. In the nonzero-temperature regime, we find that coupling the replicas bends the phase diagram towards smaller values of $\alpha$. This suggests that, at a fixed fraction of examples, the free entropy landscape becomes smoother around the solution with perfect generalization (i.e., the teacher). Standard thermal updating algorithms such as Simulated Annealing can then easily reach the teacher solution and avoid getting trapped in metastable states, as happens in the unreplicated case even in the computationally "easy" regime of the inference phase diagram. These results provide additional analytic and numerical evidence for the recently conjectured Bayes-optimal property of Replicated Simulated Annealing (RSA) for a sufficient number of replicas. From a learning perspective, these results also suggest that multiple students working together (in this case, reviewing the same data) can learn the same rule both significantly faster and with fewer examples, a property that could be exploited in the context of cooperative and federated learning.
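To make the setup concrete, the following is a minimal sketch of coupled binary students annealed together. It is not the authors' implementation. The abstract only speaks of "a suitable cost function", so the sketch assumes the cost is the number of misclassified training examples. The names `gamma` (coupling strength), `beta` (inverse temperature), the sizes `N`, `P`, `y`, and the linear annealing schedule are all hypothetical choices made for illustration.

```python
import numpy as np

def training_errors(W, X, labels):
    """Misclassified examples per student (assumed cost; rows of W are students)."""
    preds = np.sign(W @ X.T)              # shape (y, P); N is odd, so no zeros
    return np.sum(preds != labels, axis=1)

def pairwise_hamming(W):
    """Sum of Hamming distances over all pairs of students."""
    N = W.shape[1]
    overlaps = W @ W.T                    # overlaps[a, b] = w_a . w_b
    hamming = (N - overlaps) // 2         # d_H = (N - overlap) / 2 for +-1 vectors
    return int(np.sum(np.triu(hamming, k=1)))

def replicated_energy(W, X, labels, gamma):
    """Sum of student costs plus a ferromagnetic penalty on disagreement."""
    return float(np.sum(training_errors(W, X, labels)) + gamma * pairwise_hamming(W))

def metropolis_sweep(W, X, labels, gamma, beta, rng):
    """One sweep of single-weight-flip Metropolis moves on the coupled system.

    The full energy is recomputed at every proposal for clarity, not speed.
    """
    y, N = W.shape
    energy = replicated_energy(W, X, labels, gamma)
    for _ in range(y * N):
        a, i = rng.integers(y), rng.integers(N)
        W[a, i] *= -1                     # propose flipping one weight
        new_energy = replicated_energy(W, X, labels, gamma)
        delta = new_energy - energy
        if delta <= 0 or rng.random() < np.exp(-beta * delta):
            energy = new_energy           # accept the move
        else:
            W[a, i] *= -1                 # reject: undo the flip
    return energy

# Toy teacher-student instance (all sizes are illustrative assumptions).
rng = np.random.default_rng(0)
N, P, y = 21, 40, 3                       # weights, examples, students
teacher = rng.choice([-1, 1], size=N)
X = rng.choice([-1, 1], size=(P, N))
labels = np.sign(X @ teacher)
W = rng.choice([-1, 1], size=(y, N))

# Hypothetical linear annealing schedule in the inverse temperature.
for beta in np.linspace(0.1, 3.0, 20):
    final_energy = metropolis_sweep(W, X, labels, gamma=0.5, beta=beta, rng=rng)

print("final energy:", final_energy)
print("student-teacher overlaps:", (W @ teacher) / N)
```

Setting `gamma=0` decouples the students and reduces the sketch to `y` independent runs of ordinary Simulated Annealing, which is the unreplicated case the abstract compares against.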