
Capacity of the treelike sign perceptrons neural networks with one hidden layer -- RDT based upper bounds (2312.08244v1)

Published 13 Dec 2023 in cond-mat.dis-nn, cs.IT, math-ph, math.IT, math.MP, math.PR, and stat.ML

Abstract: We study the capacity of \emph{sign} perceptrons neural networks (SPNN) and particularly focus on 1-hidden layer \emph{treelike committee machine} (TCM) architectures. Similarly to what happens in the case of a single perceptron neuron, it turns out that, in a statistical sense, the capacity of a corresponding multilayered network architecture consisting of multiple \emph{sign} perceptrons also undergoes the so-called phase transition (PT) phenomenon. This means: (i) for a certain range of system parameters (size of data, number of neurons), the network can be properly trained to accurately memorize \emph{all} elements of the input dataset; and (ii) outside that region no such training exists. Clearly, determining the corresponding phase transition curve that separates these regions is an extraordinary task and among the most fundamental questions related to the performance of any network. Utilizing the powerful mathematical engine called Random Duality Theory (RDT), we establish a generic framework for determining the upper bounds on the 1-hidden layer TCM SPNN capacity. Moreover, we do so for \emph{any} given (odd) number of neurons. We further show that the obtained results \emph{exactly} match the replica symmetry predictions of \cite{EKTVZ92,BHS92}, thereby proving that the statistical physics based results are not only nice estimates but also mathematically rigorous bounds. Moreover, for $d\leq 5$, we obtain capacity values that improve on the best known rigorous ones of \cite{MitchDurb89}, thereby establishing the first mathematically rigorous progress in well over 30 years.
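The architecture and the memorization notion behind the abstract's capacity question can be made concrete with a small sketch. The Python snippet below is an illustration only, not code from the paper; the function names (`tcm_spnn_output`, `memorizes`) and the parameter choices are hypothetical. It assumes the standard treelike setup: the $n$ input coordinates are split into $d$ disjoint blocks, each block feeds exactly one sign-perceptron hidden unit, and the output is a majority vote over the (odd) number $d$ of hidden signs. The capacity is then the largest ratio $m/n$ for which weights that memorize all $m$ labeled patterns typically exist.

```python
# Minimal sketch of a 1-hidden-layer treelike committee machine (TCM) of
# sign perceptrons, assuming disjoint input blocks and an unweighted
# majority-vote output. Illustration only; not code from the paper.

import numpy as np


def tcm_spnn_output(W, x):
    """Forward pass of the treelike sign-perceptron committee machine.

    W : (d, n/d) array -- one weight row per hidden neuron / input block.
    x : (n,) input vector, viewed as d disjoint blocks of length n/d.
    Returns +1 or -1.
    """
    d, block = W.shape
    blocks = x.reshape(d, block)                  # non-overlapping receptive fields
    hidden = np.sign(np.sum(W * blocks, axis=1))  # d sign-perceptron hidden units
    return np.sign(np.sum(hidden))                # majority vote (d odd => no ties)


def memorizes(W, X, y):
    """True if the network stores every pattern, i.e. the output matches the
    desired label y_i for all m patterns X_i -- the memorization event whose
    critical m/n ratio defines the capacity studied in the paper."""
    return all(tcm_spnn_output(W, x) == yi for x, yi in zip(X, y))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, block = 3, 20                  # d (odd) hidden neurons, n = d * block inputs
    n, m = d * block, 30              # m random +-1 patterns with random labels
    X = rng.choice([-1.0, 1.0], size=(m, n))
    y = rng.choice([-1.0, 1.0], size=m)
    W = rng.standard_normal((d, block))
    print("random weights memorize all patterns:", memorizes(W, X, y))
```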

References (55)
  1. Fine-grained analysis of optimization and generalization for overparameterized two-layer neural networks. 2019. available online at http://arxiv.org/abs/1901.08584.
  2. Properties of the geometry of solutions and capacity of multilayer neural networks with rectified linear unit activations. Phys. Rev. Lett., 123:170602, October 2019.
  3. P. Baldi and S. Venkatesh. Number of stable points for spin-glasses and neural networks of higher orders. Phys. Rev. Lett., 58(9):913–916, Mar. 1987.
  4. Statistical mechanics of a multilayered neural network. Phys. Rev. Lett., 65(18):2312–2315, Oct 1990.
  5. Broken symmetries in multilayered perceptrons. Phys. Rev. A, 45(6):4146, March 1992.
  6. E. Barkai and I. Kanter. Storage capacity of a multilayer neural network with binary weights. Europhys. Lett., 14(2):107, 1991.
  7. E. B. Baum. On the capabilities of multilayer perceptrons. Journal of complexity, 4(3):193–215, 1988.
  8. S. H. Cameron. Tech-report 60-600. Proceedings of the bionics symposium, pages 197–212, 1960. Wright air development division, Dayton, Ohio.
  9. T. Cover. Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition. IEEE Transactions on Electronic Computers, (EC-14):326–334, 1965.
  10. D. Donoho and J. Tanner. Neighborliness of randomly-projected simplices in high dimensions. Proc. National Academy of Sciences, 102(27):9452–9457, 2005.
  11. D. Donoho and J. Tanner. Observed universality of phase transitions in high-dimensional geometry, with implications for modern data analysis and signal processing. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 367, November 2009.
  12. D. Donoho and J. Tanner. Counting the faces of randomly projected hypercubes and orthants, with applications. Discrete and Computational Geometry, 43:522–541, 2010.
  13. Gradient descent provably optimizes overparameterized neural networks. 2018. available online at http://arxiv.org/abs/1810.02054.
  14. Storage capacity and learning algorithms for two-layer neural networks. Phys. Rev. A, 45(10):7590, May 1992.
  15. G. J. Mitchison and R. M. Durbin. Bounds on the learning capacity of some multi-layer networks. Biological Cybernetics, 60:345–365, 1989.
  16. E. Gardner. The space of interactions in neural network models. J. Phys. A: Math. Gen., 21:257–270, 1988.
  17. Mildly overparametrized neural nets can memorize training data efficiently. 2019. available online at http://arxiv.org/abs/1909.11837.
  18. Y. Gordon. On Milman’s inequality and random subspaces which escape through a mesh in $R^n$. Geometric Aspects of Functional Analysis, Isr. Semin. 1986–87, Lect. Notes Math., 1317, 1988.
  19. M. Hardt and T. Ma. Identity matters in deep learning. 2016. available online at http://arxiv.org/abs/1611.04231.
  20. G. B. Huang. Learning capability and storage capacity of two-hidden-layer feedforward networks. IEEE Transactions on Neural Networks, 14(2):274–281, 2003.
  21. Z. Ji and M. Telgarsky. Polylogarithmic width suffices for gradient descent to achieve arbitrarily small test error with shallow relu networks. 2019. available online at http://arxiv.org/abs/1909.12292.
  22. R. D. Joseph. The number of orthants in $n$-space intersected by an $s$-dimensional subspace. Tech. memo 8, project PARA, 1960. Cornell Aeronautical Lab., Buffalo, N.Y.
  23. Y. Li and Y. Liang. Learning overparameterized neural networks via stochastic gradient descent on structured data. In Advances in Neural Information Processing Systems, pages 8157–8166, 2018.
  24. R. Monasson and R. Zecchina. Weight space structure and internal representations: A direct approach to learning and generalization in multilayer neural networks. Phys. Rev. Lett., 75:2432, September 1995.
  25. S. Oymak and M. Soltanolkotabi. Towards moderate overparameterization: global convergence guarantees for training shallow neural networks. 2019. available online at http://arxiv.org/abs/1902.04674.
  26. L. Schläfli. Gesammelte Mathematische Abhandlungen I. Basel, Switzerland: Verlag Birkhäuser, 1950.
  27. Z. Song and X. Yang. Quadratic suffices for over-parametrization via matrix Chernoff bound. 2019. available online at http://arxiv.org/abs/1906.03593.
  28. M. Stojnic. Block-length dependent thresholds in block-sparse compressed sensing. available online at http://arxiv.org/abs/0907.3679.
  29. M. Stojnic. Various thresholds for $\ell_1$-optimization in compressed sensing. available online at http://arxiv.org/abs/0907.3666.
  30. M. Stojnic. Block-length dependent thresholds for $\ell_2/\ell_1$-optimization in block-sparse compressed sensing. ICASSP, IEEE International Conference on Acoustics, Signal and Speech Processing, pages 3918–3921, 14-19 March 2010. Dallas, TX.
  31. M. Stojnic. $\ell_1$ optimization and its various thresholds in compressed sensing. ICASSP, IEEE International Conference on Acoustics, Signal and Speech Processing, pages 3910–3913, 14-19 March 2010. Dallas, TX.
  32. M. Stojnic. Recovery thresholds for $\ell_1$ optimization in binary compressed sensing. ISIT, IEEE International Symposium on Information Theory, pages 1593–1597, 13-18 June 2010. Austin, TX.
  33. M. Stojnic. Another look at the Gardner problem. 2013. available online at http://arxiv.org/abs/1306.3979.
  34. M. Stojnic. Lifting/lowering Hopfield models ground state energies. 2013. available online at http://arxiv.org/abs/1306.3975.
  35. M. Stojnic. Negative spherical perceptron. 2013. available online at http://arxiv.org/abs/1306.3980.
  36. M. Stojnic. Regularly random duality. 2013. available online at http://arxiv.org/abs/1303.7295.
  37. M. Stojnic. Spherical perceptron as a storage memory with limited errors. 2013. available online at http://arxiv.org/abs/1306.3809.
  38. M. Stojnic. Fully bilinear generic and lifted random processes comparisons. 2016. available online at http://arxiv.org/abs/1612.08516.
  39. M. Stojnic. Generic and lifted probabilistic comparisons – max replaces minmax. 2016. available online at http://arxiv.org/abs/1612.08506.
  40. M. Stojnic. Binary perceptrons capacity via fully lifted random duality theory. 2023. available online at arxiv.
  41. M. Stojnic. Fully lifted random duality theory. 2023. available online at arxiv.
  42. M. Stojnic. Studying Hopfield models via fully lifted random duality theory. 2023. available online at arxiv.
  43. R. Sun. Optimization for deep learning: theory and algorithms. 2019. available online at http://arxiv.org/abs/1912.08957.
  44. R. Urbanczik. Storage capacity of the fully-connected committee machine. J. Phys. A: Math. Gen., 30, 1997.
  45. S. Venkatesh. Epsilon capacity of neural networks. Proc. Conf. on Neural Networks for Computing, Snowbird, UT, 1986.
  46. R. Vershynin. Memory capacity of neural networks with threshold and ReLU activations. 2019. available online at http://arxiv.org/abs/2001.06938.
  47. J. G. Wendel. A problem in geometric probability. Mathematica Scandinavica, 1:109–111, 1962.
  48. R. O. Winder. Single stage threshold logic. Switching circuit theory and logical design, pages 321–332, Sep. 1961. AIEE Special publications S-134.
  49. R. O. Winder. Threshold logic. Ph. D. dissertation, Princeton University, 1962.
  50. Y. Xiong, C. Kwon, and J. H. Oh. The storage capacity of a fully-connected committee machine. NIPS, 1997.
  51. M. Yamasaki. The lower bound of the capacity for a neural network with multiple hidden layers. In International Conference on Artificial Neural Networks, pages 546–549, 1993.
  52. Small relu networks are powerful memorizers: a tight analysis of memorization capacity. In Advances in Neural Information Processing Systems, pages 15532–15543, 2019.
  53. J. A. Zavatone-Veth and C. Pehlevan. Activation function dependence of the storage capacity of treelike neural networks. Phys. Rev. E, 103:L020301, February 2021.
  54. Understanding deep learning requires rethinking generalization. ICLR, 2017.
  55. Stochastic gradient descent optimizes overparameterized deep relu networks. 2018. available online at http://arxiv.org/abs/1811.08888.