Learning Discretized Neural Networks under Ricci Flow (2302.03390v5)

Published 7 Feb 2023 in cs.LG, cs.IT, cs.NE, and math.IT

Abstract: In this paper, we study Discretized Neural Networks (DNNs) composed of low-precision weights and activations, which suffer from either infinite or zero gradients during training because the discretization function is non-differentiable. Most training-based methods for DNNs in this setting employ the standard Straight-Through Estimator (STE) to approximate the gradient w.r.t. discrete values. However, the STE introduces a gradient mismatch, arising from perturbations in the approximated gradient. To address this problem, we show that the mismatch can be interpreted as a metric perturbation on a Riemannian manifold, viewed through the lens of duality theory. Building on information geometry, we construct the Linearly Nearly Euclidean (LNE) manifold for DNNs, providing a background geometry in which to analyze such perturbations. By introducing a partial differential equation on metrics, namely the Ricci flow, we establish the dynamical stability and convergence of the LNE metric under $L^2$-norm perturbations. In contrast to previous perturbation theories, whose convergence rates are fractional powers, the metric perturbation under the Ricci flow decays exponentially on the LNE manifold. Experimental results across various datasets demonstrate that our method achieves superior and more stable performance for DNNs compared with other representative training-based methods.
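
For context, the partial differential equation on metrics referred to above is the standard Ricci flow, under which the metric tensor evolves in the direction of minus twice its Ricci curvature:

$$\frac{\partial}{\partial t}\, g_{ij}(t) = -2\, R_{ij}\bigl(g(t)\bigr).$$

The abstract's stability claim is that an $L^2$-norm perturbation of the LNE metric decays exponentially under this flow, rather than at the fractional-power rates of earlier perturbation theories.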
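The Straight-Through Estimator that the paper takes as its starting point is standard and easy to sketch. Below is a minimal PyTorch sketch of sign binarization with a straight-through backward pass; it illustrates the generic STE, not the paper's proposed method, and the $|w| \le 1$ gradient clipping is one common variant assumed here rather than something specified by the abstract.

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    """Sign binarization with a straight-through backward pass.

    Forward:  w -> sign(w) (a non-differentiable discrete function).
    Backward: pass grad_output through unchanged where |w| <= 1,
              ignoring that sign() has zero gradient almost everywhere.
    """

    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        # Note: torch.sign maps 0 to 0; binarized nets often map 0 to +1.
        return torch.sign(w)

    @staticmethod
    def backward(ctx, grad_output):
        (w,) = ctx.saved_tensors
        # Straight-through estimate with the common |w| <= 1 clipping.
        return grad_output * (w.abs() <= 1).to(grad_output.dtype)

# Usage: discretize in the forward pass while the optimizer updates
# the full-precision latent weights via the approximated gradient.
w = torch.randn(4, requires_grad=True)
loss = BinarizeSTE.apply(w).sum()
loss.backward()
print(w.grad)  # the clipped straight-through gradient
```

The mismatch between this surrogate gradient and the true (zero or undefined) gradient of the discrete forward map is precisely the perturbation that the paper reinterprets geometrically.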
