A Closed-form Solution for Weight Optimization in Fully-connected Feed-forward Neural Networks (2401.06699v2)

Published 12 Jan 2024 in cs.LG and cs.AI

Abstract: This work addresses the weight optimization problem for fully-connected feed-forward neural networks. Unlike existing approaches based on back-propagation (BP) and chain-rule gradient-based optimization, which require iterative execution that can be burdensome and time-consuming, the proposed approach provides a closed-form solution for weight optimization by means of least-squares (LS) methodology. When the input-to-output mapping is injective, the new approach optimizes the weights in a back-propagating fashion in a single iteration by jointly optimizing the set of weights of each neuron in each layer. When the mapping is not injective (e.g., in classification problems), the proposed solution is easily adapted and reaches its final solution in a few iterations. An important advantage over existing solutions is that the computations for all neurons in a layer are independent of one another; thus, they can be carried out in parallel to optimize all weights in a given layer simultaneously. Furthermore, the running time is deterministic, in the sense that one can determine the exact number of computations required to optimize the weights in all network layers (per iteration, in the case of a non-injective mapping). Simulation and empirical results show that the proposed scheme, BPLS, works well and is competitive with existing methods in terms of accuracy, while significantly surpassing them in terms of running time. In summary, the new method is straightforward to implement, competitive and computationally more efficient than existing methods, and well suited to parallel implementation.
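
The paper's exact BPLS formulation is not reproduced on this page; as a rough illustration of the layer-wise least-squares idea described in the abstract, the sketch below fits one fully-connected layer in closed form and constructs targets for the preceding layer through an assumed invertible activation. The function names (`fit_layer_ls`, `backpropagate_targets`), the ridge regularizer, and the target-construction step are illustrative assumptions, not the paper's definitions.

```python
import numpy as np

def fit_layer_ls(X, T, reg=1e-6):
    """Closed-form least-squares fit of one fully-connected layer.

    X : (n_samples, n_in)   layer inputs.
    T : (n_samples, n_out)  desired pre-activations for this layer.

    Each column of the returned weight matrix is an independent LS
    problem, so all neurons in the layer can be solved in parallel.
    The small ridge term keeping the normal equations well conditioned
    is an assumption; the paper may handle this differently.
    """
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])   # append bias column
    A = Xb.T @ Xb + reg * np.eye(Xb.shape[1])       # regularized normal equations
    return np.linalg.solve(A, Xb.T @ T)             # shape (n_in + 1, n_out)

def backpropagate_targets(T_next, W_next, inv_activation=np.arctanh):
    """Illustrative construction of targets for the preceding layer.

    Given desired pre-activations T_next of layer l+1 and its weights
    W_next (bias in the last row), recover hidden activations of layer l
    by least squares, then map them back through the (assumed invertible)
    activation to obtain target pre-activations for layer l. Values may
    need clipping into the activation's range before inversion.
    """
    # Solve  H @ W_next[:-1] + W_next[-1] ≈ T_next  for H.
    H_T, *_ = np.linalg.lstsq(W_next[:-1].T, (T_next - W_next[-1]).T, rcond=None)
    return inv_activation(H_T.T)

# Example: fit the output layer directly to the labels, where X_hidden
# holds the last hidden layer's activations and Y the regression targets.
#   W_out = fit_layer_ls(X_hidden, Y)
```

On this reading, solving a layer amounts to one normal-equation solve whose cost is fixed by the layer dimensions, which is consistent with the abstract's claim of a deterministic, parallelizable running time.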

