
Dynamic Universal Approximation Theory: Foundations for Parallelism in Neural Networks

Published 31 Jul 2024 in cs.LG and cs.AI | arXiv:2407.21670v5

Abstract: Neural networks are increasingly evolving towards training large models with big data, an approach that has demonstrated superior performance across many tasks. However, it introduces an urgent problem: current deep learning models are predominantly serial, meaning that as the number of network layers increases, so do the training and inference times. This is unacceptable if deep learning is to continue advancing. Therefore, this paper proposes a deep learning parallelization strategy based on the Universal Approximation Theorem (UAT). Building on this foundation, we design a parallel network called Para-Former to test our theory. Unlike traditional serial models, the inference time of Para-Former does not increase with the number of layers, significantly accelerating the inference of multi-layer networks. Experimental results validate the effectiveness of this network.
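The abstract describes the core idea at a high level: layers (blocks) that do not feed into one another sequentially, so adding depth does not lengthen the serial critical path at inference time. The paper's actual Para-Former architecture is not reproduced here; the PyTorch sketch below is only a hypothetical illustration of that parallelism idea under stated assumptions. The class name ParallelBlockNet, the residual sum aggregation, and the block widths are illustrative choices, not the paper's design.

```python
import torch
import torch.nn as nn

class ParallelBlockNet(nn.Module):
    """Illustrative sketch (not the paper's Para-Former): every block reads the
    same shared input, and the block outputs are aggregated by summation, so the
    blocks can be evaluated concurrently rather than one after another."""

    def __init__(self, dim: int, num_blocks: int, hidden: int = 256):
        super().__init__()
        self.blocks = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
            for _ in range(num_blocks)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Each block depends only on x, never on another block's output,
        # so extra "depth" adds parallel work instead of serial latency.
        outs = [block(x) for block in self.blocks]
        return x + torch.stack(outs, dim=0).sum(dim=0)

# Usage: increasing num_blocks grows capacity without lengthening the
# sequential dependency chain that a conventional layer stack would add.
net = ParallelBlockNet(dim=128, num_blocks=8)
y = net(torch.randn(4, 128))  # shape: (4, 128)
```

In a conventional residual stack, block k must wait for block k-1; in a structure like the sketch above, all blocks can in principle be dispatched at once, which is the property the abstract attributes to Para-Former.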
