Expressive Power of ReLU and Step Networks under Floating-Point Operations (2401.15121v2)
Abstract: The study of the expressive power of neural networks has investigated their fundamental limits. Most existing results assume real-valued inputs and parameters as well as exact operations during network evaluation. However, neural networks are typically executed on computers that can represent only a tiny subset of the reals and apply inexact operations, so most existing results do not apply to networks used in practice. In this work, we analyze the expressive power of neural networks under a more realistic setup: floating-point numbers and floating-point operations, as used in practice. Our first set of results assumes floating-point operations in which the significand of a float is represented by finitely many bits but its exponent can take any integer value. Under this setup, we show that neural networks using a binary threshold unit or ReLU can memorize any finite set of input/output pairs and can approximate any continuous function within arbitrary error. In particular, the number of parameters in our constructions for universal approximation and memorization coincides with that in classical results assuming exact mathematical operations. We also show similar memorization and universal approximation results when floating-point operations use finitely many bits for both the significand and the exponent; these results apply to many popular floating-point formats, such as those defined in the IEEE 754 standard (e.g., the 32-bit single-precision format) and bfloat16.
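To make the setting concrete, here is a minimal sketch (not the paper's construction): a classical one-hidden-layer step-activation memorizer evaluated entirely in IEEE 754 float32, followed by a demonstration of the finite-significand rounding that separates floating-point from exact arithmetic. The helper names (`step_memorizer`, `forward`) and the data are hypothetical, and the labels are small integers chosen so that the float32 operations happen to be exact; for general real-valued labels the intermediate sums themselves get rounded, which is the gap such classical constructions leave open under floating-point evaluation.

```python
# Minimal sketch, assuming NumPy: a one-hidden-layer network with binary
# threshold units that memorizes distinct scalar inputs, using only float32 ops.
import numpy as np

def step(z):
    # Binary threshold unit: 1 if z >= 0, else 0.
    return (z >= 0).astype(np.float32)

def step_memorizer(xs, ys):
    """Build (thresholds, output weights) memorizing distinct xs -> ys."""
    xs = np.asarray(xs, dtype=np.float32)
    ys = np.asarray(ys, dtype=np.float32)
    order = np.argsort(xs)
    xs, ys = xs[order], ys[order]
    # Threshold t_i = x_i, so step(x_k - t_i) = 1 exactly when i <= k.
    ts = xs.copy()
    # Telescoping weights: sum_{i<=k} (y_i - y_{i-1}) = y_k in exact arithmetic.
    ws = np.concatenate([ys[:1], ys[1:] - ys[:-1]])
    return ts, ws

def forward(ts, ws, x):
    x = np.float32(x)
    h = step(x - ts)                    # hidden layer: float32 subtraction + threshold
    return np.float32(np.dot(ws, h))    # output layer: float32 multiply-accumulate

xs = [0.5, 1.25, 3.0, 7.0]
ys = [2.0, -1.0, 4.0, 0.0]
ts, ws = step_memorizer(xs, ys)
print([float(forward(ts, ws, x)) for x in xs])  # recovers ys exactly for this data

# Finite significand: float32 has 24 significand bits, so 1 + 2^-24 rounds back to 1.
print(np.float32(1.0) + np.float32(2.0**-24) == np.float32(1.0))  # True
```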
References
- IEEE Computer Society. IEEE Standard for Floating-Point Arithmetic. IEEE Std 754-2019, 2019.
- TensorFlow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467, 2016.
- Nearly-tight VC-dimension and pseudodimension bounds for piecewise linear neural networks. Journal of Machine Learning Research, 2019.
- E. B. Baum. On the capabilities of multilayer perceptrons. Journal of Complexity, 1988.
- S. Boldo. Stupid is as stupid does: Taking the square root of the square of a floating-point number. Electronic Notes in Theoretical Computer Science, 317:27--32, 2015.
- Floating-point arithmetic. Acta Numerica, 32:203--290, 2023.
- S. Boldo and G. Melquiond. Flocq: A unified library for proving floating-point algorithms in Coq. In IEEE Symposium on Computer Arithmetic (ARITH), pages 243--252, 2011.
- G. Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and Systems (MCSS), 2(4):303--314, 1989.
- On the universal approximability and complexity bounds of quantized ReLU neural networks. In International Conference on Learning Representations (ICLR), 2019.
- Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359--366, 1989.
- Upper bounds on the number of hidden neurons in feedforward networks with arbitrary bounded nonlinear activation functions. IEEE Transactions on Neural Networks, 1998.
- C. Jeannerod. Exploiting structure in floating-point arithmetic. In International Conference on Mathematical Aspects of Computer and Information Sciences (MACIS), pages 25--34, 2015.
- Sharp error bounds for complex floating-point inversion. Numerical Algorithms, 73(3):735--760, 2016.
- C. Jeannerod and S. M. Rump. On relative errors of floating-point operations: Optimal bounds and applications. Mathematics of Computation, 87(310):803--819, 2018.
- The expressive power of neural networks: A view from the width. In Annual Conference on Neural Information Processing Systems (NeurIPS), 2017.
- Handbook of floating-point arithmetic. Springer, 2018.
- Provable memorization via deep neural networks using sub-linear parameters. In Conference on Learning Theory (COLT), 2021.
- Minimum width for universal approximation. In International Conference on Learning Representations (ICLR), 2021.
- A. Pinkus. Approximation theory of the MLP model in neural networks. Acta Numerica, 8:143--195, 1999.
- On practical constraints of approximation using neural networks on current digital computers. In IEEE 18th International Conference on Intelligent Engineering Systems, 2014.
- P. H. Sterbenz. Floating-point computation. Prentice Hall, 1973.
- On the optimal memorization power of ReLU neural networks. In Conference on Learning Theory (COLT), 2022.
- R. Vershynin. Memory capacity of neural networks with threshold and rectified linear unit activations. SIAM Journal on Mathematics of Data Science, 2020.
- J. Wray and G. G. Green. Neural networks, approximation theory, and finite precision computation. Neural Networks, 1995.
- D. Yarotsky. Optimal approximation of continuous functions by very deep ReLU networks. In Conference on Learning Theory (COLT), 2018.
- Small ReLU networks are powerful memorizers: a tight analysis of memorization capacity. In Annual Conference on Neural Information Processing Systems (NeurIPS), 2019.