Inverse Approximation Theory for Nonlinear Recurrent Neural Networks
Abstract: We prove an inverse approximation theorem for the approximation of nonlinear sequence-to-sequence relationships by recurrent neural networks (RNNs). This is a so-called Bernstein-type result in approximation theory, which deduces properties of a target function under the assumption that it can be effectively approximated by a hypothesis space. In particular, we show that nonlinear sequence relationships that can be stably approximated by nonlinear RNNs must have an exponentially decaying memory structure, a notion that can be made precise. This extends the previously identified curse of memory in linear RNNs to the general nonlinear setting, and quantifies the essential limitations of the RNN architecture for learning sequential relationships with long-term memory. Based on this analysis, we propose a principled reparameterization method to overcome the limitations. Our theoretical results are confirmed by numerical experiments. The code is available at https://github.com/radarFudan/Curse-of-memory.
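To make the notion of exponentially decaying memory concrete, below is a minimal numerical sketch. It assumes a simple stable linear continuous-time RNN discretized with forward Euler steps; this is only an illustration of the memory concept, not the paper's construction or its released code.

```python
import numpy as np

# Illustrative sketch (assumed toy model, not the paper's method):
# a linear RNN  dh/dt = W h + U x,  y = c^T h,  with Hurwitz W.
# Its response to an input impulse at time 0, |c^T e^{tW} U|, decays
# exponentially in t -- the "exponentially decaying memory" structure
# that the inverse approximation theorem shows is necessary for
# stable approximation by RNNs.

rng = np.random.default_rng(0)
m = 16                                            # hidden width (illustrative choice)
W = rng.normal(size=(m, m)) / np.sqrt(m) - 2.0 * np.eye(m)  # shifted to be Hurwitz
U = rng.normal(size=(m, 1))
c = rng.normal(size=(m, 1))

dt, T = 0.1, 200
h = U.copy()                                      # state right after an impulse input
memory = []
for _ in range(T):
    h = h + dt * (W @ h)                          # forward-Euler free evolution
    memory.append(abs((c.T @ h).item()))

for t in (10, 50, 100, 150):                      # log-magnitude drops linearly in t
    print(f"t = {t * dt:5.1f}   |memory| = {memory[t]:.3e}")
```

In this sketch the printed impulse-response magnitudes fall off geometrically with elapsed time, which is the behavior the theorem asserts any target that is stably approximable by such recurrent models must share.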