Learning Transductions and Alignments with RNN Seq2seq Models (2303.06841v4)
Abstract: This paper studies the capabilities of Recurrent Neural Network sequence-to-sequence (RNN seq2seq) models in learning four transduction tasks: identity, reversal, total reduplication, and quadratic copying. These transductions are well studied in the finite-state transducer literature and are traditionally regarded as increasing in complexity. We find that RNN seq2seq models only approximate a mapping that fits the training or in-distribution data, rather than learning the underlying functions. Although attention makes learning more efficient and robust, it does not overcome this out-of-distribution generalization limitation. We establish a novel complexity hierarchy for learning the four tasks with attention-less RNN seq2seq models, one better understood in terms of the complexity hierarchy of formal languages than that of string transductions. The choice of RNN variant also affects the results; in particular, we show that Simple RNN seq2seq models cannot count the input length.
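To make the four tasks concrete, here is a minimal Python sketch of each transduction as a plain string function. This is our illustration, not code from the paper, and it assumes the standard definitions of these tasks; in particular, it takes quadratic copying to map w to w repeated |w| times, so the output length grows quadratically in the input length.

```python
# Sketch (assumed standard definitions) of the four transduction tasks
# studied in the paper, as string-to-string functions.

def identity(w: str) -> str:
    """Map w to itself, e.g. 'abc' -> 'abc'. Output length |w|."""
    return w

def reversal(w: str) -> str:
    """Map w to its reverse, e.g. 'abc' -> 'cba'. Output length |w|."""
    return w[::-1]

def total_reduplication(w: str) -> str:
    """Map w to ww, e.g. 'abc' -> 'abcabc'. Output length 2|w|."""
    return w + w

def quadratic_copying(w: str) -> str:
    """Map w to w repeated |w| times (assumed definition),
    e.g. 'ab' -> 'abab'. Output length |w|**2."""
    return w * len(w)
```

Viewed this way, the first three tasks have linearly bounded output, while quadratic copying does not, which is one reason the tasks are traditionally ordered by increasing transduction complexity.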