Advancing Regular Language Reasoning in Linear Recurrent Neural Networks (2309.07412v2)

Published 14 Sep 2023 in cs.CL and cs.LG

Abstract: In recent studies, linear recurrent neural networks (LRNNs) have achieved Transformer-level performance in natural language and long-range modeling, while offering rapid parallel training and constant inference cost. With the resurgence of interest in LRNNs, we study whether they can learn the hidden rules in training sequences, such as the grammatical structures of regular language. We theoretically analyze some existing LRNNs and discover their limitations in modeling regular language. Motivated by this analysis, we propose a new LRNN equipped with a block-diagonal and input-dependent transition matrix. Experiments suggest that the proposed model is the only LRNN capable of performing length extrapolation on regular language tasks such as Sum, Even Pair, and Modular Arithmetic. The code is released at https://github.com/tinghanf/RegluarLRNN.
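The abstract's central architectural idea is a linear recurrence whose transition matrix is both block-diagonal and input-dependent. Below is a minimal PyTorch sketch of that idea using 2x2 blocks; the layer names, block size, tanh bounding, and readout are illustrative assumptions rather than the authors' parameterization (see the released repository for the actual model).

```python
import torch
import torch.nn as nn


class BlockDiagLinearRNN(nn.Module):
    """Sketch of a linear recurrence h_t = A(x_t) h_{t-1} + B x_t,
    where A(x_t) is block-diagonal with 2x2 blocks whose entries are
    predicted from the current input. Hypothetical shapes/names."""

    def __init__(self, d_in: int, d_hidden: int):
        super().__init__()
        assert d_hidden % 2 == 0, "hidden size must be even for 2x2 blocks"
        self.n_blocks = d_hidden // 2
        # 4 entries per 2x2 block, predicted from the input token.
        self.to_blocks = nn.Linear(d_in, self.n_blocks * 4)
        self.in_proj = nn.Linear(d_in, d_hidden)
        self.out_proj = nn.Linear(d_hidden, d_in)

    def forward(self, x):                       # x: (batch, seq_len, d_in)
        b, t, _ = x.shape
        h = x.new_zeros(b, self.n_blocks, 2)    # state split into 2-dim blocks
        u = self.in_proj(x).view(b, t, self.n_blocks, 2)
        # tanh bounding of block entries is our own stability choice.
        A = torch.tanh(self.to_blocks(x)).view(b, t, self.n_blocks, 2, 2)
        outs = []
        for i in range(t):                      # sequential form; a parallel scan is also possible
            # h <- A(x_i) h + B x_i, applied block-wise
            h = torch.einsum("bnij,bnj->bni", A[:, i], h) + u[:, i]
            outs.append(h.reshape(b, -1))
        return self.out_proj(torch.stack(outs, dim=1))


# Toy usage on a regular-language-style input (hypothetical setup, not the paper's):
# tokens are one-hot digits; a length-extrapolation test would train on short
# sequences and evaluate on much longer ones.
model = BlockDiagLinearRNN(d_in=10, d_hidden=64)
x = torch.nn.functional.one_hot(torch.randint(0, 10, (8, 32)), 10).float()
y = model(x)  # (8, 32, 10)
```

Because each 2x2 block acts on its own slice of the state, the recurrence keeps the cheap, parallelizable structure of diagonal LRNNs while the input dependence lets the state transition vary per token, which is what the abstract credits for handling regular-language tasks.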
