On the Tensor Representation and Algebraic Homomorphism of the Neural State Turing Machine (2309.14690v1)
Abstract: Recurrent neural networks (RNNs) and transformers have been shown to be Turing-complete, but this result assumes infinite precision in their hidden representations, positional encodings for transformers, and unbounded computation time in general. In practical applications, however, it is crucial to have real-time models that can recognize Turing-complete grammars in a single pass. To address this issue and to better understand the true computational power of artificial neural networks (ANNs), we introduce a new class of recurrent models called the neural state Turing machine (NSTM). The NSTM has bounded weights and finite-precision connections and can simulate any Turing machine in real time. In contrast to prior work that assumes unbounded time and weight precision to demonstrate equivalence with TMs, we prove that a $13$-neuron bounded tensor RNN, coupled with third-order synapses, can model any TM class in real time. Furthermore, under the Markov assumption, we provide a new theoretical bound for a non-recurrent network augmented with memory, showing that a tensor feedforward network with $25$th-order finite-precision weights is equivalent to a universal TM.
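To make the notion of higher-order (tensor) synapses concrete, the sketch below shows a minimal third-order recurrent update in NumPy, where a single weight tensor couples the previous hidden state with a one-hot input symbol to produce the next state. The function name, shapes, random weights, and sigmoid nonlinearity are illustrative assumptions for exposition, not the paper's exact NSTM construction.

```python
# Minimal sketch of a third-order (tensor) recurrent update, assuming a
# one-hot input symbol and a logistic sigmoid; names and shapes are
# illustrative, not the paper's exact construction.
import numpy as np

def third_order_step(W, b, h_prev, x_onehot):
    """One step: h_i = sigma( sum_{j,k} W[i, j, k] * h_prev[j] * x[k] + b[i] )."""
    pre = np.einsum('ijk,j,k->i', W, h_prev, x_onehot) + b
    return 1.0 / (1.0 + np.exp(-pre))

# Toy usage: 13 state neurons and 4 input symbols (values chosen for illustration).
rng = np.random.default_rng(0)
n_hidden, n_symbols = 13, 4
W = rng.normal(scale=0.1, size=(n_hidden, n_hidden, n_symbols))  # third-order weights
b = np.zeros(n_hidden)
h = np.zeros(n_hidden)
for sym in [0, 2, 1, 3]:                    # example input string
    x = np.eye(n_symbols)[sym]              # one-hot encode the current symbol
    h = third_order_step(W, b, h, x)
print(h.round(3))
```

The defining feature is that the weight tensor multiplies the state and the input together, rather than summing their separate linear projections as a first-order RNN would; this multiplicative coupling is what lets a small, finite-precision network encode a state-transition table directly.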