On the Tensor Representation and Algebraic Homomorphism of the Neural State Turing Machine (2309.14690v1)

Published 26 Sep 2023 in cs.CC

Abstract: Recurrent neural networks (RNNs) and transformers have been shown to be Turing-complete, but these results assume infinite precision in the hidden representations, positional encodings for transformers, and unbounded computation time in general. In practical applications, however, it is crucial to have real-time models that can recognize Turing-complete grammars in a single pass. To address this issue and to better understand the true computational power of artificial neural networks (ANNs), we introduce a new class of recurrent models called the neural state Turing machine (NSTM). The NSTM has bounded weights and finite-precision connections and can simulate any Turing machine (TM) in real-time. In contrast to prior work, which assumes unbounded time and precision in the weights to demonstrate equivalence with TMs, we prove that a $13$-neuron bounded tensor RNN, coupled with third-order synapses, can model any TM class in real-time. Furthermore, under the Markov assumption, we provide a new theoretical bound for a non-recurrent network augmented with memory, showing that a tensor feedforward network with $25$th-order finite-precision weights is equivalent to a universal TM.
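The key idea is that a third-order (three-index) synapse tensor acting on a one-hot control-state vector and the one-hot symbol under the tape head can reproduce a TM's transition table in a single recurrent update, i.e. in real time. Below is a minimal NumPy sketch of that mechanism, assuming one-hot encodings; it is not the paper's exact 13-neuron construction, and the transition table `delta`, the tensor names `W`, `W_write`, `W_move`, and the `step` function are illustrative assumptions.

```python
import numpy as np

n_states, n_symbols = 3, 2

# Hypothetical transition table: delta[(state, symbol)] = (next_state, write_symbol, head_move)
delta = {
    (0, 0): (1, 1, +1),
    (0, 1): (0, 0, +1),
    (1, 0): (2, 1, -1),
    (1, 1): (0, 1, +1),
    (2, 0): (2, 0, +1),
    (2, 1): (1, 0, -1),
}

# Third-order synapse tensor: W[q_next, q, s] = 1 iff delta maps (q, s) to q_next.
# Analogous third-order tensors encode the write symbol and the head move.
W = np.zeros((n_states, n_states, n_symbols))
W_write = np.zeros((n_symbols, n_states, n_symbols))
W_move = np.zeros((2, n_states, n_symbols))  # index 0 = move left, 1 = move right
for (q, s), (q_next, w, m) in delta.items():
    W[q_next, q, s] = 1.0
    W_write[w, q, s] = 1.0
    W_move[0 if m < 0 else 1, q, s] = 1.0

def step(state_vec, symbol_vec):
    """One real-time update: trilinear products yield next state, write symbol, and move."""
    next_state = np.einsum('iqs,q,s->i', W, state_vec, symbol_vec)
    write = np.einsum('iqs,q,s->i', W_write, state_vec, symbol_vec)
    move = np.einsum('iqs,q,s->i', W_move, state_vec, symbol_vec)
    return next_state, write, move

# Usage: simulate a few TM steps on a small tape, one recurrent update per step.
tape = [0, 1, 0, 0]
head, q = 0, np.eye(n_states)[0]
for _ in range(4):
    s = np.eye(n_symbols)[tape[head]]
    q, w, m = step(q, s)
    tape[head] = int(np.argmax(w))
    head = max(0, head + (1 if np.argmax(m) == 1 else -1))
```

Because the state and symbol are one-hot, each trilinear product simply looks up one entry of the transition table, which is why bounded, finite-precision weights suffice and no iteration beyond one update per input symbol is needed.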
