How Powerful are Decoder-Only Transformer Neural Models? (2305.17026v4)
Published 26 May 2023 in cs.CL and cs.LG
Abstract: In this article we prove that the general transformer neural model undergirding modern LLMs is Turing complete under reasonable assumptions. This is the first work to directly address the Turing completeness of the underlying technology employed in GPT-x, as past work has focused on the more expressive, full auto-encoder transformer architecture. From this theoretical analysis, we show that the sparsity/compressibility of the word embedding is an important consideration for Turing completeness to hold. We also show that transformers are a variant of the B machines studied by Hao Wang.
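To make the setting concrete, the following is a minimal sketch (not the paper's construction) of the decoder-only computation the abstract refers to: a single causally masked self-attention layer run autoregressively, so that each new token is produced only from previously generated tokens. All dimensions, random weights, and the toy vocabulary here are illustrative assumptions.

```python
# Minimal sketch of a decoder-only (causal) transformer step run autoregressively.
# Weights are random and untrained; this only illustrates the architecture,
# not the Turing-completeness construction in the paper.
import numpy as np

rng = np.random.default_rng(0)
VOCAB, D_MODEL = 8, 16                      # toy vocabulary size and model width

E   = rng.normal(size=(VOCAB, D_MODEL))     # word-embedding matrix
W_q = rng.normal(size=(D_MODEL, D_MODEL))   # query projection
W_k = rng.normal(size=(D_MODEL, D_MODEL))   # key projection
W_v = rng.normal(size=(D_MODEL, D_MODEL))   # value projection
W_o = rng.normal(size=(D_MODEL, VOCAB))     # output (unembedding) projection

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def decoder_step(token_ids):
    """Return next-token logits given the tokens emitted so far."""
    x = E[token_ids]                                   # (t, d) embeddings
    q, k, v = x @ W_q, x @ W_k, x @ W_v
    scores = q @ k.T / np.sqrt(D_MODEL)                # (t, t) attention scores
    mask = np.triu(np.ones_like(scores), k=1) * -1e9   # causal mask: no look-ahead
    attn = softmax(scores + mask)
    h = attn @ v                                       # attended representations
    return h[-1] @ W_o                                 # logits from the last position

# Autoregressive loop: the model's only working memory is its own output
# sequence, which is the mechanism a Turing-completeness argument must exploit.
seq = [0]
for _ in range(10):
    seq.append(int(np.argmax(decoder_step(np.array(seq)))))
print(seq)
```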
- A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” Advances in neural information processing systems, vol. 30, 2017.
- J. Wei, X. Wang, D. Schuurmans, M. Bosma, E. Chi, Q. Le, and D. Zhou, “Chain of thought prompting elicits reasoning in large language models,” arXiv preprint arXiv:2201.11903, 2022.
- M. C. Rillig, M. Ågerstrand, M. Bi, K. A. Gould, and U. Sauerland, “Risks and benefits of large language models for the environment,” Environmental Science & Technology, vol. 57, no. 9, pp. 3464–3466, 2023.
- P. J. Liu, M. Saleh, E. Pot, B. Goodrich, R. Sepassi, L. Kaiser, and N. Shazeer, “Generating Wikipedia by summarizing long sequences,” arXiv preprint arXiv:1801.10198, 2018.
- A. Radford, K. Narasimhan, T. Salimans, I. Sutskever, et al., “Improving language understanding by generative pre-training,” OpenAI blog, 2018.
- J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” arXiv preprint arXiv:1810.04805, 2018.
- OpenAI, “GPT-4 technical report,” arXiv preprint arXiv:2303.08774, 2023.
- A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, I. Sutskever, et al., “Language models are unsupervised multitask learners,” OpenAI blog, vol. 1, no. 8, p. 9, 2019.
- T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, et al., “Language models are few-shot learners,” Advances in neural information processing systems, vol. 33, pp. 1877–1901, 2020.
- J. Pérez, J. Marinković, and P. Barceló, “On the Turing completeness of modern neural network architectures,” arXiv preprint arXiv:1901.03429, 2019.
- S. Bhattamishra, A. Patel, and N. Goyal, “On the computational power of transformers and its implications in sequence modeling,” arXiv preprint arXiv:2006.09286, 2020.
- H. Wang, “A variant to turing’s theory of computing machines,” Journal of the ACM (JACM), vol. 4, no. 1, pp. 63–92, 1957.
- H. T. Siegelmann and E. D. Sontag, “On the computational power of neural nets,” in Proceedings of the fifth annual workshop on Computational learning theory, pp. 440–449, 1992.
- M. Hahn, “Theoretical limitations of self-attention in neural sequence models,” Transactions of the Association for Computational Linguistics, vol. 8, pp. 156–171, 2020.
- C. Yun, S. Bhojanapalli, A. S. Rawat, S. J. Reddi, and S. Kumar, “Are transformers universal approximators of sequence-to-sequence functions?,” arXiv preprint arXiv:1912.10077, 2019.
- K. Hornik, M. Stinchcombe, and H. White, “Multilayer feedforward networks are universal approximators,” Neural networks, vol. 2, no. 5, pp. 359–366, 1989.
- A. M. Turing, “Computability and λ-definability,” The Journal of Symbolic Logic, vol. 2, no. 4, pp. 153–163, 1937.
- J. P. Neto, H. T. Siegelmann, J. F. Costa, and C. S. Araujo, “Turing universality of neural nets (revisited),” in Computer Aided Systems Theory—EUROCAST’97: A Selection of Papers from the 6th International Workshop on Computer Aided Systems Theory Las Palmas de Gran Canaria, Spain, February 24–28, 1997 Proceedings 6, pp. 361–366, Springer, 1997.
- D. Schuurmans, “Memory augmented large language models are computationally universal,” arXiv preprint arXiv:2301.04589, 2023.