The Topos of Transformer Networks (2403.18415v3)
Abstract: The transformer neural network has significantly outshone all other neural network architectures as the engine behind LLMs. We provide a theoretical analysis of the expressivity of the transformer architecture through the lens of topos theory. From this viewpoint, we show that many common neural network architectures, such as the convolutional, recurrent and graph convolutional networks, can be embedded in a pretopos of piecewise-linear functions, but that the transformer necessarily lives in its topos completion. In particular, this suggests that the two network families instantiate different fragments of logic: the former are first order, whereas transformers are higher-order reasoners. Furthermore, we draw parallels with architecture search and gradient descent, integrating our analysis into the framework of cybernetic agents.