The categorical contours of the Chomsky-Schützenberger representation theorem (2405.14703v2)
Abstract: We develop fibrational perspectives on context-free grammars and on nondeterministic finite-state automata over categories and operads. A generalized CFG is a functor from a free colored operad (aka multicategory) generated by a pointed finite species into an arbitrary base operad: this encompasses classical CFGs by taking the base to be a certain operad constructed from a free monoid, as an instance of a more general construction of an operad of spliced arrows $\mathcal{W}\,\mathcal{C}$ for any category $\mathcal{C}$. A generalized NFA is a functor, from an arbitrary bipointed category or pointed operad, that satisfies the unique lifting of factorizations and finite fiber properties: this encompasses classical word automata and tree automata without $\epsilon$-transitions, but also automata over non-free categories and operads. We show that generalized context-free and regular languages satisfy suitable generalizations of many of the usual closure properties, and in particular we give a simple conceptual proof that context-free languages are closed under intersection with regular languages. Finally, we observe that the splicing functor $\mathcal{W} : \mathrm{Cat} \to \mathrm{Oper}$ admits a left adjoint $\mathcal{C} : \mathrm{Oper} \to \mathrm{Cat}$, which we call the contour category construction since the arrows of $\mathcal{C}\,\mathcal{O}$ have a geometric interpretation as oriented contours of operations of $\mathcal{O}$. A direct consequence of the contour / splicing adjunction is that every pointed finite species induces a universal CFG generating a language of tree contour words. This leads us to a generalization of the Chomsky-Schützenberger Representation Theorem, establishing that a subset of a homset $L \subseteq \mathcal{C}(A,B)$ is a CFL of arrows if and only if it is a functorial image of the intersection of a $\mathcal{C}$-chromatic tree contour language with a regular language.
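To give the geometric reading of contours some concrete shape, the following is a minimal sketch and not the paper's formal construction. It assumes the informal picture in which a contour is a walk around a planar tree, and a node with n children is passed n + 1 times, each pass contributing one "sector" letter. The names `Tree`, `Sector`, and `contour_word` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Tree:
    """A rooted planar tree with labelled nodes (hypothetical encoding)."""
    label: str
    children: List["Tree"] = field(default_factory=list)


# A sector is (node label, index i) with 0 <= i <= number of children.
Sector = Tuple[str, int]


def contour_word(t: Tree) -> List[Sector]:
    """Walk around the tree, emitting one sector per pass by a node.

    A node with n children is passed n + 1 times: before its first child,
    between consecutive children, and after its last child.
    """
    word: List[Sector] = [(t.label, 0)]
    for i, child in enumerate(t.children, start=1):
        word.extend(contour_word(child))  # go around the i-th subtree
        word.append((t.label, i))         # return to this node
    return word


# A binary node f over two leaves a and b.
example = Tree("f", [Tree("a"), Tree("b")])
print(contour_word(example))
# [('f', 0), ('a', 0), ('f', 1), ('b', 0), ('f', 2)]
```

Under this encoding a tree with k nodes yields a word of length 2k - 1, one letter per node plus one per edge. When every node is unary, the word can be read as a string of properly nested brackets, which is the sense in which contour words play the role of Dyck words in the classical statement of the theorem.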