Trustworthy Formal Natural Language Specifications (2310.03885v1)
Abstract: Interactive proof assistants are computer programs carefully constructed to check a human-designed proof of a mathematical claim with high confidence in the implementation. However, this only validates truth of a formal claim, which may have been mistranslated from a claim made in natural language. This is especially problematic when using proof assistants to formally verify the correctness of software with respect to a natural language specification. The translation from informal to formal remains a challenging, time-consuming process that is difficult to audit for correctness. This paper shows that it is possible to build support for specifications written in expressive subsets of natural language, within existing proof assistants, consistent with the principles used to establish trust and auditability in proof assistants themselves. We implement a means to provide specifications in a modularly extensible formal subset of English, and have them automatically translated into formal claims, entirely within the Lean proof assistant. Our approach is extensible (placing no permanent restrictions on grammatical structure), modular (allowing information about new words to be distributed alongside libraries), and produces proof certificates explaining how each word was interpreted and how the sentence's structure was used to compute the meaning. We apply our prototype to the translation of various English descriptions of formal specifications from a popular textbook into Lean formalizations; all can be translated correctly with a modest lexicon with only minor modifications related to lexicon size.
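To make the idea of certificate-producing translation concrete, here is a minimal Lean 4 sketch. It is not the authors' implementation: it assumes a categorial-grammar style treatment (the abstract does not specify the mechanism), it fixes the domain to natural numbers, and every name in it (`Cat`, `Deriv`, `wFour`, `wIs`, `wEven`, `fourIsEven`) is hypothetical. The derivation is also written by hand here, whereas the abstract describes it being found automatically. The sketch shows only how a derivation term can record both the meaning chosen for each word and the structure used to combine those meanings, so that the resulting formal claim can be audited.

```lean
-- Illustrative sketch in Lean 4 (core only, no Mathlib). All names are hypothetical.

/-- Grammatical categories: sentence, noun phrase, adjective, and two slash types. -/
inductive Cat where
  | S : Cat
  | NP : Cat
  | ADJ : Cat
  | rslash : Cat → Cat → Cat  -- `rslash a b`: yields `a` after taking a `b` on its right
  | lslash : Cat → Cat → Cat  -- `lslash a b`: yields `b` after taking an `a` on its left

/-- The Lean type in which meanings of each category live (domain fixed to `Nat`). -/
abbrev Cat.interp : Cat → Type
  | .S => Prop
  | .NP => Nat
  | .ADJ => Nat → Prop
  | .rslash a b => b.interp → a.interp
  | .lslash a b => a.interp → b.interp

/-- A derivation records the lexical entry used for each word and how phrases combine. -/
inductive Deriv : Cat → Type where
  | lex (word : String) (c : Cat) (meaning : c.interp) : Deriv c
  | fwd {a b : Cat} : Deriv (.rslash a b) → Deriv b → Deriv a
  | bwd {a b : Cat} : Deriv a → Deriv (.lslash a b) → Deriv b

/-- Compute the meaning of a derivation by function application. -/
def Deriv.denote {c : Cat} : Deriv c → c.interp
  | .lex _ _ meaning => meaning
  | .fwd f x => f.denote x.denote
  | .bwd x f => f.denote x.denote

-- A three-word lexicon (assumed entries, chosen for this example only).
def wFour : Deriv .NP := .lex "four" .NP (4 : Nat)
def wEven : Deriv .ADJ := .lex "even" .ADJ (fun (n : Nat) => n % 2 = 0)
def wIs : Deriv (.rslash (.lslash .NP .S) .ADJ) :=
  .lex "is" (.rslash (.lslash .NP .S) .ADJ) (fun (p : Nat → Prop) (n : Nat) => p n)

-- "four is even": "is" first takes "even" on its right, then "four" on its left.
def fourIsEven : Deriv .S := .bwd wFour (.fwd wIs wEven)

-- The computed meaning unfolds to the formal claim `4 % 2 = 0`.
example : fourIsEven.denote := (rfl : 4 % 2 = 0)
```

In this sketch the term `fourIsEven` plays the role of the certificate: anyone auditing the specification can inspect which entry was used for each word and in what order the phrases were combined. Extending the lexicon amounts to adding further `Deriv.lex` entries, which loosely mirrors the modularity the abstract describes.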
Authors: Colin S. Gordon, Sergey Matskevich