Towards Neural Synthesis for SMT-Assisted Proof-Oriented Programming

Published 3 May 2024 in cs.PL, cs.AI, and cs.SE | arXiv:2405.01787v3

Abstract: Proof-oriented programs mix computational content with proofs of program correctness. However, the human effort involved in programming and proving is still substantial, despite the use of Satisfiability Modulo Theories (SMT) solvers to automate proofs in languages such as F*. Seeking to spur research on using AI to automate the construction of proof-oriented programs, we curate a dataset of 600K lines of open-source F* programs and proofs, including software used in production systems ranging from Windows and Linux to Python and Firefox. Our dataset includes around 32K top-level F* definitions, each representing a type-directed program and proof synthesis problem: producing a definition given a formal specification expressed as an F* type. We provide a program-fragment checker that queries F* to check the correctness of candidate solutions. We also report on an extended version of our dataset containing a total of 940K lines of programs and proofs, with 54K top-level F* definitions. We believe this is the largest corpus of SMT-assisted program proofs coupled with a reproducible program-fragment checker. Grounded in this dataset, we investigate the use of AI to synthesize programs and their proofs in F*, with promising results. Our main finding is that the performance of fine-tuned smaller LLMs (such as Phi-2 or StarCoder) compares favorably with that of large LLMs (such as GPT-4), at a much lower computational cost. We also identify various type-based retrieval augmentation techniques and find that they boost performance significantly. With detailed error analysis and case studies, we identify potential strengths and weaknesses of models and techniques, and suggest directions for future improvements.
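
To make the task format concrete, the following is a minimal, hypothetical sketch (not drawn from the paper's dataset; the module name and lemma are illustrative assumptions) of the kind of type-directed synthesis problem the abstract describes. The val declaration plays the role of the formal specification, and the synthesis task is to produce the accompanying definition, whose residual proof obligations F* discharges with the Z3 SMT solver:

```fstar
module Example

open FStar.List.Tot.Base

(* Specification, expressed as an F* type: appending two lists adds
   their lengths. In the dataset's framing, this type is the prompt. *)
val append_length (#a:Type) (l1 l2: list a)
  : Lemma (length (append l1 l2) = length l1 + length l2)

(* A candidate definition a model might synthesize. F* type-checks the
   induction and sends the remaining obligations (e.g., the arithmetic
   in the inductive case) to the SMT solver. *)
let rec append_length #a l1 l2 =
  match l1 with
  | [] -> ()
  | _ :: tl -> append_length tl l2
```

A program-fragment checker of the kind described above would accept this candidate if F* (backed by Z3) verifies it against the given type, and reject it otherwise.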
