
Lemur: Integrating Large Language Models in Automated Program Verification

Published 7 Oct 2023 in cs.FL, cs.AI, cs.LG, and cs.LO | (2310.04870v5)

Abstract: The demonstrated code-understanding capability of LLMs raises the question of whether they can be used for automated program verification, a task that demands high-level abstract reasoning about program properties that is challenging for verification tools. We propose a general methodology to combine the power of LLMs and automated reasoners for automated program verification. We formally describe this methodology as a set of transition rules and prove its soundness. We instantiate the calculus as a sound automated verification procedure and demonstrate practical improvements on a set of synthetic and competition benchmarks.

