
Enchanting Program Specification Synthesis by Large Language Models using Static Analysis and Program Verification (2404.00762v2)

Published 31 Mar 2024 in cs.SE

Abstract: Formal verification provides a rigorous and systematic approach to ensuring the correctness and reliability of software systems. Yet constructing the specifications needed for a full proof relies on domain expertise and non-trivial manpower, so an automated approach to specification synthesis is desired. Existing automated approaches, however, are limited in their versatility: they either focus only on synthesizing loop invariants for numerical programs or are tailored to specific types of programs or invariants. Programs involving multiple complicated data types (e.g., arrays, pointers) and code structures (e.g., nested loops, function calls) are often beyond their capabilities. To help bridge this gap, we present AutoSpec, an automated approach to synthesizing specifications for automated program verification. It overcomes the shortcomings of existing work in specification versatility, synthesizing satisfiable and adequate specifications for full proof. It is driven by static analysis and program verification, and is empowered by LLMs. AutoSpec addresses the practical challenges in three ways: (1) AutoSpec is driven by static analysis and program verification, with LLMs serving as generators of candidate specifications; (2) programs are decomposed to direct the attention of LLMs; and (3) candidate specifications are validated in each round to avoid error accumulation during the interaction with LLMs. In this way, AutoSpec can incrementally and iteratively generate satisfiable and adequate specifications. The evaluation shows its effectiveness and usefulness: it outperforms existing works by successfully verifying 79% of programs through automatic specification synthesis, a significant improvement of 1.592x. It can also be successfully applied to verify the programs in a real-world X509-parser project.
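
The generate-and-validate loop described in the abstract can be illustrated with a minimal sketch. This is not AutoSpec's implementation: the function `synthesize_specs`, its parameters `propose` and `is_valid`, and the `max_rounds` cutoff are all hypothetical names introduced here. The sketch assumes that decomposition has already produced an ordered list of program units, that `propose` wraps an LLM call, and that `is_valid` wraps a program verifier.

```python
from typing import Callable, Dict, Iterable, List


def synthesize_specs(
    units: Iterable[str],
    propose: Callable[[str, List[str]], List[str]],
    is_valid: Callable[[str, List[str], str], bool],
    max_rounds: int = 3,
) -> Dict[str, List[str]]:
    """Collect validated specifications per program unit (illustrative only).

    units:    program fragments from a decomposition step (assumed given).
    propose:  hypothetical stand-in for an LLM that returns candidate specs.
    is_valid: hypothetical stand-in for a verifier that checks one candidate
              against the unit and the specs accepted so far.
    """
    accepted: Dict[str, List[str]] = {}
    for unit in units:
        kept: List[str] = []
        for _ in range(max_rounds):
            progress = False
            for candidate in propose(unit, kept):
                # Validate every candidate before keeping it, so that a bad
                # guess cannot accumulate into later rounds.
                if candidate not in kept and is_valid(unit, kept, candidate):
                    kept.append(candidate)
                    progress = True
            if not progress:
                break  # nothing new was accepted in this round
        accepted[unit] = kept
    return accepted
```

Passing the generator and the validator in as callables keeps the sketch independent of any particular LLM or verifier; in practice each would be replaced by a call to a real tool.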
