E&V: Prompting Large Language Models to Perform Static Analysis by Pseudo-code Execution and Verification (2312.08477v1)

Published 13 Dec 2023 in cs.SE

Abstract: Static analysis, the process of examining code without executing it, is crucial for identifying software issues. Yet, static analysis is hampered by its complexity and the need for customization for different targets. Traditional static analysis tools require extensive human effort and are often limited to specific target programs and programming languages. Recent advancements in LLMs, such as GPT-4 and Llama, offer new capabilities for software engineering tasks. However, their application in static analysis, especially in understanding complex code structures, remains under-explored. This paper introduces a novel approach named E&V, which leverages LLMs to perform static analysis. Specifically, E&V employs LLMs to simulate the execution of pseudo-code, effectively conducting static analysis encoded in the pseudo-code with minimal human effort, thereby improving the accuracy of results. E&V includes a verification process for pseudo-code execution without needing an external oracle. This process allows E&V to mitigate hallucinations of LLMs and enhance the accuracy of static analysis results. We have implemented E&V in a prototype tool designed for triaging crashes through backward taint analysis. This prototype, paired with GPT-4-32k, has been applied to triage 170 recently fixed Linux kernel bugs across seven bug categories. Our experiments demonstrate that the prototype correctly identifies the blamed function in 81.2% of the cases. Additionally, we observe that our novel verification process significantly improves the accuracy, increasing it from 28.2% to 81.2%.
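The abstract describes a two-stage prompting scheme: an execution prompt that asks the LLM to simulate pseudo-code (here, backward taint analysis for crash triaging) step by step, and a verification prompt that asks it to re-check the resulting execution trace without an external oracle. A minimal sketch of how such prompts might be constructed is shown below; the pseudo-code, function names, and prompt wording are illustrative assumptions, not taken from the paper.

```python
# Illustrative sketch of an E&V-style prompting scheme: one prompt asks an
# LLM to "execute" pseudo-code on a crash report, a second asks it to
# verify the claimed execution trace. All names and wording are
# hypothetical; the paper's actual prompts and pseudo-code may differ.

# Hypothetical pseudo-code for backward taint analysis of a crash.
PSEUDO_CODE = """\
function backward_taint(crash_var, trace):
    tainted = {crash_var}
    for stmt in reversed(trace):
        if stmt defines any v in tainted:
            tainted = tainted union stmt.sources
    return the function containing the earliest tainted definition
"""


def execution_prompt(crash_report: str) -> str:
    """Build a prompt asking the model to simulate the pseudo-code."""
    return (
        "Execute the following pseudo-code step by step on the crash "
        "report below, recording each step of the execution trace.\n\n"
        f"Pseudo-code:\n{PSEUDO_CODE}\n"
        f"Crash report:\n{crash_report}\n"
    )


def verification_prompt(crash_report: str, claimed_trace: str) -> str:
    """Build a prompt asking the model to check a claimed trace."""
    return (
        "Below are a pseudo-code program, an input, and a claimed "
        "execution trace. Check each step of the trace against the "
        "pseudo-code semantics and report any inconsistent step.\n\n"
        f"Pseudo-code:\n{PSEUDO_CODE}\n"
        f"Input:\n{crash_report}\n"
        f"Claimed trace:\n{claimed_trace}\n"
    )


if __name__ == "__main__":
    report = "KASAN: use-after-free in example_fn()"
    print(execution_prompt(report))
    print(verification_prompt(report, "step 1: tainted = {ptr} ..."))
```

In this reading, the verification stage acts as a self-check: if the model's re-examination of its own trace finds an inconsistent step, the execution can be retried, which is one plausible mechanism behind the accuracy jump the abstract reports (28.2% to 81.2%).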

Authors (4)
  1. Yu Hao (32 papers)
  2. Weiteng Chen (3 papers)
  3. Ziqiao Zhou (4 papers)
  4. Weidong Cui (4 papers)
Citations (6)
