LMs: Understanding Code Syntax and Semantics for Code Analysis (2305.12138v4)

Published 20 May 2023 in cs.SE and cs.AI

Abstract: Large language models (LLMs) demonstrate significant potential to revolutionize software engineering (SE), exhibiting outstanding performance on SE tasks such as code and document generation. However, the high reliability and risk-control requirements of software engineering raise concerns about the lack of interpretability of LLMs. To address this concern, we conducted a study to evaluate the capabilities and limitations of LLMs for code analysis in SE. We break down the abilities needed for artificial intelligence (AI) models to address code-analysis SE tasks into three categories: 1) syntax understanding, 2) static behavior understanding, and 3) dynamic behavior understanding. Our investigation focused on the ability of LLMs to comprehend code syntax and semantic structures, including abstract syntax trees (ASTs), control flow graphs (CFGs), and call graphs (CGs). We employed four state-of-the-art foundation models, GPT-4, GPT-3.5, StarCoder, and CodeLlama-13b-instruct, and assessed their performance on cross-language tasks involving C, Java, Python, and Solidity. Our findings reveal that while LLMs have a talent for understanding code syntax, they struggle to comprehend code semantics, particularly dynamic semantics. We conclude that LLMs possess capabilities similar to an AST parser, demonstrating initial competence in static code analysis. Furthermore, our study highlights that LLMs are susceptible to hallucination when interpreting code semantic structures, fabricating nonexistent facts. These results indicate the need for methods to verify the correctness of LLM output to ensure its dependability in SE. More importantly, our study provides an initial answer to why the code generated by LLMs is usually syntactically correct yet vulnerable.
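For concreteness, the syntactic structure probed in the "syntax understanding" category is what a standard AST parser produces. The following minimal sketch uses Python's built-in ast module (the snippet is illustrative and not taken from the paper) to show the kind of tree an LLM would need to reproduce to match an AST parser:

```python
import ast

# Toy function whose syntactic structure we want to inspect.
source = """
def add(a, b):
    return a + b
"""

# Parse the source into an abstract syntax tree (AST).
tree = ast.parse(source)

# Print the full tree; this is the ground-truth structure an AST parser
# recovers and that a syntax-understanding probe would compare against.
print(ast.dump(tree, indent=2))

# Enumerate the node types encountered while walking the tree.
for node in ast.walk(tree):
    print(type(node).__name__)
```

CFGs and CGs, the semantic structures the study also evaluates, require dedicated program-analysis tooling and are not covered by this sketch.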

Authors (10)
  1. Wei Ma (106 papers)
  2. Shangqing Liu (28 papers)
  3. Wenhan Wang (22 papers)
  4. Qiang Hu (149 papers)
  5. Ye Liu (153 papers)
  6. Cen Zhang (69 papers)
  7. Liming Nie (9 papers)
  8. Yang Liu (2253 papers)
  9. Zhihao Lin (16 papers)
  10. Li Li (655 papers)
Citations (5)