Program Decomposition and Translation with Static Analysis (2401.12412v1)
Abstract: The rising popularity of LLMs has motivated exploring their use in code-related tasks. Code LLMs with millions to billions of parameters are trained on massive amounts of code in different Programming Languages (PLs), and such models are used to automate various Software Engineering (SE) tasks via prompt engineering. However, given the very large size of industry-scale project files, a major limitation of these LLMs is their bounded context window, raising the question: "Can these LLMs process very large files, and can we still perform effective prompt engineering?" Code translation aims to convert source code from one PL to another. In this work, we assess the effect of method-level program decomposition on the context window of LLMs and investigate how this approach enables the translation of very large files that previously could not be translated due to out-of-context-window errors. Our observations from 20 well-known Java projects and approximately 60K methods suggest that method-level program decomposition significantly alleviates the limited context window problem of LLMs by 99.5%. Furthermore, our empirical analysis indicates that with method-level decomposition, each input fragment on average consumes only 5% of the context window, leaving more context space for prompt engineering and the output. Finally, we investigate the effectiveness of a Call Graph (CG) based approach for translating very large files under method-level program decomposition.
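The core idea of method-level decomposition can be illustrated with a minimal sketch: split a Java file into per-method fragments and check each fragment against a token budget. The brace-matching extractor, the ~4-characters-per-token heuristic, the 8,192-token window, and the 5% budget fraction are all assumptions for illustration; the paper's actual tooling would rely on a real parser and tokenizer.

```python
import re

def split_methods(java_source: str):
    """Naively split a Java file into method-level fragments by brace matching.
    (Illustrative only; a production pipeline would use a real Java parser.)"""
    # Match a plausible method signature followed by an opening brace.
    sig = re.compile(
        r'(?:public|protected|private|static|\s)+[\w<>\[\]]+\s+(\w+)\s*\([^)]*\)\s*\{')
    fragments = []
    for m in sig.finditer(java_source):
        depth, i = 1, m.end()          # start just past the opening brace
        while i < len(java_source) and depth:
            depth += {'{': 1, '}': -1}.get(java_source[i], 0)
            i += 1
        fragments.append((m.group(1), java_source[m.start():i]))
    return fragments

def fits_context(fragment: str, window_tokens: int = 8192, frac: float = 0.05):
    """Check a fragment against a hypothetical token budget,
    approximating ~4 characters per token."""
    return len(fragment) / 4 <= window_tokens * frac

demo = """
class Greeter {
    public String greet(String name) {
        return "Hello, " + name;
    }
    private static int add(int a, int b) {
        return a + b;
    }
}
"""
for name, body in split_methods(demo):
    print(name, fits_context(body))
```

Each fragment is then small enough to leave most of the window free for the translation prompt and the generated output, which is the effect the abstract quantifies.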