LPR: Large Language Models-Aided Program Reduction (2312.13064v3)

Published 20 Dec 2023 in cs.PL and cs.SE

Abstract: Program reduction is a prevalent technique for facilitating compiler debugging by automatically minimizing bug-triggering programs. Existing program reduction techniques are either generic across languages (e.g., Perses and Vulcan) or customized for one particular language by employing language-specific features, like C-Reduce. However, how to strike the balance between generality across multiple programming languages and specificity to individual languages in program reduction has yet to be explored. This paper proposes LPR, the first technique that utilizes LLMs to perform language-specific program reduction for multiple languages. The core insight is to combine language-generic, syntax-level program reduction (e.g., Perses) with the language-specific, semantic-level program transformations learned by LLMs. The two alternate: the language-generic program reducer efficiently reduces programs to 1-tree-minimality, which is small enough to be manageable for LLMs; the LLMs then transform programs via the learned semantics to expose new reduction opportunities for the language-generic reducer to shrink the programs further. Our extensive evaluation on 50 benchmarks across three languages (C, Rust, and JavaScript) highlights LPR's practicality and superiority over Vulcan, the state-of-the-art language-generic program reducer. For effectiveness, LPR surpasses Vulcan by producing 24.93%, 4.47%, and 11.71% smaller programs on the C, Rust, and JavaScript benchmarks, respectively. Moreover, LPR and Vulcan complement each other: by running Vulcan on LPR's output for C programs, we achieve program sizes comparable to those produced by C-Reduce. For efficiency, LPR takes 10.77%, 34.88%, and 36.96% less time than Vulcan to finish all benchmarks in C, Rust, and JavaScript, respectively.
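
The alternation the abstract describes can be made concrete with a short sketch. The Python below illustrates the loop: a language-generic reducer shrinks the program to 1-tree-minimality, then an LLM proposes a semantics-aware rewrite that may unlock further syntactic deletions. The helper callables (`syntactic_reduce`, `semantic_transform`, `still_triggers_bug`) are hypothetical stand-ins for Perses, the LLM transformation step, and the bug-triggering oracle; this is a reading aid under those assumptions, not the authors' implementation.

```python
from typing import Callable

def lpr_loop(
    program: str,
    syntactic_reduce: Callable[[str], str],    # e.g., Perses, to 1-tree-minimality
    semantic_transform: Callable[[str], str],  # e.g., an LLM-proposed rewrite
    still_triggers_bug: Callable[[str], bool],
    max_rounds: int = 10,
) -> str:
    """Alternate syntax-level reduction and semantic transformation (sketch)."""
    current = program
    for _ in range(max_rounds):
        # Phase 1: language-generic reduction. A real reducer queries the
        # oracle internally; here we conservatively re-check its result.
        reduced = syntactic_reduce(current)
        if still_triggers_bug(reduced) and len(reduced) < len(current):
            current = reduced

        # Phase 2: ask the LLM for a semantics-level rewrite (e.g., inlining
        # a function, collapsing a loop) that exposes new deletion chances.
        candidate = semantic_transform(current)

        # Accept the rewrite only if it preserves the bug and shrinks the
        # program; otherwise we have reached a fixed point and stop.
        if still_triggers_bug(candidate) and len(candidate) < len(current):
            current = candidate
        else:
            break
    return current

if __name__ == "__main__":
    # Toy stand-ins so the sketch runs: strip blank lines "syntactically",
    # use an identity "LLM", and treat the presence of crash() as the bug.
    shrink = lambda p: "\n".join(l for l in p.splitlines() if l.strip())
    identity = lambda p: p
    oracle = lambda p: "crash();" in p
    print(lpr_loop("int main() {\n\n  crash();\n}\n", shrink, identity, oracle))
```

The key invariant, as in all property-preserving reduction, is that a candidate is accepted only if the bug-triggering property still holds; the LLM is free to propose unsound rewrites because the oracle filters them out.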

References (45)
  1. 2023a. OpenAI API. Retrieved 2023-11-20 from https://platform.openai.com/docs/overview
  2. 2023b. OpenAI API: N. Retrieved 2023-11-20 from https://platform.openai.com/docs/api-reference/chat/create#chat-create-n
  3. 2023c. OpenAI API: Temperature. Retrieved 2023-11-20 from https://platform.openai.com/docs/api-reference/chat/create#chat-create-temperature
  4. Does an lstm forget more than a cnn? an empirical study of catastrophic forgetting in nlp. In Proceedings of the The 17th Annual Workshop of the Australasian Language Technology Association. 77–86.
  5. Large language models are zero-shot fuzzers: Fuzzing deep-learning libraries via large language models. In Proceedings of the 32nd ACM SIGSOFT international symposium on software testing and analysis. 423–435.
  6. Large language models are edge-case fuzzers: Testing deep learning libraries via fuzzgpt. arXiv preprint arXiv:2304.02014 (2023).
  7. Alastair Donaldson and David MacIver. 2021. Test Case Reduction: Beyond Bugs. Retrieved May 29, 2023 from https://blog.sigplan.org/2021/05/25/test-case-reduction-beyond-bugs
  8. Test-case reduction and deduplication almost for free with transformation-based compiler testing. In Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation. 1017–1032. https://doi.org/10.1145/3453483.3454092
  9. Automated Repair of Programs from Large Language Models. In Proceedings of the 45th International Conference on Software Engineering (Melbourne, Victoria, Australia) (ICSE ’23). IEEE Press, 1469–1481. https://doi.org/10.1109/ICSE48619.2023.00128
  10. An empirical investigation of catastrophic forgetting in gradient-based neural networks. arXiv preprint arXiv:1312.6211 (2013).
  11. Qiuhan Gu. 2023. LLM-Based Code Generation Method for Golang Compiler Testing. In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2023, San Francisco, CA, USA, December 3-9, 2023, Satish Chandra, Kelly Blincoe, and Paolo Tonella (Eds.). ACM, 2201–2203. https://doi.org/10.1145/3611643.3617850
  12. An Empirical Study on Fine-Tuning Large Language Models of Code for Automated Program Repair. In 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE Computer Society, 1162–1174.
  13. Jigsaw: Large Language Models meet Program Synthesis. In 44th IEEE/ACM 44th International Conference on Software Engineering, ICSE 2022, Pittsburgh, PA, USA, May 25-27, 2022. ACM, 1219–1231. https://doi.org/10.1145/3510003.3510203
  14. Christian Gram Kalhauge and Jens Palsberg. 2019. Binary reduction of dependency graphs. In Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 556–566. https://doi.org/10.1145/3338906.3338956
  15. Christian Gram Kalhauge and Jens Palsberg. 2021. Logical bytecode reduction. In Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation. 1003–1016. https://doi.org/10.1145/3453483.3454091
  16. Measuring catastrophic forgetting in neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 32.
  17. Compiler validation via equivalence modulo inputs. ACM SIGPLAN Notices 49, 6 (2014), 216–226. https://doi.org/10.1145/2594291.2594334
  18. Program Reconditioning: Avoiding Undefined Behaviour When Finding and Reducing Compiler Bugs. Proc. ACM Program. Lang. 7, PLDI, Article 180 (jun 2023), 25 pages. https://doi.org/10.1145/3591294
  19. Nuances are the Key: Unlocking ChatGPT to Find Failure-Inducing Tests with Differential Prompting. In 38th IEEE/ACM International Conference on Automated Software Engineering, ASE 2023, Luxembourg, September 11-15, 2023. IEEE, 14–26. https://doi.org/10.1109/ASE56229.2023.00089
  20. Is your code generated by ChatGPT really correct? Rigorous evaluation of large language models for code generation. arXiv preprint arXiv:2305.01210 (2023).
  21. Random testing for C and C++ compilers with YARPGen. Proceedings of the ACM on Programming Languages 4, OOPSLA (2020), 1–25. https://doi.org/10.1145/3428264
  22. LLVM. 2000. LibTooling. https://clang.llvm.org/docs/LibTooling.html Accessed: 2023-04-30.
  23. Ghassan Misherghi and Zhendong Su. 2006. HDD: hierarchical delta debugging. In Proceedings of the 28th International Conference on Software Engineering. 142–151. https://doi.org/10.1145/1134285.1134307
  24. Aina Niemetz and Armin Biere. 2013. ddSMT: a delta debugger for the SMT-LIB v2 format. In Proceedings of the 11th International Workshop on Satisfiability Modulo Theories, SMT. 8–9.
  25. Test-case reduction for C compiler bugs. In Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation. 335–346. https://doi.org/10.1145/2254064.2254104
  26. John Regehr et al. 2012. C-Reduce. Retrieved 2023-11-26 from https://github.com/csmith-project/creduce
  27. Large language models can be easily distracted by irrelevant context. In International Conference on Machine Learning. PMLR, 31210–31227.
  28. Finding compiler bugs via live code mutation. In Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications. 849–863. https://doi.org/10.1145/2983990.2984038
  29. Toward understanding compiler bugs in GCC and LLVM. In Proceedings of the 25th International Symposium on Software Testing and Analysis. 294–305. https://doi.org/10.1145/2931037.2931074
  30. Perses: Syntax-guided program reduction. In Proceedings of the 40th International Conference on Software Engineering. 361–371. https://doi.org/10.1145/3180155.3180236
  31. SMT Solver Validation Empowered by Large Pre-Trained Language Models. In 38th IEEE/ACM International Conference on Automated Software Engineering, ASE 2023, Luxembourg, September 11-15, 2023. IEEE, 1288–1300. https://doi.org/10.1109/ASE56229.2023.00180
  32. Is ChatGPT the Ultimate Programming Assistant -- How far is it? arXiv preprint arXiv:2304.11938 (2023).
  33. On the Caching Schemes to Speed Up Program Reduction. ACM Trans. Softw. Eng. Methodol. 33, 1, Article 17 (nov 2023), 30 pages. https://doi.org/10.1145/3617172
  34. Probabilistic Delta Debugging. In Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 881–892. https://doi.org/10.1145/3468264.3468625
  35. FuzzJIT: Oracle-Enhanced Fuzzing for JavaScript Engine JIT Compiler. In USENIX Security Symposium. USENIX.
  36. Copiloting the Copilots: Fusing Large Language Models with Completion Engines for Automated Program Repair. In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2023, San Francisco, CA, USA, December 3-9, 2023, Satish Chandra, Kelly Blincoe, and Paolo Tonella (Eds.). ACM, 172–184. https://doi.org/10.1145/3611643.3616271
  37. How Effective Are Neural Networks for Fixing Security Vulnerabilities. In Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis (Seattle, WA, USA) (ISSTA 2023). Association for Computing Machinery, New York, NY, USA, 1282–1294. https://doi.org/10.1145/3597926.3598135
  38. Revisiting the Plastic Surgery Hypothesis via Large Language Models. arXiv preprint arXiv:2303.10494 (2023).
  39. Automated program repair in the era of large pre-trained language models. In Proceedings of the 45th International Conference on Software Engineering (ICSE 2023). Association for Computing Machinery.
  40. Chunqiu Steven Xia and Lingming Zhang. 2023. Keep the Conversation Going: Fixing 162 out of 337 bugs for $0.42 each using ChatGPT. arXiv preprint arXiv:2304.00385 (2023).
  41. Pushing the Limit of 1-Minimality of Language-Agnostic Program Reduction. Proceedings of the ACM on Programming Languages 7, OOPSLA1 (2023), 636–664. https://doi.org/10.1145/3586049
  42. Finding and understanding bugs in C compilers. In Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation. 283–294. https://doi.org/10.1145/1993498.1993532
  43. Andreas Zeller and Ralf Hildebrandt. 2002. Simplifying and isolating failure-inducing input. IEEE Transactions on Software Engineering 28, 2 (2002), 183–200. https://doi.org/10.1109/32.988498
  44. PPR: Pairwise Program Reduction. In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 338–349.
  45. Li Zhong and Zilong Wang. 2023. A study on robustness and reliability of large language model code generation. arXiv preprint arXiv:2308.10335 (2023).
Citations (4)
