CigaR: Cost-efficient Program Repair with LLMs (2402.06598v2)

Published 9 Feb 2024 in cs.SE and cs.LG

Abstract: Large language models (LLMs) have proven to be effective at automated program repair (APR). However, using LLMs can be costly, with companies invoicing users by the number of tokens. In this paper, we propose CigaR, the first LLM-based APR tool that focuses on minimizing the repair cost. CigaR works in two major steps: generating a first plausible patch and multiplying plausible patches. CigaR optimizes the prompts and the prompt setting to maximize the information given to LLMs using the smallest possible number of tokens. Our experiments on 429 bugs from the widely used Defects4J and HumanEval-Java datasets show that CigaR reduces the token cost by 73%. On average, CigaR spends 127k tokens per bug while the baseline uses 467k tokens per bug. On the subset of bugs that are fixed by both, CigaR spends 20k tokens per bug while the baseline uses 608k tokens, a cost saving of 96%. Our extensive experiments show that CigaR is a cost-effective LLM-based program repair tool that uses a low number of tokens to automatically generate patches.
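The abstract describes a two-step, budget-conscious control flow: search for a first plausible patch with terse prompts, then multiply plausible patches, stopping as early as possible to limit token spending. The sketch below illustrates that control flow in Python under stated assumptions; `query_llm` and `passes_tests` are hypothetical placeholders standing in for an LLM API call and a test runner, and none of this is the actual CigaR implementation.

```python
# Minimal sketch of a cost-aware, two-step LLM repair loop in the spirit of
# the abstract. The LLM call and test runner are hypothetical placeholders.
import itertools
from dataclasses import dataclass

_counter = itertools.count(1)  # makes the placeholder LLM return distinct patches


@dataclass
class RepairBudget:
    """Tracks token spending so the loop can stop as soon as possible."""
    max_tokens: int
    spent: int = 0

    def charge(self, tokens: int) -> None:
        self.spent += tokens

    def exhausted(self) -> bool:
        return self.spent >= self.max_tokens


def query_llm(prompt: str) -> tuple[str, int]:
    """Hypothetical placeholder for an LLM API call.

    Returns a candidate patch and the number of tokens consumed
    (prompt + completion); a real tool would call a provider API here.
    """
    patch = f"return a + b; // candidate fix #{next(_counter)}"
    return patch, len(prompt.split()) + len(patch.split())


def passes_tests(patch: str) -> bool:
    """Hypothetical placeholder: run the previously failing tests on the patch."""
    return "a + b" in patch


def repair(buggy_code: str, failing_test: str, budget: RepairBudget,
           target_plausible: int = 3) -> list[str]:
    """Phase 1: find a first plausible patch with a terse prompt.
    Phase 2: multiply plausible patches by asking for variations of it."""
    plausible: list[str] = []

    # Phase 1: stop as soon as one patch passes the failing test.
    while not plausible and not budget.exhausted():
        prompt = f"Fix the bug:\n{buggy_code}\nFailing test:\n{failing_test}"
        patch, tokens = query_llm(prompt)
        budget.charge(tokens)
        if passes_tests(patch):
            plausible.append(patch)

    # Phase 2: generate alternatives, stopping at the target or when the budget runs out.
    while plausible and len(plausible) < target_plausible and not budget.exhausted():
        prompt = f"Give an alternative fix equivalent to:\n{plausible[0]}"
        patch, tokens = query_llm(prompt)
        budget.charge(tokens)
        if passes_tests(patch) and patch not in plausible:
            plausible.append(patch)

    return plausible


if __name__ == "__main__":
    budget = RepairBudget(max_tokens=2_000)
    patches = repair("return a - b;", "assert add(1, 2) == 3", budget)
    print(f"{len(patches)} plausible patch(es), {budget.spent} tokens spent")
```

The early exits in both phases are what drive the cost behavior reported in the abstract: every call is charged against the budget, and the loop stops the moment a patch is plausible or enough candidates exist, rather than exhausting a fixed number of attempts.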

Authors (4)
  1. Dávid Hidvégi (1 paper)
  2. Khashayar Etemadi (12 papers)
  3. Sofia Bobadilla (6 papers)
  4. Martin Monperrus (155 papers)
Citations (12)
