
Peer-aided Repairer: Empowering Large Language Models to Repair Advanced Student Assignments (2404.01754v1)

Published 2 Apr 2024 in cs.SE and cs.AI

Abstract: Automated generation of feedback on programming assignments holds significant benefits for programming education, especially for advanced assignments. Automated Program Repair techniques, especially LLM-based approaches, have gained notable recognition for their potential to fix introductory assignments. However, the programs used for evaluation are relatively simple, and it remains unclear how existing approaches perform in repairing programs from higher-level programming courses. To address these limitations, we curate a new advanced student assignment dataset named Defects4DS from a higher-level programming course. We then identify the challenges of fixing bugs in advanced assignments. Based on this analysis, we develop an LLM-powered framework called PaR that works in three phases: Peer Solution Selection, Multi-Source Prompt Generation, and Program Repair. Peer Solution Selection identifies closely related peer programs based on lexical, semantic, and syntactic criteria. Multi-Source Prompt Generation then combines multiple sources of information into a comprehensive and informative prompt for the final Program Repair stage. The evaluation on Defects4DS and the well-investigated ITSP dataset reveals that PaR achieves a new state-of-the-art performance, with improvements of 19.94% and 15.2% in repair rate over prior state-of-the-art LLM-based and symbolic approaches, respectively.
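
The abstract describes PaR's pipeline only at a high level. The minimal Python sketch below illustrates one way the Peer Solution Selection and Multi-Source Prompt Generation phases could be realized. The similarity measures (token-sequence matching for the lexical criterion, identifier overlap as a semantic proxy, control-flow skeletons for the syntactic criterion), the equal weighting, and the prompt layout are all illustrative assumptions, not the paper's actual method.

```python
import difflib
import re

# Rough C keyword list, used to separate identifiers from language keywords.
C_KEYWORDS = {
    "auto", "break", "case", "char", "const", "continue", "default", "do",
    "double", "else", "enum", "extern", "float", "for", "goto", "if", "int",
    "long", "register", "return", "short", "signed", "sizeof", "static",
    "struct", "switch", "typedef", "union", "unsigned", "void", "volatile",
    "while",
}

CONTROL_FLOW = {"if", "else", "for", "while", "switch", "case", "return", "goto"}


def tokenize(code: str) -> list[str]:
    """Split source into identifiers/keywords plus single punctuation tokens."""
    return re.findall(r"[A-Za-z_]\w*|\S", code)


def lexical_similarity(a: str, b: str) -> float:
    # Token-sequence similarity as a stand-in for the lexical criterion.
    return difflib.SequenceMatcher(None, tokenize(a), tokenize(b)).ratio()


def semantic_similarity(a: str, b: str) -> float:
    # Jaccard overlap of identifiers as a crude semantic proxy; a real system
    # would likely compare learned code embeddings instead.
    ids_a = set(tokenize(a)) - C_KEYWORDS
    ids_b = set(tokenize(b)) - C_KEYWORDS
    if not ids_a or not ids_b:
        return 0.0
    return len(ids_a & ids_b) / len(ids_a | ids_b)


def syntactic_similarity(a: str, b: str) -> float:
    # Compare sequences of control-flow keywords as a cheap structural
    # skeleton; an AST-based distance would be more faithful.
    skel_a = [t for t in tokenize(a) if t in CONTROL_FLOW]
    skel_b = [t for t in tokenize(b) if t in CONTROL_FLOW]
    return difflib.SequenceMatcher(None, skel_a, skel_b).ratio()


def select_peer(buggy: str, peers: list[str],
                weights: tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3)) -> str:
    """Return the peer solution most similar to the buggy submission."""
    def score(peer: str) -> float:
        return (weights[0] * lexical_similarity(buggy, peer)
                + weights[1] * semantic_similarity(buggy, peer)
                + weights[2] * syntactic_similarity(buggy, peer))
    return max(peers, key=score)


def build_prompt(assignment: str, buggy: str, peer: str) -> str:
    # Multi-source prompt: task description, the selected peer solution, and
    # the buggy program, asking the model for a minimal fix.
    return (
        f"Assignment description:\n{assignment}\n\n"
        f"A correct peer solution for reference:\n{peer}\n\n"
        f"Buggy student submission:\n{buggy}\n\n"
        "Fix the bugs in the student submission with minimal changes."
    )
```

In practice, stronger signals could be swapped in behind the same interface, e.g., BM25 retrieval for the lexical criterion or pre-trained code embeddings for the semantic one, without changing the selection and prompt-assembly flow.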

Authors (9)
  1. Qianhui Zhao (3 papers)
  2. Fang Liu (801 papers)
  3. Li Zhang (693 papers)
  4. Yang Liu (2253 papers)
  5. Zhen Yan (69 papers)
  6. Zhenghao Chen (30 papers)
  7. Yufei Zhou (9 papers)
  8. Jing Jiang (192 papers)
  9. Ge Li (213 papers)
Citations (2)
