
Towards Automated Classification of Code Review Feedback to Support Analytics (2307.03852v1)

Published 7 Jul 2023 in cs.SE

Abstract: Background: As improving code review (CR) effectiveness is a priority for many software development organizations, projects have deployed CR analytics platforms to identify potential improvement areas. The number of issues identified, a crucial metric of CR effectiveness, can be misleading if all issues are placed in the same bin. Therefore, a finer-grained classification of the issues identified during CRs can provide actionable insights to improve CR effectiveness. Although a recent work by Fregnan et al. proposed automated models to classify CR-induced changes, we noticed two potential improvement areas: i) classifying comments that do not induce changes and ii) using deep neural networks (DNNs) in conjunction with code context to improve performance. Aims: This study aims to develop an automated CR comment classifier that leverages DNN models to achieve more reliable performance than Fregnan et al. Method: Using a manually labeled dataset of 1,828 CR comments, we trained and evaluated supervised DNN models that leverage code context, comment text, and a set of code metrics to classify CR comments into one of the five high-level categories proposed by Turzo and Bosu. Results: Based on our 10-fold cross-validation evaluations of multiple combinations of tokenization approaches, we found that a model using CodeBERT achieved the best accuracy, 59.3%, outperforming Fregnan et al.'s approach by 18.7%. Conclusion: Besides facilitating improved CR analytics, our proposed model can help developers prioritize code review feedback and select reviewers.
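
As a companion to the abstract's Method and Results, the following is a minimal sketch of the kind of pipeline described: fine-tuning CodeBERT to classify CR comments into five categories and scoring it with 10-fold cross-validation. It assumes the Hugging Face `transformers` and `scikit-learn` libraries; the category names, the `comments`/`contexts`/`labels` arrays, and all hyperparameters are illustrative assumptions rather than the authors' exact setup (which additionally incorporates code metrics).

```python
# Minimal sketch: fine-tune CodeBERT for 5-class CR-comment classification
# with 10-fold cross-validation. Inputs `comments`, `contexts` (numpy arrays
# of strings) and `labels` (numpy int array) are assumed, not from the paper.
import numpy as np
import torch
from sklearn.model_selection import StratifiedKFold
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Placeholder names for the five high-level categories of Turzo and Bosu.
CATEGORIES = ["functional", "refactoring", "documentation",
              "discussion", "false positive"]

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")

def encode(comments, contexts):
    # Pair each review comment (natural language) with its code context;
    # passing the second sequence as text_pair yields CodeBERT's NL/PL format.
    return tokenizer(list(comments), list(contexts), truncation=True,
                     padding=True, max_length=512, return_tensors="pt")

def run_fold(train_idx, test_idx, comments, contexts, labels, epochs=3):
    # Fresh classification head per fold so folds do not leak into each other.
    model = AutoModelForSequenceClassification.from_pretrained(
        "microsoft/codebert-base", num_labels=len(CATEGORIES))
    optim = torch.optim.AdamW(model.parameters(), lr=2e-5)
    enc = encode(comments[train_idx], contexts[train_idx])
    y = torch.tensor(labels[train_idx])
    model.train()
    for _ in range(epochs):
        optim.zero_grad()
        # Full-batch step only to keep the sketch short; use a DataLoader
        # with mini-batches in practice.
        out = model(**enc, labels=y)
        out.loss.backward()
        optim.step()
    model.eval()
    with torch.no_grad():
        logits = model(**encode(comments[test_idx], contexts[test_idx])).logits
    preds = logits.argmax(dim=-1).numpy()
    return (preds == labels[test_idx]).mean()

def cross_validate(comments, contexts, labels, folds=10):
    # Stratified folds keep the five categories proportionally represented.
    skf = StratifiedKFold(n_splits=folds, shuffle=True, random_state=42)
    accs = [run_fold(tr, te, comments, contexts, labels)
            for tr, te in skf.split(comments, labels)]
    return float(np.mean(accs))
```

Stratification matters here: with only 1,828 comments spread over five categories of presumably uneven sizes, plain random folds could leave a rare category nearly absent from some training splits.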

References (57)
  1. P. C. Rigby, D. M. German, L. Cowen, and M.-A. Storey, “Peer review on open-source software projects: Parameters, statistical models, and theory,” ACM Transactions on Software Engineering and Methodology (TOSEM), vol. 23, no. 4, pp. 1–33, 2014.
  2. A. Bosu, J. C. Carver, C. Bird, J. Orbeck, and C. Chockley, “Process aspects and social dynamics of contemporary code review: Insights from open source development and industrial practice at microsoft,” IEEE Transactions on Software Engineering, vol. 43, no. 1, pp. 56–75, 2016.
  3. C. Sadowski, E. Söderberg, L. Church, M. Sipko, and A. Bacchelli, “Modern code review: a case study at google,” in Proceedings of the 40th international conference on software engineering: Software engineering in practice, 2018, pp. 181–190.
  4. P. C. Rigby and C. Bird, “Convergent contemporary software peer review practices,” in Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering.   ACM, 2013, pp. 202–212.
  5. P. Wurzel Gonçalves, G. Calikli, A. Serebrenik, and A. Bacchelli, “Competencies for code review,” Proceedings of the ACM on Human-Computer Interaction, vol. 7, no. CSCW1, pp. 1–33, 2023.
  6. A. Bacchelli and C. Bird, “Expectations, outcomes, and challenges of modern code review,” in 2013 35th International Conference on Software Engineering (ICSE).   IEEE, 2013, pp. 712–721.
  7. A. Bosu, M. Greiler, and C. Bird, “Characteristics of useful code reviews: An empirical study at microsoft,” in 2015 IEEE/ACM 12th Working Conference on Mining Software Repositories.   IEEE, 2015, pp. 146–156.
  8. A. K. Turzo and A. Bosu, “What makes a code review useful to opendev developers? an empirical investigation,” Empirical Software Engineering, vol. 28, p. TBD, 2023.
  9. M. Beller, A. Bacchelli, A. Zaidman, and E. Juergens, “Modern code reviews in open-source projects: Which problems do they fix?” in Proceedings of the 11th working conference on mining software repositories, 2014, pp. 202–211.
  10. J. Czerwonka, M. Greiler, and J. Tilford, “Code reviews do not find bugs. how the current code review best practice slows us down,” in 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering, vol. 2.   IEEE, 2015, pp. 27–28.
  11. M. Hasan, A. Iqbal, M. R. U. Islam, A. I. Rahman, and A. Bosu, “Using a balanced scorecard to identify opportunities to improve code review effectiveness: An industrial experience report,” Empirical Software Engineering, vol. 26, pp. 1–34, 2021.
  12. C. Bird, T. Carnahan, and M. Greiler, “Lessons learned from building and deploying a code review analytics platform,” in 2015 IEEE/ACM 12th Working Conference on Mining Software Repositories.   IEEE, 2015, pp. 191–201.
  13. E. Fregnan, F. Petrulio, L. Di Geronimo, and A. Bacchelli, “What happens in my code reviews? an investigation on automatically classifying review changes,” Empirical Software Engineering, vol. 27, no. 4, p. 89, 2022.
  14. Z. Feng, D. Guo, D. Tang, N. Duan, X. Feng, M. Gong, L. Shou, B. Qin, T. Liu, D. Jiang et al., “Codebert: A pre-trained model for programming and natural languages,” in Findings of the Association for Computational Linguistics: EMNLP 2020, 2020, pp. 1536–1547.
  15. X. Zhou, D. Han, and D. Lo, “Assessing generalizability of codebert,” in 2021 IEEE International Conference on Software Maintenance and Evolution (ICSME).   IEEE, 2021, pp. 425–436.
  16. K. Liu, G. Yang, X. Chen, and Y. Zhou, “El-codebert: Better exploiting codebert to support source code-related classification tasks,” in Proceedings of the 13th Asia-Pacific Symposium on Internetware, 2022, pp. 147–155.
  17. O. Kononenko, O. Baysal, and M. W. Godfrey, “Code review quality: How developers see it,” in Proceedings of the 38th international conference on software engineering, 2016, pp. 1028–1038.
  18. P. Thongtanunam, S. McIntosh, A. E. Hassan, and H. Iida, “Revisiting code ownership and its relationship with software quality in the scope of modern code review,” in Proceedings of the 38th international conference on software engineering, 2016, pp. 1039–1050.
  19. A. Bosu and J. C. Carver, “Impact of developer reputation on code review outcomes in oss projects: An empirical investigation,” in Proceedings of the 8th ACM/IEEE international symposium on empirical software engineering and measurement, 2014, pp. 1–10.
  20. F. Ebert, F. Castor, N. Novielli, and A. Serebrenik, “An exploratory study on confusion in code reviews,” Empirical Software Engineering, vol. 26, pp. 1–48, 2021.
  21. P. Thongtanunam and A. E. Hassan, “Review dynamics and their impact on software quality,” IEEE Transactions on Software Engineering, vol. 47, no. 12, pp. 2698–2712, 2020.
  22. P. Thongtanunam, S. McIntosh, A. E. Hassan, and H. Iida, “Review participation in modern code review,” Empirical Software Engineering, vol. 22, no. 2, pp. 768–817, 2017.
  23. T. Hirao, A. Ihara, Y. Ueda, P. Phannachitta, and K.-i. Matsumoto, “The impact of a low level of agreement among reviewers in a code review process,” in IFIP International Conference on Open Source Systems.   Springer, 2016, pp. 97–110.
  24. M. Barnett, C. Bird, J. Brunet, and S. K. Lahiri, “Helping developers help themselves: Automatic decomposition of code review changesets,” in 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering, vol. 1.   IEEE, 2015, pp. 134–144.
  25. M. Dias, A. Bacchelli, G. Gousios, D. Cassou, and S. Ducasse, “Untangling fine-grained code changes,” in 2015 IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER).   IEEE, 2015, pp. 341–350.
  26. V. U. Gómez, S. Ducasse, and T. d’Hondt, “Visually characterizing source code changes,” Science of Computer Programming, vol. 98, pp. 376–393, 2015.
  27. Y. Huang, N. Jia, X. Chen, K. Hong, and Z. Zheng, “Code review knowledge perception: Fusing multi-features for salient-class location,” IEEE Transactions on Software Engineering, vol. 48, no. 5, pp. 1463–1479, 2020.
  28. M. B. Zanjani, H. Kagdi, and C. Bird, “Automatically recommending peer reviewers in modern code review,” IEEE Transactions on Software Engineering, vol. 42, no. 6, pp. 530–543, 2015.
  29. P. Thongtanunam, C. Tantithamthavorn, R. G. Kula, N. Yoshida, H. Iida, and K.-i. Matsumoto, “Who should review my code? a file location-based code-reviewer recommendation approach for modern code review,” in 2015 IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER).   IEEE, 2015, pp. 141–150.
  30. M. M. Rahman, C. K. Roy, and J. A. Collins, “Correct: code reviewer recommendation in github based on cross-project and technology experience,” in Proceedings of the 38th international conference on software engineering companion, 2016, pp. 222–231.
  31. P. Pandya and S. Tiwari, “Corms: a github and gerrit based hybrid code reviewer recommendation approach for modern code review,” in Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2022, pp. 546–557.
  32. G. Rong, Y. Zhang, L. Yang, F. Zhang, H. Kuang, and H. Zhang, “Modeling review history for reviewer recommendation: a hypergraph approach,” in Proceedings of the 44th International Conference on Software Engineering, 2022, pp. 1381–1392.
  33. I. X. Gauthier, M. Lamothe, G. Mussbacher, and S. McIntosh, “Is historical data an appropriate benchmark for reviewer recommendation systems?: A case study of the gerrit community,” in 2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE).   IEEE, 2021, pp. 30–41.
  34. P. Thongtanunam, C. Pornprasit, and C. Tantithamthavorn, “Autotransform: automated code transformation to support modern code review process,” in Proceedings of the 44th International Conference on Software Engineering, 2022, pp. 237–248.
  35. R. Tufano, L. Pascarella, M. Tufano, D. Poshyvanyk, and G. Bavota, “Towards automating code review activities,” in 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE).   IEEE, 2021, pp. 163–174.
  36. Z. Li, S. Lu, D. Guo, N. Duan, S. Jannu, G. Jenks, D. Majumder, J. Green, A. Svyatkovskiy, S. Fu, and N. Sundaresan, “Automating code review activities by large-scale pre-training,” in Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ser. ESEC/FSE 2022, 2022, pp. 1035–1047.
  37. R. Tufano, S. Masiero, A. Mastropaolo, L. Pascarella, D. Poshyvanyk, and G. Bavota, “Using pre-trained models to boost code review automation,” in Proceedings of the 44th International Conference on Software Engineering, 2022, pp. 2291–2302.
  38. M. Tufano, J. Pantiuchina, C. Watson, G. Bavota, and D. Poshyvanyk, “On learning meaningful code changes via neural machine translation,” in 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE).   IEEE, 2019, pp. 25–36.
  39. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” Advances in neural information processing systems, vol. 30, 2017.
  40. M. V. Mäntylä and C. Lassenius, “What types of defects are really discovered in code reviews?” IEEE Transactions on Software Engineering, vol. 35, no. 3, pp. 430–448, 2008.
  41. M. M. Rahman, C. K. Roy, and R. G. Kula, “Predicting usefulness of code review comments using textual features and developer experience,” in 2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR).   IEEE, 2017, pp. 215–226.
  42. F. E. Zanaty, T. Hirao, S. McIntosh, A. Ihara, and K. Matsumoto, “An empirical study of design discussions in code review,” in Proceedings of the 12th ACM/IEEE international symposium on empirical software engineering and measurement, 2018, pp. 1–10.
  43. T. Hirao, S. McIntosh, A. Ihara, and K. Matsumoto, “The review linkage graph for code review analytics: a recovery approach and empirical study,” in Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2019, pp. 578–589.
  44. M. Fowler, “Refactoring catalog,” Refactoring Home Page, 2012. [Online]. Available: http://www.refactoring.com/catalog/index.html (last accessed: Feb. 9, 2006).
  45. J. Cohen, “A coefficient of agreement for nominal scales,” Educational and psychological measurement, vol. 20, no. 1, pp. 37–46, 1960.
  46. J. R. Landis and G. G. Koch, “An application of hierarchical kappa-type statistics in the assessment of majority agreement among multiple observers,” Biometrics, pp. 363–374, 1977.
  47. T. J. McCabe, “A complexity measure,” IEEE Transactions on Software Engineering, no. 4, pp. 308–320, 1976.
  48. U. Alon, M. Zilberstein, O. Levy, and E. Yahav, “code2vec: Learning distributed representations of code,” Proceedings of the ACM on Programming Languages, vol. 3, no. POPL, pp. 1–29, 2019.
  49. J. Zhang, X. Wang, H. Zhang, H. Sun, K. Wang, and X. Liu, “A novel neural source code representation based on abstract syntax tree,” in 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE).   IEEE, 2019, pp. 783–794.
  50. M. White, M. Tufano, C. Vendome, and D. Poshyvanyk, “Deep learning code fragments for code clone detection,” in Proceedings of the 31st IEEE/ACM international conference on automated software engineering, 2016, pp. 87–98.
  51. B. Fluri, M. Würsch, M. Pinzger, and H. Gall, “Change distilling: Tree differencing for fine-grained source code change extraction,” IEEE Transactions on Software Engineering, vol. 33, no. 11, pp. 725–743, 2007.
  52. T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, J. Davison, S. Shleifer, P. von Platen, C. Ma, Y. Jernite, J. Plu, C. Xu, T. Le Scao, S. Gugger, M. Drame, Q. Lhoest, and A. Rush, “Transformers: State-of-the-art natural language processing,” in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations.   Online: Association for Computational Linguistics, Oct. 2020, pp. 38–45. [Online]. Available: https://aclanthology.org/2020.emnlp-demos.6
  53. S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, vol. 9, no. 8, pp. 1735–1780, 1997.
  54. D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in 3rd International Conference on Learning Representations (ICLR), San Diego, CA, 2015.
  55. Y. Reich and S. Barai, “Evaluating machine learning models for engineering problems,” Artificial Intelligence in Engineering, vol. 13, no. 3, pp. 257–272, 1999.
  56. J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” in Proceedings of NAACL-HLT, 2019, pp. 4171–4186.
  57. E. Fregnan, F. Petrulio, L. D. Geronimo, and A. Bacchelli, “What happens in my code reviews? - Replication package,” Oct. 2021. [Online]. Available: https://doi.org/10.5281/zenodo.5592254
Authors (6)
  1. Asif Kamal Turzo (9 papers)
  2. Fahim Faysal (1 paper)
  3. Ovi Poddar (1 paper)
  4. Jaydeb Sarker (6 papers)
  5. Anindya Iqbal (24 papers)
  6. Amiangshu Bosu (17 papers)
Citations (1)
