Enhancing Binary Code Comment Quality Classification: Integrating Generative AI for Improved Accuracy (2310.11467v1)
Published 14 Oct 2023 in cs.SE, cs.AI, and cs.LG
Abstract: This report focuses on enhancing a binary code comment quality classification model by integrating generated code and comment pairs to improve model accuracy. The dataset comprises 9048 pairs of code and comments written in the C programming language, each annotated as "Useful" or "Not Useful." Additionally, code and comment pairs are generated using an LLM architecture, and these generated pairs are labeled to indicate their utility. The outcome of this effort consists of two classification models: one trained on the original dataset and another trained on the augmented dataset, which adds the newly generated code-comment pairs and their labels.
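The two-model setup described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Pair` type, the `augment`, `train`, `predict`, and `accuracy` helpers, the toy C snippets, and the naive Bayes classifier, which is only a stand-in because the abstract does not say which classifier or features the report uses.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Pair:
    """One code-comment pair with its label (hypothetical structure)."""
    code: str
    comment: str
    label: str  # "Useful" or "Not Useful"


def augment(original, generated):
    """Append generated pairs to the original ones, skipping exact duplicates."""
    seen, merged = set(), []
    for pair in list(original) + list(generated):
        key = (pair.code, pair.comment)
        if key not in seen:
            seen.add(key)
            merged.append(pair)
    return merged


def tokens(pair):
    """Whitespace tokens of code plus comment (a deliberately crude feature)."""
    return (pair.code + " " + pair.comment).lower().split()


def train(pairs):
    """Fit a multinomial naive Bayes model (stand-in, not the report's model)."""
    class_counts = Counter(p.label for p in pairs)
    word_counts = {label: Counter() for label in class_counts}
    for p in pairs:
        word_counts[p.label].update(tokens(p))
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab


def predict(model, pair):
    """Return the label with the highest Laplace-smoothed log-probability."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(n / total)
        for w in tokens(pair):
            if w in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, pairs):
    """Fraction of pairs whose predicted label matches the annotation."""
    return sum(predict(model, p) == p.label for p in pairs) / len(pairs)


# Toy data, invented for this sketch (the real dataset has 9048 C pairs).
original = [
    Pair("i++;", "increment i", "Not Useful"),
    Pair("free(buf);", "release buffer to avoid leak on error path", "Useful"),
]
generated = [
    Pair("x = 0;", "set x to zero", "Not Useful"),
    Pair("lock(&m);", "guard shared queue against concurrent writers", "Useful"),
    Pair("i++;", "increment i", "Not Useful"),  # exact duplicate, dropped
]

augmented = augment(original, generated)
baseline_model = train(original)     # model 1: original data only
augmented_model = train(augmented)   # model 2: original plus generated data
```

In a real comparison, both models would be scored with `accuracy` on the same held-out test split, so that any difference reflects the added training pairs rather than a change in the evaluation data.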