Enhancing Binary Code Comment Quality Classification: Integrating Generative AI for Improved Accuracy (2310.11467v1)

Published 14 Oct 2023 in cs.SE, cs.AI, and cs.LG

Abstract: This report focuses on improving the accuracy of a binary code comment quality classification model by augmenting its training data with generated code and comment pairs. The dataset comprises 9048 pairs of code and comments written in the C programming language, each annotated as "Useful" or "Not Useful." Additionally, code and comment pairs are generated using a large language model (LLM), and these generated pairs are labeled to indicate their utility. The outcome of this effort is two classification models: one trained on the original dataset and another trained on the augmented dataset, which adds the newly generated code and comment pairs and their labels.
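
The abstract describes the experiment only at a high level: one classifier is trained on the original labeled pairs, and a second on the original pairs plus LLM-generated, labeled pairs. A minimal sketch of that two-model comparison follows, assuming a bag-of-words multinomial naive Bayes classifier. The feature extraction, the classifier, the names `Pair`, `NaiveBayes`, and `accuracy`, and all of the data are hypothetical illustrations, not the paper's method, and the toy accuracies say nothing about the paper's results.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Pair:
    code: str      # C code snippet
    comment: str   # comment attached to that snippet
    label: str     # "Useful" or "Not Useful"


def tokens(pair: Pair) -> list[str]:
    # Assumed features: lowercase identifier-like words from the comment and the code.
    return re.findall(r"[a-z_][a-z0-9_]*", f"{pair.comment} {pair.code}".lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (a stand-in, not the paper's model)."""

    def fit(self, pairs: list[Pair]) -> "NaiveBayes":
        self.class_counts = Counter(p.label for p in pairs)
        self.word_counts = {c: Counter() for c in self.class_counts}
        for p in pairs:
            self.word_counts[p.label].update(tokens(p))
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, pair: Pair) -> str:
        total = sum(self.class_counts.values())
        best_label, best_score = "", -math.inf
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of every token in the pair.
            score = math.log(n / total) + sum(
                math.log((counts[t] + 1) / denom) for t in tokens(pair)
            )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model: NaiveBayes, pairs: list[Pair]) -> float:
    return sum(model.predict(p) == p.label for p in pairs) / len(pairs)


# Invented toy data; the real dataset has 9048 annotated C code/comment pairs.
original = [
    Pair("int n = read_count(fp);", "/* number of records read from the input file */", "Useful"),
    Pair("i++;", "/* increment i */", "Not Useful"),
    Pair("free(buf);", "/* release the buffer allocated by parse_header */", "Useful"),
    Pair("x = 0;", "/* set x to 0 */", "Not Useful"),
]
# Hypothetical stand-ins for LLM-generated pairs that have already been labeled.
generated = [
    Pair("close(fd);", "/* close the socket so the peer sees EOF */", "Useful"),
    Pair("j--;", "/* decrement j */", "Not Useful"),
]
held_out = [
    Pair("k++;", "/* increment k */", "Not Useful"),
    Pair("fclose(log);", "/* close the log file before exit */", "Useful"),
]

baseline = NaiveBayes().fit(original)               # model 1: original data only
augmented = NaiveBayes().fit(original + generated)  # model 2: original + generated
print(accuracy(baseline, held_out), accuracy(augmented, held_out))
```

Evaluating both models on the same held-out pairs isolates the effect of the generated data. With a real dataset, a proper train/test split or cross-validation would replace the handful of toy examples used here.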
