Are your comments outdated? Towards automatically detecting code-comment consistency (2403.00251v1)
Abstract: In software development and maintenance, code comments help developers understand source code and improve communication among developers. However, developers sometimes neglect to update the corresponding comment when changing the code, resulting in outdated comments (i.e., inconsistent code and comments). Outdated comments are harmful: they may mislead subsequent developers and, more seriously, may lead to a fatal flaw in the future. To automatically identify outdated comments in source code, we propose a learning-based method, called CoCC, to detect the consistency between code and comments. To identify outdated comments efficiently, we extract multiple features from both code and comments before and after they change. We also consider the relation between code and comment in our model. Experimental results show that CoCC can effectively detect outdated comments with precision over 90%. In addition, we identify the 15 most important factors that cause outdated comments and verify the applicability of CoCC to different programming languages. We also use CoCC to find outdated comments in the latest commits of open source projects, which further demonstrates the effectiveness of the proposed method.
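To make the idea of extracting features from code and comments before and after a change more concrete, the following is a minimal sketch. It is not CoCC's actual feature set or model, which the abstract does not specify. The function names (`tokenize`, `jaccard`, `extract_features`), the three features, and the use of Jaccard overlap as a stand-in for the code-comment relation are all illustrative assumptions.

```python
import re
from difflib import SequenceMatcher


def tokenize(text):
    """Split text into lowercase word tokens, breaking snake_case and camelCase."""
    tokens = []
    for word in re.findall(r"[A-Za-z]+", text):
        # "getUserName" -> get, User, Name; "HTTPServer" -> HTTP, Server
        tokens.extend(re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", word))
    return [t.lower() for t in tokens]


def jaccard(a, b):
    """Jaccard similarity between two token collections (1.0 if both are empty)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def extract_features(old_code, new_code, comment):
    """Hypothetical features for one code change whose comment was left untouched."""
    old_tokens = tokenize(old_code)
    new_tokens = tokenize(new_code)
    comment_tokens = tokenize(comment)
    # How much of the code's token sequence changed (0.0 means identical).
    code_similarity = SequenceMatcher(None, old_tokens, new_tokens).ratio()
    return {
        "code_change_ratio": 1.0 - code_similarity,
        # Assumed proxy for the code-comment relation before and after the change.
        "comment_old_code_overlap": jaccard(comment_tokens, old_tokens),
        "comment_new_code_overlap": jaccard(comment_tokens, new_tokens),
    }


if __name__ == "__main__":
    features = extract_features(
        old_code="def get_user_name(user): return user.name",
        new_code="def get_user_email(user): return user.email",
        comment="Return the name of the user",
    )
    for name, value in features.items():
        print(f"{name}: {value:.3f}")
```

In this toy change the comment shares fewer words with the new code than with the old code, which is the kind of signal a learned classifier could use. A real pipeline would compute many such features over labeled code changes and feed them to a trained model.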