Rationale Dataset and Analysis for the Commit Messages of the Linux Kernel Out-of-Memory Killer (2403.18832v1)
Published 6 Feb 2024 in cs.SE
Abstract: Code commit messages can contain useful information on why a developer has made a change. However, the presence and structure of rationale in real-world code commit messages are not well studied. Here, we detail the creation of a labelled dataset to analyze the code commit messages of the Linux Kernel Out-Of-Memory Killer component. We study aspects of rationale information, such as presence, temporal evolution, and structure. We find that 98.9% of commits in our dataset contain sentences with rationale information, and that experienced developers report rationale in about 60% of the sentences in their commits. We report on the challenges we faced and provide examples for our labelling.