
Rationale Dataset and Analysis for the Commit Messages of the Linux Kernel Out-of-Memory Killer (2403.18832v1)

Published 6 Feb 2024 in cs.SE

Abstract: Code commit messages can contain useful information on why a developer has made a change. However, the presence and structure of rationale in real-world code commit messages are not well studied. Here, we detail the creation of a labelled dataset to analyze the code commit messages of the Linux Kernel Out-Of-Memory Killer component. We study aspects of rationale information, such as presence, temporal evolution, and structure. We find that 98.9% of commits in our dataset contain sentences with rationale information, and that experienced developers report rationale in about 60% of the sentences in their commits. We report on the challenges we faced and provide examples for our labelling.

