How do Software Engineering Researchers Use GitHub? An Empirical Study of Artifacts & Impact (2310.01566v2)

Published 2 Oct 2023 in cs.SE

Abstract: Millions of developers share their code on open-source platforms like GitHub, which offer social coding opportunities such as distributed collaboration and popularity-based ranking. Software engineering researchers have joined in as well, hosting their research artifacts (tools, replication packages, and datasets) in repositories, an action often marked as part of the publication's contribution. Yet a decade after the first such paper-with-GitHub-link, little is known about the fate of such repositories in practice. Do research repositories ever gain the interest of the developer community, or of other researchers? If so, how often and why (not)? Does effort invested on GitHub pay off with research impact? In short: we ask whether and how authors engage in social coding related to their research. We conduct a broad empirical investigation of repositories from published work, starting with ten thousand papers in top SE research venues, hand-annotating their 3449 GitHub (and Zenodo) links, and studying 309 paper-related repositories in detail. We find a wide distribution in popularity and impact, some of it strongly correlated with publication venue. These outcomes were often heavily influenced by the authors' investment in timely responsiveness and upkeep, which was often remarkably subpar by GitHub's standards, if not absent altogether. Yet we also offer hope: popular repositories often go hand-in-hand with well-cited papers and achieve broad impact. Our findings suggest the need to rethink the research incentives and reward structure around research products that require such sustained contributions.

