Analyzing the Accessibility of GitHub Repositories for PyPI and NPM Libraries (2404.17403v1)

Published 26 Apr 2024 in cs.SE

Abstract: Industrial applications rely heavily on open-source software (OSS) libraries, which provide various benefits. However, these libraries can also pose a substantial risk if a vulnerability or attack arises and the community, due to inactivity, fails to promptly address the issue and release a fix. To monitor the activity of such communities, a comprehensive list of repositories for the libraries of an ecosystem must be accessible. Based on these repositories, the libraries integrated into an application can be monitored to observe whether they are adequately maintained. In this descriptive study, we analyze the accessibility of GitHub repositories for PyPI and NPM libraries. For all available libraries, we extract the assigned repository URLs and direct dependencies, and use the PageRank algorithm to comprehensively analyze the ecosystems from a library perspective and a dependency chain perspective. For invalid repository URLs, we derive potential reasons. Both ecosystems show varying accessibility of GitHub repository URLs, depending on the PageRank score of the analyzed libraries. For individual libraries, up to 73.8% of PyPI and up to 69.4% of NPM libraries have repository URLs. Within dependency chains, up to 80.1% of PyPI libraries and up to 81.1% of NPM libraries have URLs. This means that most libraries, especially those of increasing importance, can be monitored on GitHub. Among the most common reasons for invalid repository URLs is that no URL is assigned at all, which amounts to up to 17.9% for PyPI and up to 39.6% for NPM. Package maintainers should address this issue and update the repository information to enable monitoring of their libraries.
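
The analysis outlined in the abstract (rank libraries by PageRank over the dependency graph, then check how many of the top-ranked libraries carry a GitHub repository URL) can be illustrated with a minimal sketch. This is not the authors' implementation. The library names, URLs, edge direction (dependent library points to its dependency), damping factor, and URL pattern are all assumptions chosen for illustration, and the helper names `page_rank`, `has_github_url`, and `github_share_top_k` are hypothetical.

```python
import re

# Hypothetical toy ecosystem: library -> list of its direct dependencies.
# Assumption: edges point from the dependent library to the dependency,
# so heavily depended-upon libraries accumulate a higher score.
DEPENDENCIES = {
    "app-a": ["requests-like", "utils-x"],
    "app-b": ["requests-like"],
    "requests-like": ["urllib-like"],
    "utils-x": [],
    "urllib-like": [],
}

# Hypothetical repository URLs as they might appear in package metadata.
REPO_URLS = {
    "app-a": "https://github.com/example/app-a",
    "app-b": None,  # no URL assigned at all
    "requests-like": "https://github.com/example/requests-like",
    "utils-x": "https://example.org/utils-x",  # URL assigned, but not GitHub
    "urllib-like": "https://github.com/example/urllib-like",
}

# Assumed pattern for a GitHub repository URL of the form github.com/<owner>/<repo>.
GITHUB_PATTERN = re.compile(r"^https?://github\.com/[^/\s]+/[^/\s]+/?$")


def page_rank(graph, damping=0.85, iterations=100):
    """Plain power-iteration PageRank over a dict-based directed graph."""
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Score held by libraries without dependencies is spread uniformly.
        dangling = sum(rank[node] for node in nodes if not graph[node])
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        for node in nodes:
            for dep in graph[node]:
                new_rank[dep] += damping * rank[node] / len(graph[node])
        rank = new_rank
    return rank


def has_github_url(url):
    """Return True if the URL looks like a GitHub repository URL."""
    return bool(url) and GITHUB_PATTERN.match(url) is not None


def github_share_top_k(graph, urls, k):
    """Share of the k highest-ranked libraries that have a GitHub repository URL."""
    rank = page_rank(graph)
    top = sorted(rank, key=rank.get, reverse=True)[:k]
    return sum(has_github_url(urls.get(name)) for name in top) / len(top)


if __name__ == "__main__":
    for k in (2, 3, 5):
        share = github_share_top_k(DEPENDENCIES, REPO_URLS, k)
        print(f"top {k}: {share:.1%} with a GitHub repository URL")
```

Varying `k` mirrors the idea that accessibility depends on the PageRank score of the libraries considered. A real analysis would read dependencies and URLs from the PyPI and NPM package metadata rather than from hard-coded dictionaries.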
