Characterizing Dependency Update Practice of NPM, PyPI and Cargo Packages (2403.17382v1)
Abstract: Keeping dependencies up-to-date helps prevent software supply chain attacks that exploit outdated and vulnerable dependencies. Developers may use a package's dependency update practice as one of the criteria for choosing it as a dependency. However, the lack of metrics characterizing packages' dependency update practice makes this assessment difficult. To measure how up-to-date packages are, we focus on the dependency management aspect and propose two update metrics, Time-Out-Of-Date (TOOD) and Post-Fix-Exposure-Time (PFET), which measure the updatedness of dependencies and the updatedness of vulnerable dependencies, respectively. We design an algorithm to stabilize the dependency relationships in different time intervals and compute the proposed metrics for each package. Using these metrics, we conduct a large-scale empirical study covering 2.9M packages, 66.8M package versions, and 26.8M unique package-dependency relations in NPM, PyPI, and Cargo, spanning 2004 to 2023. We analyze the characteristics of the proposed metrics for capturing packages' dependency update practice in the three ecosystems. Because the TOOD metric generates a greater volume of data than the PFET metric, we further explore the numerical relationship between the two metrics to assess their potential as substitutes for vulnerability-count metrics. We find that PyPI packages update dependencies faster than NPM and Cargo packages. Conversely, Cargo packages update their vulnerable dependencies faster than NPM and PyPI packages. We also find that the general-purpose update metric, TOOD, can serve as a proxy for the security-focused update metric, PFET.
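To make the two metrics concrete, the sketch below computes a TOOD-like and a PFET-like value for a single package-dependency pair. It is an illustration only: the abstract does not give the formal definitions, so the interpretation here is an assumption (TOOD as the number of days the version in use was older than the newest published release, PFET as the number of days a package kept using a vulnerable version after a fixed release existed). The function names, the integer version numbers, and the example dates are all hypothetical, and the paper's own algorithm may resolve version ranges, normalize, or aggregate differently.

```python
from datetime import date

def days_out_of_date(adoptions, releases, end):
    """TOOD-like value (assumed definition): days on which the dependency
    version in use was older than the newest published release.

    adoptions: [(date, version)] versions the package switched to, sorted by date
    releases:  [(date, version)] versions the dependency published, sorted by date
    Versions are plain integers here as a simplification.
    Assumes the first adoption happens on or after the first release.
    """
    first_use = adoptions[0][0]
    # Between two consecutive event dates nothing changes, so check each interval once.
    events = {d for d, _ in adoptions} | {d for d, _ in releases} | {end}
    points = sorted(p for p in events if first_use <= p <= end)
    total = 0
    for start, stop in zip(points, points[1:]):
        in_use = [v for d, v in adoptions if d <= start][-1]   # most recent adoption
        newest = max(v for d, v in releases if d <= start)     # newest release so far
        if in_use < newest:
            total += (stop - start).days
    return total

def post_fix_exposure_days(adoptions, fix_date, fixed_version, end):
    """PFET-like value (assumed definition): days the package stayed on a
    vulnerable version after a fixed release became available."""
    start = max(fix_date, adoptions[0][0])
    patched = [d for d, v in adoptions if v >= fixed_version]
    stop = min(min(patched), end) if patched else end
    return max((stop - start).days, 0)

# Hypothetical timeline: the dependency publishes versions 1, 2 and 3;
# version 2 fixes a vulnerability present in version 1.
releases = [(date(2023, 1, 1), 1), (date(2023, 3, 1), 2), (date(2023, 6, 1), 3)]
adoptions = [(date(2023, 1, 10), 1), (date(2023, 4, 1), 2)]
end = date(2023, 7, 1)

tood = days_out_of_date(adoptions, releases, end)
pfet = post_fix_exposure_days(adoptions, date(2023, 3, 1), 2, end)
print(tood, pfet)
```

In this toy timeline the package lags behind version 2 throughout March and behind version 3 throughout June, while its exposure to the fixed vulnerability covers only the March window. This shows how a general update metric can accumulate more observations than a vulnerability-specific one for the same pair.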