
OSS Malicious Package Analysis in the Wild (2404.04991v2)

Published 7 Apr 2024 in cs.CR and cs.SE

Abstract: The open-source software (OSS) ecosystem faces a wide range of security threats and risks, and malicious packages play a central role in software supply chain (SSC) attacks. Although malware research has a history of over thirty years, OSS malware has received comparatively little attention. Existing research on it has three limitations: it lacks high-quality datasets, malware diversity, and attack campaign context. In this paper, we first build and curate the largest dataset of malicious packages, 23,425 in total, from scattered online sources. We then propose a knowledge graph to represent the OSS malware corpus and conduct malicious package analysis in the wild. Our main findings are: (1) it is essential to collect malicious packages from various online sources, because there is little data overlap between sources; (2) despite the sheer volume of SSC attack campaigns, many malicious packages are similar, and unknown or sophisticated attack behaviors have yet to emerge or be detected; (3) OSS malicious packages have a distinct life cycle, denoted as {changing → release → detection → removal}, and re-releasing a slightly changed package under a different name is a widespread attack technique; (4) although malicious packages themselves often lack context about how and by whom they were released, security reports disclose information about the corresponding SSC attack campaigns.
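The abstract does not describe the authors' tooling. What follows is a minimal Python sketch, under stated assumptions, of two measurements related to findings (1) and (3): how much two collection sources overlap, and which packages ship the same payload under different names. Several choices here are illustrative assumptions, not the paper's method: the record layout, the rule that masks the package name before hashing, the use of SHA-256 from the standard `hashlib` module, and the Jaccard overlap metric.

```python
import hashlib
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical record layout (assumption): (source, ecosystem, package name,
# {path relative to package root: file bytes}).
Record = Tuple[str, str, str, Dict[str, bytes]]


def content_digest(name: str, files: Dict[str, bytes]) -> str:
    """Hash a package's files with its own name masked out, so a payload
    re-released under a new name yields the same digest. Naive substring
    masking is a crude, assumed normalization."""
    normalized = {
        path.replace(name, "<NAME>"): body.replace(name.encode("utf-8"), b"<NAME>")
        for path, body in files.items()
    }
    h = hashlib.sha256()
    for path in sorted(normalized):  # sort for an order-independent digest
        h.update(path.encode("utf-8"))
        h.update(b"\0")
        h.update(hashlib.sha256(normalized[path]).digest())
    return h.hexdigest()


def group_renamed(records: List[Record]) -> List[List[str]]:
    """Return groups of distinct names that share one normalized payload."""
    groups: Dict[Tuple[str, str], set] = defaultdict(set)
    for _source, eco, name, files in records:
        groups[(eco, content_digest(name, files))].add(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


def source_overlap(records: List[Record]) -> Dict[Tuple[str, str], float]:
    """Jaccard overlap of (ecosystem, name) sets for every pair of sources."""
    by_source: Dict[str, set] = defaultdict(set)
    for source, eco, name, _files in records:
        by_source[source].add((eco, name))
    result = {}
    for a, b in combinations(sorted(by_source), 2):
        sa, sb = by_source[a], by_source[b]
        result[(a, b)] = len(sa & sb) / len(sa | sb)
    return result


if __name__ == "__main__":
    # Synthetic, made-up records purely for illustration.
    demo = [
        ("source-a", "pypi", "reqeusts", {"setup.py": b"setup(name='reqeusts')", "evil.py": b"steal()"}),
        ("source-a", "pypi", "requessts", {"setup.py": b"setup(name='requessts')", "evil.py": b"steal()"}),
        ("source-b", "pypi", "reqeusts", {"setup.py": b"setup(name='reqeusts')", "evil.py": b"steal()"}),
        ("source-b", "npm", "colorss", {"package.json": b'{"name": "colorss"}', "index.js": b"x()"}),
    ]
    print(group_renamed(demo))   # [['reqeusts', 'requessts']]
    print(source_overlap(demo))  # {('source-a', 'source-b'): 0.3333333333333333}
```

Masking the name before hashing is what lets a renamed copy collide with its original. Identifying samples by (ecosystem, name) for the overlap metric is likewise only one possible choice. Keying on the content digest instead would merge renamed variants into a single sample.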
