
DONAPI: Malicious NPM Packages Detector using Behavior Sequence Knowledge Mapping (2403.08334v1)

Published 13 Mar 2024 in cs.CR

Abstract: With the growing popularity of modularity in software development comes the rise of package managers and language ecosystems. Among them, npm stands out as the most extensive package manager, hosting more than 2 million third-party open-source packages that greatly simplify the process of building code. However, this openness also brings security risks, as evidenced by numerous package poisoning incidents. In this paper, we synchronize a local package cache containing more than 3.4 million packages in near real-time to gain access to more package code details. Further, we perform manual inspection and API call sequence analysis on packages collected from public datasets and security reports to build a hierarchical classification framework and a behavioral knowledge base covering different sensitive behaviors. In addition, we propose DONAPI, an automatic detector of malicious npm packages that combines static and dynamic analysis. It makes a preliminary judgment on the degree of maliciousness of a package using code reconstruction techniques and static analysis, extracts dynamic API call sequences to confirm and identify obfuscated content that static analysis cannot handle alone, and finally tags malicious packages based on the constructed behavior knowledge base. To date, we have identified and manually confirmed 325 malicious samples and discovered 2 unusual API calls and 246 API call sequences that have not appeared in known samples.
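
The final tagging step described in the abstract, matching an extracted API call sequence against a behavior knowledge base, can be illustrated with a minimal sketch. This is not the authors' implementation: the knowledge base entries, the behavior labels, the function names, and the choice of ordered subsequence matching (gaps allowed) are all assumptions made here for illustration only.

```python
from typing import Dict, List, Sequence

# Hypothetical knowledge base: behavior label -> ordered API calls that
# characterize it. These entries are illustrative assumptions, not data
# taken from the paper.
KNOWLEDGE_BASE: Dict[str, List[str]] = {
    "exfiltrate_system_info": ["os.hostname", "os.userInfo", "https.request"],
    "download_and_execute": ["https.get", "fs.writeFile", "child_process.exec"],
}


def is_subsequence(pattern: Sequence[str], trace: Sequence[str]) -> bool:
    """Return True if every call in `pattern` occurs in `trace` in order.

    Unrelated calls may appear between the matched ones.
    """
    remaining = iter(trace)
    # `call in remaining` advances the iterator past the first match,
    # so later pattern items can only match later trace items.
    return all(call in remaining for call in pattern)


def tag_behaviors(
    trace: Sequence[str],
    kb: Dict[str, List[str]] = KNOWLEDGE_BASE,
) -> List[str]:
    """Return the sorted labels of all behaviors whose pattern matches `trace`."""
    return sorted(
        label for label, pattern in kb.items() if is_subsequence(pattern, trace)
    )


# A made-up dynamic trace, as might be recorded while a package's
# install script runs in a sandbox.
trace = ["fs.readFile", "os.hostname", "os.userInfo", "dns.lookup", "https.request"]
print(tag_behaviors(trace))  # prints ['exfiltrate_system_info']
```

Allowing gaps between matched calls keeps the sketch tolerant of benign calls interleaved with sensitive ones. A real detector would likely also need to consider call arguments and the static-analysis verdict, which this sketch omits.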
