
Malicious Package Detection using Metadata Information (2402.07444v1)

Published 12 Feb 2024 in cs.CR

Abstract: Protecting software supply chains from malicious packages is paramount in the evolving landscape of software development. Attacks on the software supply chain involve attackers injecting harmful software into commonly used packages or libraries in a software repository. For instance, JavaScript uses Node Package Manager (NPM), and Python uses Python Package Index (PyPI) as their respective package repositories. In the past, NPM has had vulnerabilities such as the event-stream incident, where a malicious package was introduced into a popular NPM package, potentially impacting a wide range of projects. As the integration of third-party packages becomes increasingly ubiquitous in modern software development, accelerating the creation and deployment of applications, the need for a robust detection mechanism has become critical. At the same time, the sheer volume of new packages released daily makes identifying malicious packages a significant challenge. To address this issue, in this paper, we introduce a metadata-based malicious package detection model, MeMPtec. This model extracts a set of features from package metadata information. These extracted features are classified as either easy-to-manipulate (ETM) or difficult-to-manipulate (DTM) features based on monotonicity and restricted control properties. By utilising these metadata features, we not only improve the effectiveness of detecting malicious packages but also demonstrate the model's resistance to adversarial attacks in comparison with existing state-of-the-art approaches. Our experiments indicate a significant reduction in both false positives (up to 97.56%) and false negatives (up to 91.86%).
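
The reduction figures quoted in the abstract can be read as relative reductions against a baseline detector. That reading is an assumption, since the abstract does not state the formula. The minimal sketch below shows the arithmetic under that assumption. The helper name `relative_reduction` and the counts are hypothetical, chosen only for illustration, and are not taken from the paper.

```python
def relative_reduction(baseline: int, improved: int) -> float:
    """Return the percentage by which `improved` is lower than `baseline`.

    Assumes "reduction" means (baseline - improved) / baseline.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - improved) / baseline


# Hypothetical error counts for illustration only (not from the paper).
baseline_false_positives = 200
model_false_positives = 10

reduction = relative_reduction(baseline_false_positives, model_false_positives)
print(f"{reduction:.2f}% fewer false positives")
```

Under this reading, a figure such as "up to 97.56%" describes the best case across the comparisons reported, not an average.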

Authors (10)
  1. S. Halder (128 papers)
  2. M. Bewong (1 paper)
  3. A. Mahboubi (1 paper)
  4. Y. Jiang (551 papers)
  5. R. Islam (17 papers)
  6. Z. Islam (24 papers)
  7. R. Ip (1 paper)
  8. E. Ahmed (3 papers)
  9. G. Ramachandran (4 papers)
  10. A. Babar (1 paper)
Citations (2)
