Unveiling Hidden Links Between Unseen Security Entities (2403.02014v1)
Abstract: The proliferation of software vulnerabilities poses a significant challenge for security databases and analysts tasked with their timely identification, classification, and remediation. With the National Vulnerability Database (NVD) reporting an ever-increasing number of vulnerabilities, traditional manual analysis becomes untenably time-consuming and error-prone. This paper introduces VulnScopper, an approach that uses multi-modal representation learning, combining Knowledge Graphs (KGs) and NLP, to automate and enhance the analysis of software vulnerabilities. By combining ULTRA, a knowledge graph foundation model, with an LLM, VulnScopper handles unseen entities effectively, overcoming a limitation of previous KG approaches. We evaluate VulnScopper on two major security datasets, the NVD and the Red Hat CVE database. Our method significantly improves link prediction accuracy between Common Vulnerabilities and Exposures (CVEs), Common Weakness Enumerations (CWEs), and Common Platform Enumerations (CPEs). Our results show that VulnScopper outperforms existing methods, achieving up to 78% Hits@10 accuracy in linking CVEs to CPEs and CWEs, and an 11.7% improvement over LLMs in predicting CWE labels on the Red Hat database. According to the NVD, only 6.37% of linked CPEs are published within the first 30 days; many of them relate to critical and high-risk vulnerabilities which, according to multiple compliance frameworks (such as CISA and PCI), should be remediated within 15-30 days. Our model can uncover new products linked to vulnerabilities, reducing remediation time and improving vulnerability management. We analyzed several CVEs from 2023 to showcase this ability.
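The Hits@10 figure quoted in the abstract is a standard link prediction metric: the share of test links for which the true target entity appears among the model's top 10 ranked candidates. The following is a minimal sketch of how such a metric is commonly computed, not the paper's evaluation code. The function names, the pessimistic tie handling, and the toy CPE identifiers and scores are all assumptions made for illustration; the paper's exact protocol (for example, filtered ranking) is not described in the abstract.

```python
from typing import Dict, List


def rank_of_true(scores: Dict[str, float], true_entity: str) -> int:
    """Return the 1-based rank of true_entity among all scored candidates.

    Candidates are ordered by descending score. Ties are resolved
    pessimistically (an assumption): every candidate with a score equal to
    the true entity's score is counted as ranked ahead of it.
    """
    true_score = scores[true_entity]
    # Counts the true entity itself plus everything scoring at least as high.
    return sum(1 for s in scores.values() if s >= true_score)


def hits_at_k(ranks: List[int], k: int = 10) -> float:
    """Fraction of test links whose true entity is ranked within the top k."""
    if not ranks:
        return 0.0
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Hypothetical model scores for two CVE -> CPE queries (illustrative only).
queries = [
    ({"cpe:a": 0.9, "cpe:b": 0.5, "cpe:c": 0.1}, "cpe:a"),
    ({"cpe:a": 0.2, "cpe:b": 0.7, "cpe:c": 0.4}, "cpe:a"),
]
ranks = [rank_of_true(scores, target) for scores, target in queries]
print(ranks, hits_at_k(ranks, k=1), hits_at_k(ranks, k=3))
```

In this toy setup the true CPE is ranked first for one query and last for the other, so the metric rises as `k` grows. With thousands of candidate CPEs per CVE, a Hits@10 value reflects how often the correct product lands in a short list an analyst could realistically review.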
Authors: Daniel Alfasi, Tal Shapira, Anat Bremler Barr