Automating SBOM Generation with Zero-Shot Semantic Similarity (2403.08799v1)
Abstract: As software ecosystems grow more complex and the emphasis on security and compliance increases, it is becoming increasingly important for manufacturers to inventory the software used on their systems. A Software Bill of Materials (SBOM) is a comprehensive inventory detailing a software application's components and dependencies. Current approaches rely on case-based reasoning and identify the software components embedded in binary files inconsistently. We propose a different route: an automated method for generating SBOMs, with the aim of helping to prevent disastrous supply-chain attacks. Staying within static code analysis, we frame the problem as a semantic similarity task in which a transformer model is trained to relate a product name to its corresponding version strings. In our tests, the model performs strongly on the zero-shot classification task, which suggests potential for use in a real-world cybersecurity context.
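The matching step described in the abstract, relating a version string found in a binary to a product name by similarity, can be illustrated with a minimal sketch. It is not the paper's model: a character-trigram bag-of-features stands in for the transformer embedding so that the example runs without external dependencies, and the function names, product list, and version strings are all hypothetical. A purely lexical stand-in like this one only works when the version string contains text resembling the product name, which is the gap a trained semantic model is meant to close.

```python
import math
from collections import Counter


def trigram_vector(text: str) -> Counter:
    """Stand-in embedding (assumption): counts of character trigrams.

    A real system would replace this with a transformer sentence embedding.
    """
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_product(version_string: str, products: list[str]) -> tuple[str, float]:
    """Return the candidate product name most similar to the version string."""
    query = trigram_vector(version_string)
    scored = [(name, cosine(query, trigram_vector(name))) for name in products]
    return max(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical candidate labels and strings extracted from a binary.
    products = ["OpenSSL", "zlib", "BusyBox"]
    for found in ["OpenSSL 1.1.1k  25 Mar 2021", "BusyBox v1.31.1"]:
        name, score = best_product(found, products)
        print(f"{found!r} -> {name} (similarity {score:.2f})")
```

Because the candidate product names are supplied at query time rather than learned as fixed classes, new products can be added to the list without retraining, which is the sense in which this kind of matching is zero-shot.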