Papers

Topics

Authors

Recent

View all

Detailed Answer

Quick Answer

Concise responses based on abstracts only

Detailed Answer

Well-researched responses based on abstracts and relevant paper content.

Custom Instructions Pro

Preferences or requirements that you'd like Emergent Mind to consider when generating responses

Gemini 2.5 Flash

Gemini 2.5 Flash 84 tok/s

Gemini 2.5 Pro 37 tok/s Pro

GPT-5 Medium 18 tok/s Pro

GPT-5 High 15 tok/s Pro

GPT-4o 86 tok/s Pro

GPT OSS 120B 468 tok/s Pro

Kimi K2 229 tok/s Pro

2000 character limit reached

Improving the Context Length and Efficiency of Code Retrieval for Tracing Security Vulnerability Fixes (2503.22935v2)

Published 29 Mar 2025 in cs.CR and cs.SE

Abstract: An upstream task for software bill-of-materials (SBOMs) is the accurate localization of the patch that fixes a vulnerability. Nevertheless, existing work reveals a significant gap in the CVEs whose patches exist but are not traceable. Existing works have proposed several approaches to trace/retrieve the patching commit for fixing a CVE. However, they suffer from two major challenges: (1) They cannot effectively handle long diff code of a commit; (2) We are not aware of existing work that scales to the full repository with satisfactory accuracy. Upon identifying this gap, we propose SITPatchTracer, a scalable and effective retrieval system for tracing known vulnerability patches. To handle the context length challenge, SITPatchTracer proposes a novel hierarchical embedding technique which efficiently extends the context coverage to 6x that of existing work while covering all files in the commit. To handle the scalability challenge, SITPatchTracer utilizes a three-phase framework, balancing the effectiveness/efficiency in each phase. The evaluation of SITPatchTracer demonstrates it outperforms existing patch tracing methods (PatchFinder, PatchScout, VFCFinder) by a large margin. Furthermore, SITPatchTracer outperforms VoyageAI, the SOTA commercial code embedding LLM (\$1.8 per 10K commits) on the MRR and Recall@10 by 18\% and 28\% on our two datasets. Using SITPatchTracer, we have successfully traced and merged the patch links for 35 new CVEs in the GitHub Advisory database Our ablation study reveals that hierarchical embedding is a practically effective way of handling long context for patch retrieval.

Collections

Summary

We haven't generated a summary for this paper yet.

Summarize Now

Paper Prompts

Explore 10 Community Prompts

Follow-up Questions

We haven't generated follow-up questions for this paper yet.

Generate Now

Authors (6)

Tweets

https://twitter.com/FSFG/status/1957653128708276725