Automated Identification of Vulnerability-Fixing Patches

Develop an automated method that, given a known vulnerability in an open-source software project, identifies the exact source-code commit (patch) in the project's revision control history that fixes the vulnerability, without relying on manual curation.

Background

The ARVO paper stresses that source-code patches are essential for understanding and addressing vulnerabilities. Version control systems record every historical change, yet automatically finding the specific commit that fixes a given vulnerability remains challenging. Vulnerability databases such as CVE and NVD often lack reliable patch information, and approaches such as CVEFixes rely on textual cues (e.g., commit messages) that can be incomplete or inaccurate, so they offer no guarantee of correctness.

ARVO addresses patch identification for vulnerabilities sourced from OSS-Fuzz by constructing reproducible build environments and bisecting the project history to locate the fix. The broader problem of general automated patch identification, that is, mapping vulnerabilities to their precise fixing commits across diverse projects and histories, remains unresolved, which motivates stating it as an explicit open problem.
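
To make the bisection idea concrete, here is a minimal sketch in Python. It is not ARVO's implementation: the function name `find_fix_commit` and the predicate `still_crashes` are hypothetical, and the predicate stands in for the expensive step of rebuilding the project at a commit and re-running the reproducer. The sketch assumes the commits are ordered oldest to newest, the oldest one is vulnerable, the newest one is fixed, and the crash does not reappear once fixed.

```python
from typing import Callable, Sequence


def find_fix_commit(
    commits: Sequence[str],
    still_crashes: Callable[[str], bool],
) -> str:
    """Return the first commit at which the reproducer no longer crashes.

    Assumptions (all hypothetical, for illustration only):
    - commits are ordered oldest to newest;
    - commits[0] is vulnerable and commits[-1] is fixed;
    - once fixed, the crash never reappears (monotonic history).
    """
    if not commits:
        raise ValueError("empty commit list")
    if not still_crashes(commits[0]):
        raise ValueError("oldest commit is expected to be vulnerable")
    if still_crashes(commits[-1]):
        raise ValueError("newest commit is expected to be fixed")

    # Invariant: commits[lo] crashes, commits[hi] does not.
    lo, hi = 0, len(commits) - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if still_crashes(commits[mid]):
            lo = mid  # fix lies strictly after mid
        else:
            hi = mid  # fix is at mid or earlier
    return commits[hi]


# Toy history: commits c0..c9, where the fix (by construction) lands in c6.
history = [f"c{i}" for i in range(10)]


def toy_still_crashes(commit: str) -> bool:
    # Stand-in for "build this commit and run the reproducer".
    return int(commit[1:]) < 6


fix = find_fix_commit(history, toy_still_crashes)
```

The search needs only a logarithmic number of build-and-run checks in the length of the history, which matters because each check is costly. The monotonicity assumption is the weak point in practice: histories with reverts, merges, or commits that fail to build can violate it, which is part of why the general problem stays open.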

References

Revision control software makes patches possible by recording all the historical changes. However, automatic identification of patches is an unsolved problem.

Mei et al., "ARVO: Atlas of Reproducible Vulnerabilities for Open Source Software" (arXiv:2408.02153, 4 Aug 2024), Section 2.2 (Patch Locating)