VULSOLVER: Constraint-Based Vulnerability Detection

Updated 7 September 2025

VULSOLVER is a framework that formulates vulnerability detection as a constraint-solving problem, combining SAST with LLM semantic reasoning.
It employs code summaries and progressive constraint modeling to analyze call chains and detect high-severity vulnerabilities across diverse codebases.
VULSOLVER achieves high accuracy and perfect recall on benchmarks while uncovering previously unknown vulnerabilities in real-world repositories.

VULSOLVER is a vulnerability detection framework that formulates the detection process as a constraint-solving problem, integrating static application security testing (SAST) with the semantic reasoning capabilities of LLMs. Its design aims to enable the LLM to act analogously to a professional human security analyst, overcoming notable challenges in previous rule-based and LLM-driven approaches. VULSOLVER demonstrates high accuracy on standard benchmarks and successfully uncovers previously unknown high-severity vulnerabilities in large-scale real-world repositories.

1. Foundational Principles and System Architecture

VULSOLVER operates by transforming traditional vulnerability detection—which historically involves rule-based matching—into a formal constraint satisfaction problem. The framework consists of two main modules:

Code Information Summary Generation: SAST tools are used to analyze program source code and produce standardized summaries, including method details, type information, call chains, and data-flow metadata. This preprocessing step outputs JSON representations that abstract away language-specific details and vulnerability types, enabling generalized downstream analysis.
Semantic-Based Constraint Solving: The LLM ingests the code summaries and breaks the detection workflow into a progressive reasoning sequence. It analyzes pairs of functions along each call chain, verifying inter-method relationships and the propagation of exploitable state. The LLM solves the transfer and trigger constraints step by step along the execution path, mirroring how a human expert validates exploitation feasibility.

This approach ensures VULSOLVER maintains architectural independence from programming language and vulnerability type, allowing it to scale to diverse codebases.
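As an illustration of the summary interface described above, the following is a minimal sketch of what a per-method code summary might look like before serialization to JSON. The source does not specify the schema, so the class `MethodSummary`, its field names, and the example methods `handle_request` and `run_command` are all hypothetical.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class MethodSummary:
    # Hypothetical schema: the source only says summaries carry method
    # details, type information, call chains, and data-flow metadata.
    name: str
    parameters: dict[str, str]  # parameter name -> declared type
    callees: list[str] = field(default_factory=list)
    # callee name -> {callee parameter: caller parameter that feeds it}
    data_flow: dict[str, dict[str, str]] = field(default_factory=dict)


def to_json(summaries: list[MethodSummary]) -> str:
    """Serialize summaries into one JSON document for the reasoning stage."""
    return json.dumps([asdict(s) for s in summaries], indent=2, sort_keys=True)


# Two made-up methods: an entry point that forwards its input to a sink.
handler = MethodSummary(
    name="handle_request",
    parameters={"request": "HttpRequest"},
    callees=["run_command"],
    data_flow={"run_command": {"cmd": "request"}},
)
sink = MethodSummary(name="run_command", parameters={"cmd": "str"})

summary_json = to_json([handler, sink])
```

Because the reasoning stage would see only this JSON, a different front-end analyzer could be swapped in for another language as long as it emits the same fields.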

2. Formal Constraint Modeling and Semantic Reasoning

The core methodology involves positing the detection process as satisfying two types of constraints:

Transfer Constraints ($\Phi_{\mathrm{tr}}$): For each consecutive caller–callee pair ($m_i$, $m_{i+1}$) along a call chain, the model assesses whether the required parameter values and state can be feasibly transferred. Formally, the set:

$\Phi_{\mathrm{tr}} = \{\Phi_{\mathrm{tr}}^1, \Phi_{\mathrm{tr}}^2, \ldots, \Phi_{\mathrm{tr}}^{n-1}\}$

collects these feasibility checks, one per consecutive pair, across a chain of $n$ methods.

Trigger Constraints ($\Phi_{\mathrm{tg}}$): At the terminal "sink" (the vulnerability point), the model checks if the incoming parameter set $S_n$ meets the necessary conditions for exploitation (e.g., an unfiltered input enabling command injection).

The detection outcome is defined as:

$\text{Input} \models \Phi_{\mathrm{tr}} \wedge \Phi_{\mathrm{tg}}$

That is, a vulnerability is reported if some input simultaneously satisfies every transfer constraint and the trigger constraint.
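The stepwise check can be sketched as a loop over consecutive caller–callee pairs followed by one check at the sink. This is a minimal sketch, not the authors' implementation: in VULSOLVER the LLM answers each question, whereas here two table-driven stand-in functions (`toy_transfer`, `toy_trigger`) play that role. Encoding the state $S_i$ as a set of attacker-controlled parameter names, and all method names, are assumptions made for illustration.

```python
from typing import Callable, Sequence

State = frozenset[str]  # assumed encoding of S_i: controlled parameter names
Transfer = Callable[[str, str, State], State]
Trigger = Callable[[str, State], bool]


def solve_chain(chain: Sequence[str], initial: State,
                transfer: Transfer, trigger: Trigger) -> bool:
    """Return True if the input state satisfies all transfer constraints
    along the chain and the trigger constraint at the sink."""
    state = initial
    for caller, callee in zip(chain, chain[1:]):
        state = transfer(caller, callee, state)
        if not state:  # nothing exploitable reaches the callee
            return False
    return trigger(chain[-1], state)


# Hypothetical data flow: (caller, callee) -> {caller param: callee param}.
FLOWS = {
    ("handle_request", "build_command"): {"request": "name"},
    ("build_command", "run_command"): {"name": "cmd"},
}


def toy_transfer(caller: str, callee: str, state: State) -> State:
    # Stand-in for the LLM's per-pair reasoning step.
    mapping = FLOWS.get((caller, callee), {})
    return frozenset(dst for src, dst in mapping.items() if src in state)


def toy_trigger(sink: str, state: State) -> bool:
    # Stand-in for the LLM's check at the sink.
    return sink == "run_command" and "cmd" in state


chain = ["handle_request", "build_command", "run_command"]
vulnerable = solve_chain(chain, frozenset({"request"}), toy_transfer, toy_trigger)
safe = solve_chain(chain, frozenset({"session_id"}), toy_transfer, toy_trigger)
```

Stopping at the first unsatisfied transfer constraint means later pairs are never examined, which keeps each question small and bounded.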

The LLM operates not as a free-form generator, but as a guided constraint solver, handling each step in a carefully pruned semantic context derived from prior reasoning. This progressive narrowing of context mitigates hallucination and instability issues commonly observed in unconstrained models.

3. Evaluation Methodology and Metrics

VULSOLVER’s performance was assessed on the OWASP Benchmark (1,023 labeled samples), employing the following metrics:

Metric	Value	Description
Accuracy	96.29%	Correct classification rate (vulnerable / non-vulnerable)
F1-score	96.55%	Harmonic mean of precision and recall
Recall	100%	Fraction of known vulnerabilities detected (no false negatives)

These results collectively indicate state-of-the-art vulnerability detection capability, characterized by both low false positives and perfect coverage of labeled vulnerabilities.
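The table does not list precision, but it can be recovered from the reported F1-score and recall by inverting the F1 definition. The snippet below is a small worked calculation; the resulting precision is derived from the rounded figures above and is not a value reported in the source.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def precision_from_f1(f1: float, recall: float) -> float:
    """Invert F1 = 2PR / (P + R) to obtain P = F1 * R / (2R - F1)."""
    return f1 * recall / (2 * recall - f1)


reported_f1 = 0.9655  # from the table above
reported_recall = 1.0  # from the table above

implied_precision = precision_from_f1(reported_f1, reported_recall)
print(f"implied precision: {implied_precision:.2%}")
```

With recall at 100%, the formula reduces to P = F1 / (2 - F1), giving an implied precision of roughly 93.3%. All remaining errors are therefore false positives.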

4. Robustness, Context Management, and Scalability

VULSOLVER introduces several innovations to address three major issues in prior vulnerability detection:

Instability: By decomposing detection into step-by-step constraint analysis, VULSOLVER reduces the randomness and hallucination associated with end-to-end LLM responses.
Context-Length Limitation: Code summaries and semantic context maintenance ensure only relevant data is propagated, avoiding issues when codebases exceed LLM context windows.
Misinterpretation/Hallucination: Every reasoning step is cast as a bounded constraint-solving task, substantially reducing the semantic space in which errors may occur.

The system’s abstraction via code summaries, decoupled from the specifics of language and vulnerability type, provides a pathway for scaling to large multi-language codebases and complex cross-function vulnerabilities.

5. Real-World Application and High-Severity Vulnerability Discovery

VULSOLVER’s practical effectiveness was confirmed by its deployment on popular GitHub repositories. In these real-world applications, it accurately identified 15 previously unknown high-severity vulnerabilities (CVSS 7.5–9.8) that had escaped detection by conventional tools. This underscores VULSOLVER’s robustness and its applicability in operational software security scenarios.

Use of call-chain analysis, semantic pruning, and constraint satisfaction enables VULSOLVER to analyze long-range vulnerabilities spanning multiple interacting methods—a common limitation in static rule-based systems.
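To make the call-chain step concrete, the following minimal sketch enumerates every acyclic chain from an entry method to a sink in a call graph. The source does not describe how VULSOLVER extracts chains, so the depth-first search, the `max_depth` cutoff, and the example graph are assumptions for illustration only.

```python
def call_chains(graph: dict[str, list[str]], source: str, sink: str,
                max_depth: int = 10) -> list[list[str]]:
    """Enumerate acyclic call chains from source to sink by depth-first search."""
    chains: list[list[str]] = []

    def walk(node: str, path: list[str]) -> None:
        if node == sink:
            chains.append(path)
            return
        if len(path) >= max_depth:  # bound the chain length
            return
        for callee in graph.get(node, []):
            if callee not in path:  # skip recursive cycles
                walk(callee, path + [callee])

    walk(source, [source])
    return chains


# Hypothetical call graph: caller -> list of callees.
graph = {
    "main": ["parse", "dispatch"],
    "parse": ["dispatch"],
    "dispatch": ["exec_cmd", "log"],
    "log": [],
}

chains = call_chains(graph, "main", "exec_cmd")
```

Each returned chain is a candidate path whose consecutive pairs would then be checked against the transfer constraints, with the final method checked against the trigger constraint.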

6. Comparative Assessment and Methodological Innovations

Relative to traditional SAST tools (e.g., CodeQL) and previous LLM-based prompting frameworks, VULSOLVER provides:

Superior accuracy and F1-score, notably for demanding vulnerability classes (e.g., command injection).
Perfect recall on the OWASP Benchmark, meaning no labeled vulnerability in that benchmark was missed.
Increased interpretability through explicit constraint modeling and analysis traceability.
Generality across languages and vulnerability types by establishing a unified JSON summary as the interface between SAST output and LLM reasoning.

Its design, which decomposes analysis into branch method analysis, context maintenance, and main path analysis, closely mirrors expert logic and facilitates integration with advanced security workflows.

7. Implications, Limitations, and Future Directions

VULSOLVER demonstrates that recasting vulnerability detection as LLM-driven constraint solving yields high empirical performance and considerable practical value (Li et al., 31 Aug 2025). Its modular, scalable architecture and semantic rigor position it well for adoption in real-world secure development lifecycles.

Potential future directions include:

Extending the formal constraint-solving approach to additional vulnerability types and programming languages.
Automating context maintenance through hierarchical abstraction for even larger codebases.
Integrating VULSOLVER with patch generation and automated repair systems for closed-loop vulnerability management.

A plausible implication is that the formal decomposition of semantic reasoning will generalize to other domains in secure software analysis, such as exploitability prediction and code provenance tracing, given the strong results reported for both accuracy and recall.

In summary, VULSOLVER introduces a robust, interpretable, and highly accurate framework for vulnerability detection, representing a significant advancement in the application of LLMs to software security analysis.

References (1)

VULSOLVER: Vulnerability Detection via LLM-Driven Constraint Solving (2025)