LLM-Assisted Concolic Execution
- Concolic execution with LLMs is a testing approach that integrates neural language models into both concrete and symbolic workflows to improve constraint solving and input generation.
- LLMs are used to guide path constraint selection, mutation, and prioritization, replacing or augmenting traditional SMT solver methods for efficient branch exploration.
- This integration has yielded significant gains in vulnerability detection, parser testing, and structured input synthesis while mitigating classical scalability challenges.
Concolic execution with LLMs denotes the integration of large language models (LLMs) into the concolic (concrete + symbolic) execution and software-testing workflow. In this paradigm, LLMs serve as reasoning agents for path-constraint selection, constraint solving, path prioritization, and structured input synthesis, superseding or augmenting classical techniques that rely on Satisfiability Modulo Theories (SMT) solvers or handcrafted heuristics. Contemporary systems demonstrate that LLMs can either replace SMT-based constraint solving (e.g., via prompt-driven input generation) or orchestrate symbolic engines for deeper, more semantically informed exploration. This confluence has produced state-of-the-art results in automated vulnerability detection, complex parser testing, and scalable path exploration, while sidestepping historic bottlenecks of the field (Meng et al., 2024, Eslamimehr, 18 Jan 2026, Tu et al., 24 Apr 2025).
1. Principal Methodologies for LLM-Assisted Concolic Execution
Three leading frameworks articulate distinct integration strategies:
HyLLfuzz ("hill fuzz") (Meng et al., 2024)
This system abandons explicit symbolic path-constraint construction and solving in favor of LLM-driven input synthesis. In its core loop, a greybox fuzzer (e.g., AFL) accumulates seeds until it encounters a coverage "roadblock": an input that reaches but does not flip a given branch. At that point, HyLLfuzz performs dynamic backward slicing to extract the minimal code segment influencing the blocked branch, inserts an assertion encoding the branch flip, and passes both the slice and the concrete input to an LLM (the paper's Figure 1 prompt template). The LLM, prompted as an expert concolic tester, generates a fresh input intended to reach the unexplored branch, which is then fed back into the fuzzer's corpus. This process repeats iteratively.
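A minimal sketch of this roadblock loop follows, assuming hypothetical helpers (`find_roadblock`, `backward_slice`, `llm_generate_input`) in place of HyLLfuzz's actual components:

```python
# Minimal sketch of the HyLLfuzz roadblock loop; find_roadblock,
# backward_slice, and llm_generate_input are hypothetical stand-ins
# for the system's actual components.

def hyllfuzz_loop(fuzzer, llm, program, budget):
    for _ in range(budget):
        fuzzer.run_until_stalled()                 # greybox phase (e.g., AFL)
        block = find_roadblock(fuzzer.coverage)    # branch reached but never flipped
        if block is None:
            continue
        seed = block.reaching_input                # concrete input hitting the branch
        code_slice = backward_slice(program, block.branch)  # dynamic backward slice
        prompt = (
            "You are an expert concolic tester.\n"
            f"Code slice (the assertion flips the blocked branch):\n{code_slice}\n"
            f"Current input (hex): {seed.hex()}\n"
            "Produce a new input, hex-encoded, that makes the assertion pass."
        )
        candidate = llm_generate_input(llm, prompt)  # LLM stands in for the solver
        if candidate is not None:
            fuzzer.add_seed(candidate)             # re-enters the fuzzing corpus
```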
LLM-C ("Hybrid Concolic Testing with LLMs") (Eslamimehr, 18 Jan 2026)
Classical concolic execution is augmented, not supplanted: a symbolic execution engine generates and manipulates formal path conditions, but an LLM module guides the workflow in three capacities:
- Path prioritization: the LLM scores unexplored branches for semantic novelty and exploration utility via a scoring function $\mathrm{score}(b)$, focusing heuristic search.
- Constraint mutation: LLM proposes syntactically valid relaxations or refactorings of hard-to-solve path conditions when SMT encounters a timeout or unsatisfiability.
- Semantic input synthesis: When solver-based exploration is ineffective, the LLM proposes new inputs that are domain informed and likely to drive execution toward challenging states.
The LLM module thus acts as an advisory and generative oracle, closing the loop through path-queue manipulation and concrete-execution validation.
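A hedged sketch of how these three roles might compose around a priority queue; every `llm_*`, `smt_solve`, and `concretize` helper here is a hypothetical stand-in, not LLM-C's published API:

```python
import heapq

# Hedged sketch of the LLM-C advisory loop: the LLM ranks branches, SMT
# solves path conditions, and the LLM steps in when the solver fails.
# llm_score, llm_relax, llm_synthesize, smt_solve, and concretize are
# hypothetical stand-ins for the paper's components.

def explore(engine, llm, frontier):
    queue = [(-llm_score(llm, b), i, b) for i, b in enumerate(frontier)]
    heapq.heapify(queue)                      # highest LLM score pops first
    while queue:
        _, _, branch = heapq.heappop(queue)
        model = smt_solve(branch.path_condition, timeout_ms=5000)
        if model is None:                     # timeout/unsat: try an LLM relaxation
            model = smt_solve(llm_relax(llm, branch.path_condition),
                              timeout_ms=5000)
        if model is not None:
            test_input = concretize(model)    # turn the model into input bytes
        else:
            test_input = llm_synthesize(llm, branch)  # direct semantic synthesis
        for new_branch in engine.execute(test_input):  # concrete validation
            heapq.heappush(queue,
                           (-llm_score(llm, new_branch), id(new_branch), new_branch))
```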
Cottontail (Tu et al., 24 Apr 2025)
This architecture targets highly structured input spaces, typical in parser and format validation contexts. It introduces the Expressive Structural Coverage Tree (ESCT), a path representation that captures structural, branch, and contextual features. The LLM acts as both a constraint solver—in a solve-complete chain-of-thought paradigm—and as an initial/fallback seed generator, ensuring that resultant test inputs are both constraint-satisfying and syntactically valid. Structural path constraint selection and deduplication, informed by ESCT weights, dramatically improves efficiency for format- and structure-heavy targets.
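A minimal sketch of what an ESCT node might look like, inferred from the description above; the field names and the weight-based selection rule are assumptions, not Cottontail's actual implementation:

```python
from dataclasses import dataclass, field
from typing import Optional

# Illustrative ESCT node, inferred from the prose above; the real Cottontail
# structure may differ. Weights steer structural path-constraint selection,
# and the solved flag supports deduplication.

@dataclass
class ESCTNode:
    branch_id: int                     # program branch this node represents
    context: str                       # structural/grammar context of the branch
    weight: float = 1.0                # selection weight (higher = more promising)
    solved: bool = False               # already flipped or deduplicated away
    children: list = field(default_factory=list)

def select_constraint(root: ESCTNode) -> Optional[ESCTNode]:
    """Pick the highest-weight unsolved node, skipping solved subtrees."""
    best, stack = None, [root]
    while stack:
        node = stack.pop()
        if not node.solved and (best is None or node.weight > best.weight):
            best = node
        stack.extend(child for child in node.children if not child.solved)
    return best
```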
2. Execution Traces, Constraint Solving, and LLM Prompt Engineering
Trace Slicing and Segment Construction (HyLLfuzz)
Dynamic slicing is employed to minimize the code handed to the LLM: the final prompt includes the sliced code, the original concrete input, and explicit output-format instructions (hex or base64), ensuring the generated seed has the correct structure.
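Because the prompt pins the output format, the reply can be decoded defensively before it enters the corpus; a small sketch (the token-picking heuristic is an assumption):

```python
import base64
import binascii
import re
from typing import Optional

def decode_llm_seed(reply: str) -> Optional[bytes]:
    """Decode the last whitespace-delimited token of an LLM reply as hex or
    base64, rejecting malformed output so only well-formed seeds are kept."""
    tokens = reply.strip().split()
    if not tokens:
        return None
    text = tokens[-1]
    digits = text[2:] if text.startswith("0x") else text
    if re.fullmatch(r"[0-9a-fA-F]+", digits) and len(digits) % 2 == 0:
        return bytes.fromhex(digits)
    try:
        return base64.b64decode(text, validate=True)
    except (binascii.Error, ValueError):
        return None
```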
LLM Prompting for Constraint Solving and Input Synthesis
LLMs are provided either with symbolic path constraints (formal expressions or code fragments) plus surrounding context or, for structure-rich formats, with a two-stage prompt:
- Solve (for constraint satisfaction)
- Complete (for syntactic validity)
Cottontail, for instance, applies:
- "Given constraint on [k!i], choose so that holds."
- "Now fill [xxx] for the entire string to be a syntactically valid input."
Constraint Mutation and Branch Prioritization (LLM-C)
When SMT solvers fail, LLMs are prompted to suggest constraint relaxations, e.g., "Here is a hard constraint $\phi$. Suggest a simpler but related variant $\phi'$." Paths and branches are prioritized with the LLM-derived scoring function $\mathrm{score}(b)$ described above.
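The SMT-first, LLM-on-failure pattern can be sketched with the real z3 Python bindings; `llm_relax`, the helper that sends the relaxation prompt and parses the reply back into a z3 expression, is hypothetical:

```python
from z3 import Solver, sat  # pip install z3-solver

# Sketch of the SMT-first, LLM-on-failure pattern. `constraint` is a z3
# boolean expression; llm_relax is a hypothetical helper that prompts the
# LLM for a simpler variant and parses it back into a z3 expression.

def solve_with_llm_fallback(constraint, llm_relax, timeout_ms=5000):
    solver = Solver()
    solver.set("timeout", timeout_ms)
    solver.add(constraint)
    if solver.check() == sat:
        return solver.model()
    # Timeout (unknown) or unsat: ask the LLM for a related, easier variant.
    relaxed = llm_relax(constraint)        # e.g., drop a non-linear conjunct
    if relaxed is not None:
        solver = Solver()
        solver.set("timeout", timeout_ms)
        solver.add(relaxed)
        if solver.check() == sat:
            return solver.model()          # must still be validated concretely
    return None
```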
3. Formal Principles and Theoretical Impact
Mitigating Path Explosion
In classical concolic testing, the number of explored paths grows exponentially with depth: for branching factor $b$ and exploration depth $d$, the path space is $O(b^{d})$. LLM-driven prioritization reduces the effective search: if only the top-$k$ LLM-ranked branches are pursued at each decision point, the bound shrinks to $O(k^{d})$ with $k \ll b$. Empirically, this allows deeper or broader state-space coverage without combinatorial explosion.
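A worked instance of this bound, with illustrative numbers not drawn from the cited papers:

```latex
% Illustrative numbers (not from the cited papers): branching factor b = 4,
% depth d = 10, and the LLM keeps only the top k = 2 branches per decision.
\[
  \underbrace{b^{d}}_{\text{exhaustive}} = 4^{10} \approx 1.05 \times 10^{6},
  \qquad
  \underbrace{k^{d}}_{\text{top-}k\text{ pruned}} = 2^{10} = 1024,
  \qquad
  \frac{b^{d}}{k^{d}} = \left(\frac{b}{k}\right)^{d} = 2^{10} = 1024 .
\]
```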
Constraint Solving Complexity
Traditional SMT solving incurs worst-case cost exponential in formula size, $O(2^{n})$ for $n$ boolean-level decision variables. LLMs bypass this by leveraging training-induced semantic heuristics, albeit without formal completeness; they are thus efficient in practice for plausible (though not guaranteed) constraint satisfaction.
4. Quantitative Evaluation and Empirical Results
HyLLfuzz (Meng et al., 2024)
Branch coverage (branches covered; HyLLfuzz delta relative to AFL in parentheses):

| Subject | AFL | QSYM | Intriguer | HyLLfuzz |
|---|---|---|---|---|
| as-new | 4640 | 5640 | 4832 | 8145 (+75.5%) |
| cflow | 1141 | 1194 | 1153 | 1238 (+8.5%) |
| cJSON | 343 | 347 | 344 | 372 (+8.5%) |
| cxxfilt | 1763 | 2039 | 1951 | 2448 (+38.9%) |
| libxml2 | 2819 | 3164 | 3092 | 6812 (+141.6%) |
| MuJS | 1907 | 2078 | 2096 | 3666 (+92.2%) |
- Branch coverage improvement over AFL alone: +60.9%, over QSYM: +44.5%, over Intriguer: +50.8%.
- Median solve time: HyLLfuzz 4.97s, QSYM 20.95s (4.2x slower), Intriguer 95s (19.1x slower).
- Effective-input rate: 13.2% for HyLLfuzz (inputs leading to new coverage).
LLM-C (Eslamimehr, 18 Jan 2026)
| Technique | Synthetic Branch % | Synthetic Paths | Fintech Branch % | Fintech Paths |
|---|---|---|---|---|
| Random | 45.2 | 1204 | 38.1 | 987 |
| GA | 68.9 | 3456 | 55.4 | 2876 |
| Concolic | 75.6 | 8923 | 62.3 | 7890 |
| LLM-C | 91.3 | 15678 | 85.7 | 14567 |
- SMT solver invocations: Reduction from 15,432 (classical) to 8,765 (LLM-C); timeouts from 1,234 to 245.
- Statistical improvement: the gains over the classical concolic baseline are reported as statistically significant (p-value and Cohen's $d$ effect size reported in the paper).
Cottontail (Tu et al., 24 Apr 2025)
- Line coverage improvement: +14.15% over SymCC, +14.31% over Marco.
- Branch coverage: +15.96% over SymCC, +11.10% over Marco.
- Parser pass rate: up to 100x that of Z3-based constraint solving (e.g. 32.6% on Libxml2 vs. <5% for Z3).
- New vulnerabilities: 6 new CVEs found, 4 patched.
5. Comparative Strengths, Weaknesses, and Limitations
Advantages
- SMT solver bottleneck mitigation: LLMs bypass lengthy or impossible constraint solving steps, especially for low-level or highly non-linear constraints.
- Structural and semantic generalization: LLMs leverage global code context and prior format knowledge; effective for reaching deep branches guarded by hard-to-reason-about syntactic rules, checksums, or magic constants.
- Adaptive input synthesis: LLMs can synthesize structurally valid test cases, outperforming structure-unaware bit-level mutation or snapshotting.
Weaknesses and Open Challenges
- Source dependence: Most methods require source code (not binary), as code slicing and ESCT construction depend on AST or IR analysis.
- LLM hallucinations and reliability: LLMs may generate unsatisfiable, irrelevant, or invalid suggestions; results must be concretely validated post-hoc.
- Prompt/token limitations: Very large code or input slices may exceed LLM prompt boundaries.
- Arithmetic/complex constraints: LLMs are not reliable at solving intricate linear/symbolic systems; some papers propose SMT fallback for such cases.
- Cost and latency: API invocation overhead and reliance on proprietary models (e.g., GPT-4o, GPT-5.1) introduce latency and reproducibility constraints.
6. Extensions, Applications, and Future Directions
- Domain-specialized and open-source LLMs: Exploration of compact or self-hosted models for privacy and integration (Tu et al., 24 Apr 2025, Eslamimehr, 18 Jan 2026).
- Security analysis: LLM-guided symbolic taint analysis and security-specific heuristics, extending coverage for vulnerability detection.
- Hybridization with greybox fuzzing: Combining greybox heuristics with LLM-augmented symbolic exploration for improved robustness.
- Driver/harness synthesis: Automating input driver construction by leveraging LLM chain-of-thought instruction.
- Multi-agent LLM architectures: Orchestrating multiple LLMs for proposal, validation, and refinement roles.
- Extension to binaries and dataflow dependency tracking: Adapting slicing and constraint selection to decompiled or binary targets.
- Automated prompt adaptation and budgeting: Dynamic modulation between “minimal edit” and “havoc” generations, adjusting for program state and progress.
By incorporating LLMs into concolic workflows, the field has achieved demonstrable gains in coverage, bug discovery, and test-case validity across multiple benchmarks and software domains. These approaches mitigate classical scalability bottlenecks and open new avenues for semantically informed, large-scale software testing (Meng et al., 2024, Eslamimehr, 18 Jan 2026, Tu et al., 24 Apr 2025).