BugScope: LLM-Driven Bug Detection

Updated 29 January 2026

BugScope is an LLM-driven system that emulates human bug audits, synthesizing bug-specific retrieval strategies and applying program slicing to improve the detection of software defects.
The system integrates a Context Retrieval Agent and a Bug Detection Agent to synthesize custom detection prompts from code examples using few-shot learning.
Quantitative evaluations show BugScope achieves superior precision and recall over industrial baselines, uncovering novel bugs in large-scale open-source projects.

BugScope is an LLM-driven multi-agent system for software bug detection that emulates human auditors by learning new bug patterns from representative example sets and applying that knowledge during code auditing. Departing from traditional static analysis, which relies on fixed symbolic workflows with limited adaptability to diverse real-world defects and anti-patterns, BugScope synthesizes a custom retrieval and detection pipeline for each bug class using dynamically constructed prompts and program slicing. Quantitative evaluation demonstrates superior precision and recall compared to industrial baselines and reveals substantial practical impact through novel bug discovery in large open-source projects (Guo et al., 21 Jul 2025).

1. System Architecture and Workflow

BugScope’s architecture operationalizes the human bug-auditing workflow (identifying suspicious code elements as “seeds,” retrieving salient context via program slicing, and applying deep semantic reasoning) by deploying two collaborating LLM-powered agents:

Context Retrieval Agent: Responsible for example selection, retrieval-strategy synthesis, and program slicing (forward or backward).
Bug Detection Agent: Responsible for detection-prompt synthesis (few-shot prompting with embedded reasoning hints), the LLM-driven bug decision, and a validator that filters hallucinated reports.

The pipeline begins by ingesting paired buggy (BE) and non-buggy (NE) code examples of the target anti-pattern. The LLM analyzes these pairs to extract (a) the program construct(s) that serve as slicing seeds (faulty values or dangerous operands) and (b) the appropriate slicing direction (forward or backward).

This yields a retrieval strategy consisting of a seed extractor $S(\cdot)$ and a direction $d\in\{\text{forward}, \text{backward}\}$. The Context Retrieval Agent then applies program slicing over the AST and/or call graph, isolating minimal, self-contained code snippets for further analysis. The Bug Detection Agent synthesizes a tailored detection prompt $P$, including chain-of-thought instructions and reasoning hints, which is fed to the LLM for the final bug decision and validation.
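The workflow just described can be sketched as a small Python pipeline to show how data moves between the two agents and the validator. This is an illustrative sketch under stated assumptions, not BugScope's implementation: `RetrievalStrategy`, `run_pipeline`, `prefix_slice`, and `stub_llm` are hypothetical names, the program is modeled as a list of statements, and both the slicer and the LLM are crude stubs.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class RetrievalStrategy:
    """Hypothetical container for the synthesized strategy (S, d)."""
    is_seed: Callable[[str], bool]   # simplified S(.): True if a statement is a seed
    direction: str                   # "forward" or "backward"

def run_pipeline(program: List[str],
                 strategy: RetrievalStrategy,
                 slice_fn: Callable[[List[str], int, str], List[str]],
                 llm: Callable[[str], str],
                 validate: Callable[[str], bool],
                 detection_prompt: str) -> List[str]:
    """Retrieval agent -> detection agent -> validator, over a list of statements."""
    reports = []
    for idx, stmt in enumerate(program):
        if not strategy.is_seed(stmt):
            continue                                          # S(n) = ⊥: not a seed
        snippet = "\n".join(slice_fn(program, idx, strategy.direction))
        verdict = llm(detection_prompt + "\n" + snippet)      # "Bug" or "No Bug"
        if verdict == "Bug" and validate(snippet):            # hallucination filter
            reports.append(snippet)
    return reports

# Toy usage with stubbed components (all hypothetical).
program = ["n = read()", "z = n - n", "y = 10 / z"]
strategy = RetrievalStrategy(is_seed=lambda s: "/" in s, direction="backward")

def prefix_slice(prog: List[str], i: int, d: str) -> List[str]:
    # Crude stand-in for real slicing: everything up to i (backward) or from i (forward).
    return prog[: i + 1] if d == "backward" else prog[i:]

def stub_llm(prompt: str) -> str:
    # Stand-in for the model: "sees" that the divisor is always zero.
    return "Bug" if "n - n" in prompt else "No Bug"

reports = run_pipeline(program, strategy, prefix_slice, stub_llm,
                       validate=lambda s: True,
                       detection_prompt="Can the divisor be zero?")
```

Keeping the slicer, the model call, and the validator as injected callables mirrors the separation of concerns between the two agents and makes each stage replaceable in isolation.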

2. Formal Definitions and Algorithmic Constructs

BugScope’s operational semantics are formalized as follows:

2.1 Retrieval Strategy Synthesis

Given an example set $E = \{(c_i, y_i)\}$ in which each snippet $c_i$ is labeled $y_i\in\{\text{buggy}, \text{non-buggy}\}$, the LLM induces

a seed-classification function $S: \mathrm{AST} \to \{\text{FAULTY}, \text{DANGEROUS}, \bot\}$
and a retrieval direction $d\in\{\rightarrow, \leftarrow\}$ (forward or backward), chosen as $(S, d)=\arg\max_{(S', d')}\sum_{(c,y)\in E}\text{Score}((S', d'), c, y)$, where Score measures how well a candidate strategy discriminates between BE and NE. Score is not computed explicitly; it is implicitly learned via chain-of-thought prompts.
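Because the score is implicit in BugScope, the sketch below makes the $\arg\max$ concrete by substituting an explicit, assumed Score: a candidate $(S', d')$ earns 1 point for each example whose label it predicts correctly. Each candidate is represented by a toy verdict function standing in for "slice with this seed and direction, then judge"; the candidates, verdicts, and examples are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, str]        # (code, label), label is "buggy" or "non-buggy"
Candidate = Tuple[str, str]      # (seed kind, slicing direction)
Verdict = Callable[[str], bool]  # toy stand-in for "slice with (S', d'), then judge"

def score(verdict: Verdict, code: str, label: str) -> int:
    """Assumed explicit Score: 1 when the candidate's verdict matches the label."""
    return int(verdict(code) == (label == "buggy"))

def select_strategy(examples: List[Example],
                    candidates: Dict[Candidate, Verdict]) -> Candidate:
    """(S, d) = argmax over (S', d') of the summed Score on the example set E."""
    return max(candidates,
               key=lambda cand: sum(score(candidates[cand], c, y) for c, y in examples))

E: List[Example] = [
    ("z = x - y\nq = a / z", "buggy"),
    ("z = x - y\nif z != 0:\n    q = a / z", "non-buggy"),
    ("z = x - y\nr = a * z", "non-buggy"),
]
candidates: Dict[Candidate, Verdict] = {
    # Backward from the divisor's definition never reaches a later guard,
    # so this toy verdict flags every division.
    ("divisor", "backward"): lambda code: "/" in code,
    # Forward from the divisor reaches both the guard and the division.
    ("divisor", "forward"): lambda code: "/" in code and "!= 0" not in code,
    ("product", "forward"): lambda code: "*" in code,
}
best = select_strategy(E, candidates)   # forward divisor slicing scores 3 of 3
```

On this toy set the forward divisor strategy separates BE from NE perfectly, while the backward variant misclassifies the guarded example, which is the kind of discrimination the summed Score is meant to capture.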

2.2 Program Slicing

Given a program dependence graph (PDG) $G = (V, E_{\text{data}}\cup E_{\text{ctrl}})$ and a slicing seed $s\in V$:

Forward slice: $FS(s)=\{v\in V\mid s\to^{*} v \text{ along } E_{\text{data}}\cup E_{\text{ctrl}}\}$
Backward slice: $BS(s)=\{v\in V\mid v\to^{*} s \text{ along } E_{\text{data}}\cup E_{\text{ctrl}}\}$

The minimal subprogram is $C=\mathrm{inline}(FS(s))$ for forward retrieval or $C=\mathrm{inline}(BS(s))$ for backward retrieval, with interprocedural expansion limited to call depth $K=3$.
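Both slice definitions reduce to graph reachability over the PDG. A minimal sketch, assuming the PDG is given as an adjacency map from statement IDs to successor IDs with data and control edges already merged; building the PDG and the `inline` step that turns a node set into $C$ are out of scope. The function names and the five-statement example are hypothetical.

```python
from collections import deque
from typing import Dict, Set

PDG = Dict[int, Set[int]]   # statement id -> successor ids over E_data ∪ E_ctrl

def _reachable(adj: PDG, seed: int) -> Set[int]:
    """Nodes reachable from seed by zero or more edges (the ->* relation)."""
    seen = {seed}
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, set()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def forward_slice(pdg: PDG, s: int) -> Set[int]:
    """FS(s) = {v | s ->* v}."""
    return _reachable(pdg, s)

def backward_slice(pdg: PDG, s: int) -> Set[int]:
    """BS(s) = {v | v ->* s}: forward reachability on the reversed graph."""
    reversed_pdg: PDG = {}
    for u, succs in pdg.items():
        for v in succs:
            reversed_pdg.setdefault(v, set()).add(u)
    return _reachable(reversed_pdg, s)

# 1: n = read()    2: size = n * 4    3: buf = malloc(size)
# 4: if (n > 0)    5: buf[n] = 0
pdg: PDG = {
    1: {2, 4, 5},   # data edges on n
    2: {3},         # data edge on size
    3: {5},         # data edge on buf
    4: {5},         # control edge: 5 executes only if the test at 4 holds
}
```

For instance, the forward slice from statement 2 is `{2, 3, 5}` (the size flows into the allocation and then the write), and the backward slice from statement 3 is `{1, 2, 3}`.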

2.3 Detection Prompt Construction

The detection prompt is $P = \mathrm{BuildPrompt}(\mathrm{Ex}, H, \mathrm{COT})$, where:

$\mathrm{Ex}=$ few-shot examples
$H=$ reasoning hints (“check numeric range,” “track pointer aliases,” etc.)
$\mathrm{COT}=$ chain-of-thought instructions

Inference: the LLM receives the prompt $P$ together with a slice $C$, and the label $y\in\{\text{Bug}, \text{No Bug}\}$ is read off its token probabilities, $\Pr(y\mid C) = \mathrm{LLM}(P, C)$.
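A sketch of prompt assembly and label selection follows. The layout of the prompt, the names `build_prompt` and `decide`, and the assumption that the model exposes log-probabilities for the two answer strings are all illustrative rather than BugScope's documented interface; the log-probability values are made up.

```python
import math
from typing import Dict, List, Tuple

def build_prompt(examples: List[Tuple[str, str]], hints: List[str], cot: str) -> str:
    """Hypothetical BuildPrompt(Ex, H, COT): concatenates the three ingredients."""
    parts = ["Decide whether the code below contains the target bug pattern."]
    for code, label in examples:                                   # Ex
        parts.append(f"Example:\n{code}\nAnswer: {label}")
    parts.append("Hints:\n" + "\n".join(f"- {h}" for h in hints))  # H
    parts.append(cot)                                              # COT
    return "\n\n".join(parts)

def decide(label_logprobs: Dict[str, float]) -> Tuple[str, float]:
    """Normalize per-label log-probabilities and return (best label, its probability)."""
    z = sum(math.exp(lp) for lp in label_logprobs.values())
    probs = {label: math.exp(lp) / z for label, lp in label_logprobs.items()}
    best = max(probs, key=lambda label: probs[label])
    return best, probs[best]

P = build_prompt(
    examples=[("q = a / z", "Bug"), ("if z > 0: q = a / z", "No Bug")],
    hints=["check value-range > 0, not just non-negative"],
    cot="Reason step by step about the divisor, then answer Bug or No Bug.",
)
# Assumed log-probabilities for the two answer strings after feeding P plus a slice C.
label, prob = decide({"Bug": -0.2, "No Bug": -1.8})   # "Bug" with probability ≈ 0.83
```

Normalizing over just the two answer strings turns raw token scores into a probability that can also serve as a confidence for later filtering.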

3. Implementation Details and Processing Steps

The following pseudocode summarizes BugScope’s three canonical phases:

Retrieval-Strategy Synthesis

Prompt LLM with buggy/non-buggy code pairs and ask for:
  (a) bug-triggering variables/expressions
  (b) slicing direction (forward/backward)
Parse LLM reply into S and d
return (S, d)
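The "Parse LLM reply into S and d" step depends on the reply format, which is not specified here. A minimal sketch, assuming the LLM is asked to answer in JSON with hypothetical keys `faulty`, `dangerous`, and `direction`; for simplicity the resulting classifier matches identifiers by name rather than inspecting real AST nodes, and `None` plays the role of $\bot$.

```python
import json
from typing import Callable, Optional, Tuple

SeedClassifier = Callable[[str], Optional[str]]   # "FAULTY", "DANGEROUS", or None (⊥)

def parse_strategy(reply: str) -> Tuple[SeedClassifier, str]:
    """Parse an assumed JSON reply into a seed classifier S and a direction d."""
    data = json.loads(reply)
    faulty = set(data.get("faulty", []))
    dangerous = set(data.get("dangerous", []))
    direction = data.get("direction")
    if direction not in ("forward", "backward"):
        raise ValueError(f"unexpected slicing direction: {direction!r}")

    def classify_seed(identifier: str) -> Optional[str]:
        # Simplification: match identifiers by name instead of inspecting AST nodes.
        if identifier in faulty:
            return "FAULTY"
        if identifier in dangerous:
            return "DANGEROUS"
        return None

    return classify_seed, direction

reply = '{"faulty": ["size"], "dangerous": ["memcpy"], "direction": "backward"}'
S, d = parse_strategy(reply)
```

Rejecting an unknown direction early keeps a malformed LLM reply from silently producing an unusable strategy.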

Context Extraction via Slicing

Parse source file F with Tree-sitter → AST + call graph
for each AST node n in F:
    if S(n) ≠ ⊥:
        slice_nodes = FS(n) if d == forward else BS(n)
        expand interprocedurally up to depth K
        C ← inline_and_simplify(slice_nodes)
        collect C
return collected C
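The "expand interprocedurally up to depth K" step can be read as a breadth-first walk of the call graph that stops K calls away from the function containing the seed. A sketch under that assumption, with the call graph given as a caller-to-callees map; `expand_calls` and the function names in the example are hypothetical, and whether callers are also followed for backward slices is left to whoever builds the map.

```python
from collections import deque
from typing import Dict, List, Set

def expand_calls(call_graph: Dict[str, List[str]], start: str, k: int = 3) -> Set[str]:
    """Functions reachable from `start` within at most k call edges (start included)."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        fn = queue.popleft()
        if depth[fn] == k:
            continue                      # depth budget exhausted along this path
        for callee in call_graph.get(fn, []):
            if callee not in depth:       # BFS records each function at its minimal depth
                depth[callee] = depth[fn] + 1
                queue.append(callee)
    return set(depth)

call_graph = {
    "main": ["parse"],
    "parse": ["read_len"],
    "read_len": ["checked_mul"],
    "checked_mul": ["overflow_trap"],
}
reached = expand_calls(call_graph, "main", k=3)   # overflow_trap is 4 calls away, so excluded
```

Using BFS guarantees each function is reached at its shortest call distance, so the depth bound is applied consistently even when a function is reachable along several paths.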

Prompt Generation & Refinement

BasePrompt ← few-shot prompt built from (BE, NE)
Insert reasoning hints H as guidance
for i in 1..R:    // R = number of refinement rounds
    P′ ← LLM("Revise this prompt to better highlight anti-pattern", BasePrompt)
    if quality(P′) > quality(BasePrompt):
        BasePrompt ← P′
return BasePrompt
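The refinement loop above is a greedy hill-climb: a revision replaces the current prompt only if it scores strictly higher. The pseudocode leaves `quality` unspecified (one plausible choice would be accuracy on held-out BE/NE pairs, but that is an assumption), so this sketch takes both the revision step and the scoring function as hypothetical callables and exercises them with deterministic stubs.

```python
from typing import Callable

def refine_prompt(base_prompt: str,
                  revise: Callable[[str], str],
                  quality: Callable[[str], float],
                  rounds: int) -> str:
    """Greedy hill-climbing: accept a revision only if its quality strictly improves."""
    best, best_quality = base_prompt, quality(base_prompt)
    for _ in range(rounds):                     # R refinement rounds
        candidate = revise(best)                # stands in for the LLM revision call
        candidate_quality = quality(candidate)
        if candidate_quality > best_quality:
            best, best_quality = candidate, candidate_quality
    return best

def stub_revise(prompt: str) -> str:
    # Hypothetical stand-in for the LLM: append one more reasoning hint.
    return prompt + "\nHINT"

def stub_quality(prompt: str) -> float:
    # Hypothetical score that stops improving after two hints.
    return min(prompt.count("HINT"), 2)

refined = refine_prompt("BASE", stub_revise, stub_quality, rounds=3)
```

Caching the current best score avoids re-scoring the incumbent each round, which matters when `quality` itself requires LLM calls.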

Each snippet $C$ is concatenated into $P$ and fed to the LLM; candidate “Bug” results are re-validated before reporting.
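The re-validation step is not detailed here. One common way to filter hallucinated reports, shown below as an assumption rather than BugScope's documented validator, is to re-query the model several times on the same snippet and keep the report only if a majority of answers still say "Bug". The `ask` callable, the trial count, and the threshold are hypothetical parameters.

```python
from itertools import cycle
from typing import Callable

def validate(snippet: str, ask: Callable[[str], str],
             trials: int = 5, threshold: float = 0.5) -> bool:
    """Keep a candidate only if the share of 'Bug' answers exceeds `threshold`."""
    votes = [ask(snippet) for _ in range(trials)]
    return votes.count("Bug") / trials > threshold

# Stub: a sampled LLM is replaced by a fixed cycle of answers.
answers = cycle(["Bug", "No Bug", "Bug"])

def stub_ask(snippet: str) -> str:
    return next(answers)

kept = validate("q = a / z", stub_ask)   # 3 of 5 votes say "Bug"
```

Majority voting only helps when repeated queries are not perfectly correlated, for example when sampling with nonzero temperature.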

4. Representative Anti-Patterns and Specialized Processing

BugScope adapts its seed selection, slicing direction, and reasoning hints to each anti-pattern, as summarized below:

Anti-Pattern	Bug Type	Slice Seed (Direction)	Reasoning Hint (Prompt Example)
Oversized Offset (OSO)	OOB (out-of-bounds)	Faulty $size$ (backward)	“check size vs buffer length”
Allocation Size Overflow (ASO)	OOB (out-of-bounds)	$n*\mathrm{sizeof(int)}$ (forward)	“track integer wrap-around after multiplication”
Insufficient Zero Check (IZC)	DBZ (divide-by-zero)	Divisor $z$ (forward)	“check value-range >0, not just non-negative”
System-Specific	OOB and DBZ	$d$ (forward)	“flag all uses of d->block[0]; buffer & division”

The system generalizes to system-specific and compound bug classes, fusing multiple reasoning hints within flexible prompt templates.
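To make the "Slice Seed" column concrete, the sketch below extracts divisor seeds for the IZC pattern. It uses Python's standard `ast` module on Python source as a stand-in for Tree-sitter on C, which is an assumption made purely for illustration; `divisor_seeds` is a hypothetical helper that returns the line and source text of every divisor, which would then be sliced forward. `ast.unparse` requires Python 3.9 or later.

```python
import ast
from typing import List, Tuple

DIV_OPS = (ast.Div, ast.FloorDiv, ast.Mod)

def divisor_seeds(source: str) -> List[Tuple[int, str]]:
    """Return (line, divisor expression) for every /, //, % and their augmented forms."""
    seeds = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.BinOp, ast.AugAssign)) and isinstance(node.op, DIV_OPS):
            # BinOp keeps the divisor in .right; AugAssign (e.g. t //= w) keeps it in .value.
            divisor = node.right if isinstance(node, ast.BinOp) else node.value
            seeds.append((node.lineno, ast.unparse(divisor)))
    return sorted(seeds)

src = (
    "z = x - y\n"
    "if z >= 0:\n"          # IZC: non-negative check, but z == 0 still reaches the division
    "    q = a / z\n"
    "r = a % (b + 1)\n"
    "t //= w\n"
)
seeds = divisor_seeds(src)
```

Here the seeds are `z` on line 3, `b + 1` on line 4, and `w` on line 5; the `z >= 0` guard is exactly the case the IZC hint asks the detector to scrutinize.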

5. Quantitative Evaluation and Comparative Analysis

BugScope outperforms the industrial baselines on two fronts: a controlled benchmark of known real-world bugs and the discovery of previously unknown bugs.

Forty Real-World Bugs (across 7 anti-patterns)
- Precision: $87.04\%$
- Recall: $90.00\%$
- $F_1$ : $0.88$
Breakdown by anti-pattern
- OSO: $P=66.7\%$ , $R=80\%$
- NOF: $P=77.8\%$ , $R=100\%$
- ASO: $P=100\%$ , $R=100\%$
- IZC: $P=87.5\%$ , $R=66.7\%$
- LZD, UEC, MSC: all above $85\%$ precision/recall

Tool	Precision	Recall	$F_1$
BugScope	87.04 %	90.00 %	0.88
RepoAudit	32.14 %	42.50 %	0.37
Cursor BugBot	71.43 %	27.50 %	0.40
CodeRabbit	76.92 %	17.50 %	0.29
Meta Infer	7.69 %	2.50 %	0.04
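As a quick arithmetic check on the comparison table, each $F_1$ is the harmonic mean $F_1 = 2PR/(P+R)$ of the reported precision $P$ and recall $R$. Recomputing it from the table's percentages reproduces every listed $F_1$ to two decimals.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    return 2 * precision * recall / (precision + recall)

# (precision, recall, F1 as printed in the table)
reported = {
    "BugScope": (0.8704, 0.9000, 0.88),
    "RepoAudit": (0.3214, 0.4250, 0.37),
    "Cursor BugBot": (0.7143, 0.2750, 0.40),
    "CodeRabbit": (0.7692, 0.1750, 0.29),
    "Meta Infer": (0.0769, 0.0250, 0.04),
}

for tool, (p, r, table_f1) in reported.items():
    print(f"{tool:<14} recomputed F1 = {f1(p, r):.3f}  table F1 = {table_f1:.2f}")
```

The harmonic mean penalizes imbalance, which is why tools with high precision but low recall (such as CodeRabbit) still end up with a low $F_1$.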

On large open-source projects, including the Linux kernel, BugScope uncovered 141 novel bugs, with 78 already fixed and 7 confirmed by developers as impactful.

6. Guarantees, Limitations, and Key Insights

BugScope provides no formal soundness or completeness guarantees. This is consistent with Rice’s theorem, under which nontrivial semantic properties of programs (including the absence of bugs in general) are undecidable; BugScope instead prioritizes empirical coverage and adaptability. Central design insights:

Generalization to novel anti-patterns arises from learning retrieval seeds and slicing strategies via few-shot examples, as opposed to hard-coded symbolic rules.
Multi-agent separation of context retrieval and detection mitigates LLM hallucination: slicing delivers focused code regions, and prompt synthesis keeps the reasoning grounded in the relevant context.
Limiting interprocedural slice depth ($K=3$) balances recall, precision, and scalability; the evaluation still shows robust recall (90%) on real bugs.

Limitations include:

Dependence on exemplars: poor BE/NE selection may compromise retrieval strategy.
Bounded slice expansion: deep or dynamic control-flow may elude discovery.
Residual LLM hallucinations: a validation pass reduces, but does not entirely eliminate, this issue.

A plausible implication is that BugScope’s “learn to learn” methodology introduces an extensible paradigm for pattern-driven, example-centric bug discovery that is systematically adaptable to new domains, anti-patterns, and evolving codebases. This approach demonstrates that the human-like audit workflow—study examples, extract context, reason semantically—covers a broad spectrum of defects with high precision and recall (Guo et al., 21 Jul 2025).


References

Guo et al. BugScope: Learn to Find Bugs Like Human. 2025.
