INFCODE-C++: Autonomous C++ Issue Resolution

Updated 27 November 2025

INFCODE-C++ is an autonomous, C++-aware multi-agent framework that resolves complex issues in large repositories by combining semantic retrieval and AST queries.
It leverages a dual-retrieval architecture with intent-guided semantic matching and structured AST querying to effectively localize bugs in intricate C++ codebases.
Empirical evaluations on the MultiSWE-bench-CPP benchmark show state-of-the-art performance: a 25.58% resolution rate, 10.85 percentage points above the best prior system.

INFCODE-C++ is an autonomous, C++-aware multi-agent framework designed for end-to-end issue resolution in large, statically typed C++ repositories. To address the limitations of lexical retrieval and shallow code navigation, both commonplace in Python-oriented LLM agents, INFCODE-C++ integrates intent-guided semantic retrieval with deterministic abstract syntax tree (AST) queries. This dual-retrieval architecture targets the semantic and structural complexity characteristic of C++ projects, such as overloaded identifiers, nested namespaces, and deep template instantiations, which significantly hinder traditional LLM-driven agents. Evaluated on the MultiSWE-bench-CPP benchmark, INFCODE-C++ achieves state-of-the-art results, resolving 25.58% of issues against 14.73% for the best prior system and more than doubling the rate of MSWE-agent (Dong et al., 20 Nov 2025).

1. System Architecture and Workflow

INFCODE-C++ is structured as a multi-agent pipeline interacting through four principal stages:

Repository Parsing: Full-project code artifacts are indexed and embedded.
Issue Reproduction (Reproducer Agent): Given a natural-language issue description, the agent synthesizes a reproducible test $t_D$.
Patch Generation (Patch Agent): Employs semantic intent retrieval and AST-structured search to localize defects and synthesize candidate patches $\{p_1,\ldots,p_n\}$.
Patch Selection (Selector Agent): Prunes, behaviorally tests, and votes on patches to select $p_{\text{final}}$.

Textual Collaboration Flow:

[Reproducer Agent] → generates t_D →
  [Patch Agent] → {QueryCodeIntent, FindClass/FindFunction/GetInheritanceChain…} → {p_1…p_n} →
    [Selector Agent] → Prune → Behavioral Test → Vote → p_final

The Patch Agent’s two-stage narrowing process first reduces the codebase to a succinct set of candidate modules via semantic retrieval, followed by precise localization within these modules using deterministic AST queries.
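The Selector Agent's Prune → Behavioral Test → Vote sequence in the flow above can be illustrated with a minimal sketch. The source does not specify how pruning or voting is implemented, so everything here is an assumption: patches are plain strings, `passes_test` is a hypothetical callback standing in for the behavioral test, and the vote is a simple frequency count over identical surviving patches.

```python
from collections import Counter

def select_patch(candidates, passes_test):
    """Hypothetical selector: drop failing patches, then majority-vote.

    candidates  -- list of candidate patch texts (duplicates allowed)
    passes_test -- callable returning True if a patch passes the behavioral test
    """
    # Prune + behavioral test: keep only candidates that pass.
    survivors = [p for p in candidates if passes_test(p)]
    if not survivors:
        return None  # no candidate survived; the issue stays unresolved
    # Vote: the most frequently proposed surviving patch wins.
    return Counter(survivors).most_common(1)[0][0]

# Toy run: patch "C" fails the behavioral test, "A" is proposed twice.
p_final = select_patch(["A", "B", "A", "C"], lambda p: p != "C")
```

In this toy run `p_final` is `"A"`, since it is the only surviving patch proposed more than once.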

2. Semantic Code-Intent Retrieval

Intent Representation and Embedding:

All files, classes, and functions are embedded into a dense vector space using a pretrained C++-specific encoder $E(\cdot)$ during repository parsing. For an issue description $D$, the derived query $q$ is embedded as $v_q = E(q)$. An intent index,

$\mathcal{I}_\mathrm{intent}: \{v_{A_i}\} \mapsto A_i,\quad A_i \in \{\text{files, classes, functions}\}$

maps these code artifacts.

Similarity Scoring and Retrieval:

Artifacts are ranked by cosine similarity:

$\mathrm{score}(q, A) = \frac{v_q \cdot v_A}{\|v_q\|\;\|v_A\|}$

An approximate nearest-neighbor index (e.g., FAISS) enables retrieval of the top-$k$ relevant artifacts $C_{\mathrm{intent}}$. By design, $|C_{\mathrm{intent}}| \ll |C|$, sharply narrowing the context for downstream processing.
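The scoring and top-$k$ step can be illustrated with a small sketch. It is not the system's implementation: it uses an exhaustive scan in place of an approximate nearest-neighbor index, and the artifact names and three-dimensional embeddings are made up for illustration (a real encoder $E(\cdot)$ would produce far larger vectors).

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)

def top_k(query_vec, index, k):
    """Return the k artifact names with the highest score(q, A)."""
    ranked = sorted(index.items(),
                    key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [name for name, _ in ranked[:k]]

# Hypothetical intent index: artifact name -> embedding v_A.
toy_index = {
    "ui/update.cpp": [0.9, 0.1, 0.0],
    "db/insert.cpp": [0.0, 0.2, 0.9],
    "ui/render.cpp": [0.7, 0.6, 0.1],
}
v_q = [1.0, 0.2, 0.0]          # stands in for E(q)
c_intent = top_k(v_q, toy_index, k=2)
```

Here `c_intent` contains the two UI files, and the database file is excluded because its embedding is nearly orthogonal to the query.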

3. AST-Structured Querying

Global C++ AST Representation:

The codebase is parsed into $\mathcal{T}_C = (\mathcal{N}, \mathcal{E})$, where $n \in \mathcal{N}$ denotes syntactic constructs (e.g., NamespaceDecl, ClassDecl, TemplateInstantiation), and $\mathcal{E}$ captures relations such as containment, inheritance, overload sets, and function calls.

Deterministic Querying:

Within the semantic subset $C_{\mathrm{intent}}$, the agent performs graph traversals and pattern matches to precisely locate defects, issuing queries such as:

FindClass("Search")
FindFunction({scope:"UI", name:"update", params:["int"]})
GetInheritanceChain("DerivedClass")
GetFunctionCalls("Database","insert")

Pseudocode Example:

// Find all overloads of foo in namespace Bar
nodes = AST.FindFunction(
  scope = "Bar",
  name  = "foo",
  allowOverloads = true
)
for each n in nodes:
  print(n.sourceLocation, n.signature)

Each query returns concrete AST nodes with associated source location spans.
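To make the deterministic nature of such queries concrete, the sketch below runs a `FindFunction`-style lookup over a tiny in-memory list of declarations. The `FunctionDecl` record, the `find_function` helper, and the sample declarations are all hypothetical stand-ins; a real system would populate these records from a C++ parser rather than by hand.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FunctionDecl:
    scope: str     # enclosing namespace or class, e.g. "Bar"
    name: str      # unqualified function name
    params: tuple  # parameter type names, e.g. ("int",)
    file: str      # source file of the declaration
    line: int      # 1-based line of the declaration

def find_function(decls, scope, name, params=None):
    """Return declarations matching scope and name.

    With params=None every overload is returned; otherwise only the
    overload whose parameter types match exactly.
    """
    hits = [d for d in decls if d.scope == scope and d.name == name]
    if params is not None:
        hits = [d for d in hits if d.params == tuple(params)]
    # Sort by location so the result order is deterministic.
    return sorted(hits, key=lambda d: (d.file, d.line))

decls = [
    FunctionDecl("Bar", "foo", ("int",), "bar.cpp", 12),
    FunctionDecl("Bar", "foo", ("double",), "bar.cpp", 20),
    FunctionDecl("Baz", "foo", ("int",), "baz.cpp", 5),
]
overloads = find_function(decls, "Bar", "foo")               # both Bar::foo overloads
exact = find_function(decls, "Bar", "foo", params=["int"])   # only Bar::foo(int)
```

Unlike a text search for `foo`, the lookup never returns `Baz::foo`, which is the kind of disambiguation that overloaded identifiers and nested namespaces require.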

4. Context Construction and Localization

Integration Process:

The candidate modules $C_{\mathrm{intent}}$ from semantic retrieval are refined by AST queries to yield $G_{\mathrm{bug}}$, a subgraph implicated in the reported defect. Source locations $L$ are determined by:

$L = C_{\mathrm{intent}} \cap \mathrm{nodes}(G_{\mathrm{bug}})$

Full source text for $L$ is extracted and concatenated, forming an input window for LLM-based patch synthesis.

Localization Strategy:

By limiting semantic retrieval to the top-5 artifacts and restricting AST spans to a narrow window ($\pm N$ lines), the approach avoids both context over-approximation and under-approximation, making efficient use of the token budget for downstream synthesis.
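The intersection $L = C_{\mathrm{intent}} \cap \mathrm{nodes}(G_{\mathrm{bug}})$ and the $\pm N$ window can be sketched as follows. This is one possible reading, under the assumption that artifacts are identified by file path and AST nodes by a (file, line) pair; the function names, file names, and line counts are illustrative only.

```python
def localize(c_intent, bug_nodes):
    """Keep AST-node locations whose file survived semantic retrieval."""
    return [(path, line) for (path, line) in bug_nodes if path in c_intent]

def window(line, n, total_lines):
    """Inclusive 1-based range of +/- n lines, clamped to the file bounds."""
    return max(1, line - n), min(total_lines, line + n)

c_intent = {"ui/update.cpp", "ui/render.cpp"}        # from semantic retrieval
bug_nodes = [("ui/update.cpp", 42),                  # from AST queries
             ("db/insert.cpp", 7),
             ("ui/render.cpp", 3)]

locations = localize(c_intent, bug_nodes)
# Assume every file has 100 lines and N = 5 for this toy example.
spans = [(path, window(line, 5, 100)) for path, line in locations]
```

The node in `db/insert.cpp` is dropped because its file is outside $C_{\mathrm{intent}}$, and the window around line 3 is clamped at the start of the file instead of running to negative line numbers.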

5. Empirical Evaluation and Benchmark Results

Benchmark and Evaluation Protocol:

INFCODE-C++ is evaluated on 129 C++ issues from five major GitHub repositories (MultiSWE-bench-CPP), each accompanied by an issue description $D$ and regression test suite $T$. A solution is valid if:

The reproduction test $t_D$ passes on the patched codebase $C' = C \oplus p$ for the candidate patch $p$.
$\forall t \in T: t(C') = t(C)$ (no behavioral regressions).
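The two validity conditions translate directly into a small check. The sketch assumes test outcomes are available as booleans keyed by test name; the function name and data layout are hypothetical, not part of the benchmark's tooling.

```python
def is_valid_patch(repro_passes, before, after):
    """Check both validity conditions for a candidate patch p.

    repro_passes -- outcome of the reproduction test t_D on C' = C (+) p
    before       -- dict: test name -> outcome on the original code C
    after        -- dict: test name -> outcome on the patched code C'
    """
    # Condition 2: every test in T behaves the same on C' as on C.
    no_regressions = all(after.get(name) == outcome
                         for name, outcome in before.items())
    # Condition 1 and condition 2 must both hold.
    return repro_passes and no_regressions

baseline = {"test_parse": True, "test_io": False}
ok = is_valid_patch(True, baseline, {"test_parse": True, "test_io": False})
regressed = is_valid_patch(True, baseline, {"test_parse": False, "test_io": False})
```

Note that the second condition compares outcomes rather than requiring every test to pass, so a test that already failed on $C$ does not invalidate the patch.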

Resolution Rates:

System	Resolution Rate (%)
INFCODE-C++ + GPT-5	25.58
MOpenHands + Claude-3.7 Sonnet	14.73
MSWE-agent + Claude-3.7 Sonnet	11.63
MAgentless + Claude-3.7 Sonnet	3.88

INFCODE-C++ exceeds the best prior system by 10.85 percentage points and more than doubles the performance of MSWE-agent. Stratified results (easy/medium/hard) show robust improvement in all categories (e.g., 50.00% on easy vs. 32.14% for next-best). The size of the improvement on 129 items indicates high statistical significance by binomial confidence interval analysis (Dong et al., 20 Nov 2025).
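The headline figures can be checked with a few lines of arithmetic. The resolved-issue counts below are not stated in the text; they are inferred as the integers out of 129 that reproduce the reported percentages, and should be read as an assumption.

```python
TOTAL = 129  # issues in the benchmark

# Inferred counts (assumption): chosen so that count / 129 matches the table.
resolved = {
    "INFCODE-C++": 33,
    "MOpenHands": 19,
    "MSWE-agent": 15,
    "MAgentless": 5,
}

# Resolution rate in percent, rounded to two decimals as in the table.
rates = {name: round(100 * n / TOTAL, 2) for name, n in resolved.items()}

# Gap to the best prior system, in percentage points.
gap = round(rates["INFCODE-C++"] - rates["MOpenHands"], 2)

# Ratio against MSWE-agent, the "more than doubles" comparison.
ratio = resolved["INFCODE-C++"] / resolved["MSWE-agent"]
```

Under these inferred counts the rates match the table, the gap is 10.85 points, and the ratio against MSWE-agent is 2.2.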

6. Ablation and Behavioral Analysis

Ablation Study:

Each major system component was removed in isolation. The following results quantify their marginal contributions:

Configuration	Resolution Rate (%)	Δ vs. Full
Full system (GPT-5)	25.58	—
w/o semantic code-intent retrieval	19.37	-6.21
w/o AST-structured querying	17.05	-8.53
w/o Reproducer Agent	20.16	-5.42
w/o Selector Agent	22.48	-3.10

The largest performance drop occurs when AST querying is removed, confirming its importance for defect localization and patch synthesis in C++. Semantic retrieval also provides a substantial boost. Removing either retrieval step also increases the number of LLM reasoning turns, from 28.1 in the full system to 35.3 or 45.3 depending on which step is removed, so the two steps improve efficiency as well as accuracy.

Behavioral Breakdown:

Reproduction success: 28.81%
File-level localization: 55.10%
Function-level localization: 42.10%
End-to-end resolution: 25.58%

The majority of failures are attributable to reproduction and localization, affirming the necessity of combined semantic and structural retrieval.

7. Significance and Implications

INFCODE-C++ represents the first system to combine semantic code-intent embeddings with explicit AST-structured querying for C++ repair. Its two-pronged retrieval pipeline overcomes challenges unique to C++—such as deeply nested templates, type overloading, and intricate scoping—that degrade the effectiveness of approaches tuned for dynamically typed languages. The architecture demonstrates that language-aware retrieval and structural analysis are prerequisites for effective LLM-driven repair of complex, statically typed ecosystems.

Results on MultiSWE-bench-CPP set a new benchmark for autonomous C++ issue resolution, and ablation studies isolate the specific technical advances underlying this improvement (Dong et al., 20 Nov 2025). A plausible implication is that future work in multi-language LLM agents for code repair should adopt similar retrieval and defect localization strategies to extend state-of-the-art performance across statically typed domains.

References (1)

InfCode-C++: Intent-Guided Semantic Retrieval and AST-Structured Search for C++ Issue Resolution (2025)
