
INFCODE-C++: Autonomous C++ Issue Resolution

Updated 27 November 2025
  • INFCODE-C++ is an autonomous, C++-aware multi-agent framework that resolves complex issues in large repositories by combining semantic retrieval and AST queries.
  • It leverages a dual-retrieval architecture with intent-guided semantic matching and structured AST querying to effectively localize bugs in intricate C++ codebases.
  • Empirical evaluation on the MultiSWE-bench-CPP benchmark shows state-of-the-art performance: a 25.58% resolution rate versus 14.73% for the strongest prior system.

INFCODE-C++ is an autonomous, C++-aware multi-agent framework designed for end-to-end issue resolution in large, statically typed C++ repositories. To address the limitations of lexical retrieval and shallow code navigation, both commonplace in Python-oriented LLM agents, it integrates intent-guided semantic retrieval with deterministic abstract syntax tree (AST) queries. This dual-retrieval architecture targets the semantic and structural complexity characteristic of C++ projects, such as overloaded identifiers, nested namespaces, and deep template instantiations, which significantly hinder traditional LLM-driven agents. Evaluated on the MultiSWE-bench-CPP benchmark, INFCODE-C++ achieves state-of-the-art results, resolving 25.58% of issues against 14.73% for the strongest prior system and more than doubling the rate of MSWE-agent (Dong et al., 20 Nov 2025).

1. System Architecture and Workflow

INFCODE-C++ is structured as a multi-agent pipeline with four principal stages:

  1. Repository Parsing: Full-project code artifacts are indexed and embedded.
  2. Issue Reproduction (Reproducer Agent): Given a natural-language issue description, the agent synthesizes a reproducible test $t_D$.
  3. Patch Generation (Patch Agent): Employs semantic intent retrieval and AST-structured search to localize defects and synthesize candidate patches $\{p_1, \ldots, p_n\}$.
  4. Patch Selection (Selector Agent): Prunes, behaviorally tests, and votes on patches to select $p_{\text{final}}$.

Textual Collaboration Flow:

[Reproducer Agent] → generates t_D →
  [Patch Agent] → {QueryCodeIntent, FindClass/FindFunction/GetInheritanceChain…} → {p_1…p_n} →
    [Selector Agent] → Prune → Behavioral Test → Vote → p_final

The Patch Agent’s two-stage narrowing process first reduces the codebase to a succinct set of candidate modules via semantic retrieval, followed by precise localization within these modules using deterministic AST queries.
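The hand-off between agents can be pictured as a chain of functions. The sketch below is a hypothetical skeleton, not the authors' implementation: the `Patch` type, the function names, the pruning rule (drop empty diffs), and the voting rule (most frequently proposed surviving diff wins) are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Patch:
    """A candidate patch, represented only by its diff text (assumption)."""
    diff: str


def select_patch(
    candidates: List[Patch],
    passes_test: Callable[[Patch], bool],
) -> Optional[Patch]:
    """Prune -> behavioral test -> vote, in the order the flow lists them."""
    # Prune: discard empty diffs (a stand-in for whatever pruning the system does).
    pruned = [p for p in candidates if p.diff.strip()]
    # Behavioral test: keep only patches that pass the reproduction test t_D.
    survivors = [p for p in pruned if passes_test(p)]
    if not survivors:
        return None
    # Vote: the diff proposed most often wins; ties go to the earliest candidate.
    votes = Counter(p.diff for p in survivors)
    best_diff, _ = votes.most_common(1)[0]
    return Patch(best_diff)


def resolve_issue(issue, reproduce, generate_patches, run_test) -> Optional[Patch]:
    """Chain the three agents; each argument is a pluggable callable."""
    t_d = reproduce(issue)                      # Reproducer Agent
    candidates = generate_patches(issue, t_d)   # Patch Agent
    return select_patch(candidates, lambda p: run_test(t_d, p))  # Selector Agent


# Toy usage with stub agents.
candidates = [Patch("fix A"), Patch("fix B"), Patch("fix A"), Patch("")]
final = resolve_issue(
    "crash on empty input",
    reproduce=lambda issue: f"test for: {issue}",
    generate_patches=lambda issue, t_d: candidates,
    run_test=lambda t_d, p: p.diff != "fix B",
)
print(final)
```

Passing the agents in as callables keeps the skeleton independent of any particular LLM backend or test runner.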

2. Semantic Code-Intent Retrieval

Intent Representation and Embedding:

All files, classes, and functions are embedded into a dense vector space using a pretrained C++-specific encoder $E(\cdot)$ during repository parsing. For an issue description $D$, the derived query $q$ is embedded as $v_q = E(q)$. An intent index,

$$\mathcal{I}_\mathrm{intent}: \{v_{A_i}\} \mapsto A_i,\quad A_i \in \{\text{files, classes, functions}\}$$

maps each embedding back to the code artifact it represents.

Similarity Scoring and Retrieval:

Artifacts are ranked by cosine similarity:

$$\mathrm{score}(q, A) = \frac{v_q \cdot v_A}{\|v_q\|\;\|v_A\|}$$

An approximate nearest-neighbor index (e.g., FAISS) enables retrieval of the top-$k$ relevant artifacts $C_{\mathrm{intent}}$. By design, $|C_{\mathrm{intent}}| \ll |C|$, sharply narrowing the context for downstream processing.
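The scoring and top-$k$ step can be illustrated with a few lines of standard-library Python. This is a minimal sketch under stated assumptions: the three-dimensional vectors and file names are made up, and an exact brute-force scan stands in for the approximate nearest-neighbor index the text mentions.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """score(q, A) = (v_q . v_A) / (||v_q|| ||v_A||); 0.0 for a zero vector."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k(
    v_q: Sequence[float],
    intent_index: Dict[str, Sequence[float]],
    k: int,
) -> List[Tuple[str, float]]:
    """Return the k artifacts most similar to the query, best first."""
    scored = ((cosine(v_q, v_a), name) for name, v_a in intent_index.items())
    return [(name, score) for score, name in heapq.nlargest(k, scored)]


# Hypothetical embeddings; a real system would obtain them from the encoder E.
intent_index = {
    "ui/search.cpp": [0.9, 0.1, 0.0],
    "db/insert.cpp": [0.0, 1.0, 0.0],
    "ui/update.cpp": [0.7, 0.3, 0.1],
}
v_q = [1.0, 0.0, 0.0]

c_intent = top_k(v_q, intent_index, k=2)
for name, score in c_intent:
    print(f"{name}: {score:.3f}")
```

With a large repository the linear scan would be replaced by an approximate index, but the ranking criterion stays the same.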

3. AST-Structured Querying

Global C++ AST Representation:

The codebase is parsed into $\mathcal{T}_C = (\mathcal{N}, \mathcal{E})$, where $n \in \mathcal{N}$ denotes syntactic constructs (e.g., NamespaceDecl, ClassDecl, TemplateInstantiation), and $\mathcal{E}$ captures relations such as containment, inheritance, overload sets, and function calls.

Deterministic Querying:

Within the semantic subset $C_{\mathrm{intent}}$, the agent performs graph traversals and pattern matches to precisely locate defects, issuing queries such as:

  • FindClass("Search")
  • FindFunction({scope:"UI", name:"update", params:["int"]})
  • GetInheritanceChain("DerivedClass")
  • GetFunctionCalls("Database","insert")

Pseudocode Example:

// Find all overloads of foo in namespace Bar
nodes = AST.FindFunction(
  scope = "Bar",
  name  = "foo",
  allowOverloads = true
)
for each n in nodes:
  print(n.sourceLocation, n.signature)

Each query returns concrete AST nodes with associated source location spans.
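To make the query semantics concrete, here is a toy in-memory index that answers `FindFunction`- and `GetInheritanceChain`-style queries. It is a hypothetical sketch: the class and method names, the single-inheritance assumption, and the hand-written node data are illustrative only, and a real system would populate the index from a C++ parser rather than from literals.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class FunctionNode:
    """A function declaration with its source location (toy AST node)."""
    scope: str
    name: str
    params: Tuple[str, ...]
    file: str
    line: int

    @property
    def signature(self) -> str:
        return f"{self.scope}::{self.name}({', '.join(self.params)})"


class ToyAstIndex:
    def __init__(self, functions: Iterable[FunctionNode], bases: Dict[str, str]):
        self._functions = list(functions)
        self._bases = dict(bases)  # derived class -> its single base (assumption)

    def find_function(
        self, scope: str, name: str, params: Optional[Sequence[str]] = None
    ) -> List[FunctionNode]:
        """All overloads of scope::name, or one overload if params is given."""
        return [
            f for f in self._functions
            if f.scope == scope and f.name == name
            and (params is None or f.params == tuple(params))
        ]

    def get_inheritance_chain(self, cls: str) -> List[str]:
        """Walk base-class links from cls up to the root."""
        chain = [cls]
        while chain[-1] in self._bases:
            base = self._bases[chain[-1]]
            if base in chain:  # guard against malformed cyclic data
                break
            chain.append(base)
        return chain


index = ToyAstIndex(
    functions=[
        FunctionNode("Bar", "foo", ("int",), "bar.cpp", 10),
        FunctionNode("Bar", "foo", ("double", "int"), "bar.cpp", 24),
        FunctionNode("UI", "update", ("int",), "ui.cpp", 7),
    ],
    bases={"DerivedClass": "MiddleClass", "MiddleClass": "BaseClass"},
)

overloads = index.find_function("Bar", "foo")
for n in overloads:
    print(f"{n.file}:{n.line}", n.signature)
chain = index.get_inheritance_chain("DerivedClass")
```

Because the lookups are plain filters over parsed declarations, the same query always returns the same nodes, which is the sense in which the querying is deterministic.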

4. Context Construction and Localization

Integration Process:

The candidate modules $M$ from semantic retrieval are refined by AST queries to yield $G_{\mathrm{bug}}$, a subgraph implicated in the reported defect. Source locations $L$ are determined by:

$$L = C_{\mathrm{intent}} \cap \mathrm{nodes}(G_{\mathrm{bug}})$$

Full source text for $L$ is extracted and concatenated, forming an input window for LLM-based patch synthesis.

Localization Strategy:

By limiting semantic retrieval to top-5 artifacts and restricting AST spans to a narrow window ($\pm N$ lines), the approach avoids both context over-approximation and under-approximation, maximizing the token budget efficiency for downstream synthesis.
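The intersection and the $\pm N$ windowing can be sketched as follows. The code is an illustration under assumptions: artifacts are identified by file path, each implicated node is reduced to a `(file, line)` pair, and the `// path:line` header format is invented for the example.

```python
from typing import Dict, List, Set, Tuple

Span = Tuple[str, int]  # (file path, 1-based line of an implicated AST node)


def localize(c_intent: Set[str], g_bug_nodes: List[Span]) -> List[Span]:
    """L = C_intent ∩ nodes(G_bug): keep nodes inside semantically retrieved files."""
    return [(path, line) for path, line in g_bug_nodes if path in c_intent]


def window(lines: List[str], center: int, n: int) -> List[str]:
    """Lines center-n .. center+n (1-based, inclusive), clamped to the file."""
    lo = max(1, center - n)
    hi = min(len(lines), center + n)
    return lines[lo - 1:hi]


def build_context(sources: Dict[str, List[str]], locations: List[Span], n: int) -> str:
    """Concatenate the windowed source text for every location in L."""
    chunks = []
    for path, line in locations:
        body = "\n".join(window(sources[path], line, n))
        chunks.append(f"// {path}:{line}\n{body}")
    return "\n\n".join(chunks)


# Toy repository: two files, one of which was retrieved semantically.
sources = {
    "a.cpp": [f"line {i}" for i in range(1, 11)],
    "b.cpp": ["x", "y"],
}
c_intent = {"a.cpp"}
g_bug_nodes = [("a.cpp", 5), ("b.cpp", 1)]

locations = localize(c_intent, g_bug_nodes)
context = build_context(sources, locations, n=1)
print(context)
```

Clamping the window at file boundaries keeps the extraction safe for nodes near the top or bottom of a file.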

5. Empirical Evaluation and Benchmark Results

Benchmark and Evaluation Protocol:

INFCODE-C++ is evaluated on 129 C++ issues from five major GitHub repositories (MultiSWE-bench-CPP), each accompanied by an issue description $D$ and regression test suite $T$. A solution is valid if:

  • The regression test $t_D$ passes on the patched codebase $C' = C \oplus p$ for the candidate patch $p$.
  • $\forall t \in T: t(C') = t(C)$ (no behavioral regressions).
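The two validity conditions translate directly into a predicate. In this sketch a codebase is modeled as a plain dictionary and a test as a function returning pass or fail; both representations, and the toy settings used below, are assumptions for illustration rather than the benchmark's actual harness.

```python
from typing import Callable, Dict, Iterable

Codebase = Dict[str, object]
Test = Callable[[Codebase], bool]


def is_valid_patch(
    t_d: Test,
    regression_suite: Iterable[Test],
    original: Codebase,
    patched: Codebase,
) -> bool:
    """True iff t_D passes on C' and every t in T behaves the same on C' as on C."""
    if not t_d(patched):
        return False
    return all(t(patched) == t(original) for t in regression_suite)


# Toy example: the "issue" is that max_len is too small.
original = {"max_len": 10, "sep": ","}
good_patch = {"max_len": 20, "sep": ","}
bad_patch = {"max_len": 20, "sep": ";"}  # fixes the issue but changes other behavior

t_d = lambda c: c["max_len"] >= 20
suite = [lambda c: c["sep"] == ","]

print(is_valid_patch(t_d, suite, original, good_patch))
print(is_valid_patch(t_d, suite, original, bad_patch))
```

Comparing each regression test against its outcome on the original codebase, rather than requiring a pass, mirrors the second condition as written.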

Resolution Rates:

| System | Resolution Rate (%) |
| --- | --- |
| INFCODE-C++ + GPT-5 | 25.58 |
| MOpenHands + Claude-3.7 Sonnet | 14.73 |
| MSWE-agent + Claude-3.7 Sonnet | 11.63 |
| MAgentless + Claude-3.7 Sonnet | 3.88 |

INFCODE-C++ exceeds the best prior system by 10.85 percentage points and more than doubles the performance of MSWE-agent. Stratified results (easy/medium/hard) show improvement in all categories (e.g., 50.00% on easy vs. 32.14% for next-best). A gap of this size on 129 items is reported to be statistically significant under a binomial confidence interval analysis (Dong et al., 20 Nov 2025).
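The headline figures can be checked with simple arithmetic. The sketch below assumes each percentage is a resolved count divided by the 129 benchmark issues, which is consistent with the reported values, and recovers the implied counts, the percentage-point gap, and the ratios between systems.

```python
N_ISSUES = 129

# Resolution rates (%) as reported in the comparison.
rates = {
    "INFCODE-C++ + GPT-5": 25.58,
    "MOpenHands + Claude-3.7 Sonnet": 14.73,
    "MSWE-agent + Claude-3.7 Sonnet": 11.63,
    "MAgentless + Claude-3.7 Sonnet": 3.88,
}


def resolved_count(rate_percent: float, n: int = N_ISSUES) -> int:
    """Nearest whole number of issues implied by a percentage of n."""
    return round(rate_percent / 100 * n)


counts = {system: resolved_count(rate) for system, rate in rates.items()}

best = rates["INFCODE-C++ + GPT-5"]
gap_points = round(best - rates["MOpenHands + Claude-3.7 Sonnet"], 2)
ratio_vs_mopenhands = best / rates["MOpenHands + Claude-3.7 Sonnet"]
ratio_vs_mswe = best / rates["MSWE-agent + Claude-3.7 Sonnet"]

for system, count in counts.items():
    print(f"{system}: {count}/{N_ISSUES}")
print(f"gap: {gap_points} points, "
      f"x{ratio_vs_mopenhands:.2f} vs MOpenHands, x{ratio_vs_mswe:.2f} vs MSWE-agent")
```

The ratios make explicit that the "more than doubles" statement holds relative to MSWE-agent, while the margin over the strongest prior system is better described by the percentage-point gap.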

6. Ablation and Behavioral Analysis

Ablation Study:

Each major system component was removed in isolation. The following results quantify their marginal contributions:

| Configuration | Resolution Rate (%) | Δ vs. Full |
| --- | --- | --- |
| Full system (GPT-5) | 25.58 | n/a |
| w/o semantic code-intent retrieval | 19.37 | -6.21 |
| w/o AST-structured querying | 17.05 | -8.53 |
| w/o Reproducer Agent | 20.16 | -5.42 |
| w/o Selector Agent | 22.48 | -3.10 |

The largest performance drop occurs when AST querying is removed, confirming its criticality for defect localization and patch synthesis in C++. Semantic retrieval also provides a substantial boost. Removal of either retrieval step increases LLM reasoning turns (from 28.1 to 35.3 or 45.3, depending on the step removed), indicating both efficiency and accuracy improvements.
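The deltas follow from subtracting the full-system rate from each ablated rate. The short sketch below recomputes them and ranks the components by the size of the drop; the dictionary keys are shorthand labels chosen for this example.

```python
FULL_RATE = 25.58  # full system (GPT-5), resolution rate in %

# Resolution rate (%) with one component removed.
ablations = {
    "semantic code-intent retrieval": 19.37,
    "AST-structured querying": 17.05,
    "Reproducer Agent": 20.16,
    "Selector Agent": 22.48,
}

# Delta vs. full system, in percentage points (negative means a drop).
deltas = {name: round(rate - FULL_RATE, 2) for name, rate in ablations.items()}

# Components ordered from largest to smallest drop.
ranking = sorted(deltas, key=lambda name: deltas[name])

for name in ranking:
    print(f"w/o {name}: {deltas[name]:+.2f}")
```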

Behavioral Breakdown:

  • Reproduction success: 28.81%
  • File-level localization: 55.10%
  • Function-level localization: 42.10%
  • End-to-end resolution: 25.58%

The majority of failures are attributable to reproduction and localization, affirming the necessity of combined semantic and structural retrieval.

7. Significance and Implications

INFCODE-C++ is presented as the first system to combine semantic code-intent embeddings with explicit AST-structured querying for C++ repair. Its two-pronged retrieval pipeline addresses challenges characteristic of C++ (deeply nested templates, overloaded identifiers, and intricate scoping) that degrade the effectiveness of approaches tuned for dynamically typed languages. The results suggest that language-aware retrieval and structural analysis are prerequisites for effective LLM-driven repair of complex, statically typed ecosystems.

Results on MultiSWE-bench-CPP set a new benchmark for autonomous C++ issue resolution, and ablation studies isolate the specific technical advances underlying this improvement (Dong et al., 20 Nov 2025). A plausible implication is that future work in multi-language LLM agents for code repair should adopt similar retrieval and defect localization strategies to extend state-of-the-art performance across statically typed domains.
