InfCode Framework Overview

Updated 12 March 2026

The InfCode framework is a family of approaches that formalize, analyze, and automate the relationships between code, infrastructure, and control logic.
Its repair-oriented instantiations use adversarial multi-agent systems and LLM-based strategies to generate patches and refine test suites; the adversarial pipeline reports a 79.4% solved rate on SWE-bench Verified.
Key applications include automated C++ repository repair, cloud infrastructure synthesis, and safety-critical control code logic, all aimed at reducing manual effort and operational risks.

The InfCode framework encompasses disparate but conceptually linked efforts at formalizing, reasoning about, and automating the relationships between code, infrastructure, issue resolution, and control logic across the domains of software engineering, autonomous repair, cloud infrastructure synthesis, and device-level embedded logic. It provides both concrete algorithmic systems (such as LLM-based agents for automated patching and infrastructure synthesis) and abstract logic for trust, testing, and operational safety. Major instantiations include the InfCode-C++ system for C++ repository repair, the adversarial InfCode pipeline for robust patch/test co-generation, and architectures for deriving infrastructure from application code. The term also attaches to influential logic for informal control code reasoning, especially in safety-critical contexts. Collectively, these systems highlight InfCode as a foundation for automated, reproducible, and rigorous management of code and its operational environment.

1. Core Concepts: Definitions and Motivations

The InfCode designation encompasses several major research threads:

Infrastructure in Code (InfCode): An approach in which the operational infrastructure required by an application is formally derived from the application code itself, rather than being specified independently (as in IaC). This relationship is formalized by $I = f(C)$, with $C$ the source code and $I$ an inferred, machine-readable infrastructure specification. Here, the framework interprets application-level DSLs and annotations, synthesizing all necessary entities (APIs, storage, functions, permissions) directly from code, thereby ensuring consistency and eliminating manual duplication (Tankov et al., 2021).
InfCode for Autonomous Issue Resolution: An adversarial, multi-agent system for automated software repair, organized as an iterative cycle between test generation and patch synthesis agents, with selection governed by formal scoring that integrates correctness, coverage, and test stability (Li et al., 20 Nov 2025).
InfCode Logic for Control Codes: An ontology and logical framework for reasoning about control code production, deployment, and use in both mundane and safety-critical embedded contexts, emphasizing informal but systematic inference schemes grounded on empirical testing, trust, and risk analysis (Bergstra, 2010).
InfCode-C++: The first system to introduce intent-guided semantic retrieval and AST-structured search to C++-aware autonomous issue resolution, overcoming the limitations of shallow or Python-centric repair approaches via compositional code intent modeling and deterministic structure queries (Dong et al., 20 Nov 2025).

Each instantiation is motivated by the need to close gaps in reproducibility, reliability, or comprehensiveness that arise when code, infrastructure, and verification artifacts are specified or managed independently.

2. Automated Issue Resolution: Adversarial Multi-Agent Architecture and Metrics

The InfCode framework for issue resolution employs an adversarial, iterative refinement loop involving at least three agents (Li et al., 20 Nov 2025):

Test Patch Generator (T): Generates or strengthens test suites $\Delta T$ tailored to expose defects that may persist after patching. Operates by identifying semantic gaps or insufficient coverage relative to the issue and candidate code base.
Code Patch Generator (C): Synthesizes code patches $P$ that satisfy the current suite of tests $T$.
Selector Agent: Scores candidate patch–test suite pairs $(P_i, T_i)$ via a composite metric:

$\mathrm{Score}(P_i) = \alpha\cdot\mathrm{Correctness}(P_i,T_i) + \beta\cdot\mathrm{Coverage}(T_i) - \gamma\cdot\mathrm{Instability}(P_i)$

selecting the best candidate for final integration.
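
The composite score can be illustrated with a minimal sketch. The weights, the `Candidate` fields, and the three example candidates below are hypothetical values chosen for illustration; the source does not state how $\alpha$, $\beta$, $\gamma$ or the individual terms are computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One patch/test-suite pair (P_i, T_i) with already-measured terms."""
    name: str
    correctness: float  # assumed: fraction of T_i that P_i passes, in [0, 1]
    coverage: float     # assumed: coverage achieved by T_i, in [0, 1]
    instability: float  # assumed: flakiness penalty for P_i, in [0, 1]


def score(c: Candidate, alpha: float = 1.0, beta: float = 0.5,
          gamma: float = 0.5) -> float:
    """Score = alpha * Correctness + beta * Coverage - gamma * Instability."""
    return alpha * c.correctness + beta * c.coverage - gamma * c.instability


def select(candidates, **weights) -> Candidate:
    """Selector agent: return the highest-scoring candidate."""
    return max(candidates, key=lambda c: score(c, **weights))


# Hypothetical candidates; the numbers are illustrative only.
candidates = [
    Candidate("patch_a", correctness=1.0, coverage=0.5, instability=0.5),
    Candidate("patch_b", correctness=1.0, coverage=0.75, instability=0.0),
    Candidate("patch_c", correctness=0.5, coverage=1.0, instability=0.0),
]

best = select(candidates)
print(best.name, score(best))
```

With these illustrative weights, a patch that passes every test but is unstable (`patch_a`) ties with a patch that fails half of a high-coverage suite (`patch_c`), and both lose to the stable, fully passing `patch_b`.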

The process iterates: when the Code Patch Generator produces a patch $P$ that passes the current test suite $T$, the Test Patch Generator searches for edge cases or semantic holes and proposes additional tests $\Delta T$, which are added to $T$. The loop terminates when the suite can no longer be strengthened (i.e., when $\Delta T = \emptyset$) or after a maximum number of adversarial rounds.
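
The control flow of this loop can be sketched on a toy problem. Everything concrete below is an assumption made for illustration: the patch pool, the probe inputs, and in particular the `reference` oracle, which stands in for the LLM agent's reading of the issue (a real test generator has no such oracle).

```python
from typing import Callable, List, Optional, Tuple

Test = Tuple[int, int]            # (input, expected output)
Patch = Callable[[int], int]


def reference(x: int) -> int:
    """Hypothetical stand-in for the behaviour the issue asks for."""
    return abs(x)


# Hypothetical candidate patches, ordered from naive to correct.
PATCH_POOL: List[Patch] = [
    lambda x: x,                       # ignores negative inputs
    lambda x: -x if x == -1 else x,    # overfits to a single test
    lambda x: -x if x < 0 else x,      # correct
]
PROBE_INPUTS = [1, -1, -2, 0]          # inputs the test agent may try


def passes(patch: Patch, tests: List[Test]) -> bool:
    return all(patch(i) == expected for i, expected in tests)


def generate_patch(tests: List[Test]) -> Optional[Patch]:
    """Code Patch Generator: first candidate that passes the current suite."""
    return next((p for p in PATCH_POOL if passes(p, tests)), None)


def strengthen_tests(patch: Patch, tests: List[Test]) -> List[Test]:
    """Test Patch Generator: return Delta T, a new test the patch fails."""
    for x in PROBE_INPUTS:
        case = (x, reference(x))
        if patch(x) != reference(x) and case not in tests:
            return [case]
    return []                          # Delta T is empty


def adversarial_loop(initial_tests: List[Test], max_rounds: int = 5):
    tests, patch, rounds = list(initial_tests), None, 0
    for rounds in range(1, max_rounds + 1):
        patch = generate_patch(tests)
        if patch is None:
            break                      # no candidate satisfies the suite
        delta = strengthen_tests(patch, tests)
        if not delta:
            break                      # suite cannot be strengthened further
        tests.extend(delta)
    return patch, tests, rounds


patch, tests, rounds = adversarial_loop([(1, 1)])
print(rounds, tests)
```

In this toy run the first two patches each pass the suite that existed when they were proposed and are rejected only after the suite is strengthened, which is the overfitting scenario the adversarial loop is meant to catch.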

Key performance metrics include the "solved rate" (fraction of issues for which the final patch passes all tests), API cost per issue, and unique fix rates. InfCode achieved a solved rate of 79.4% on the SWE-bench Verified benchmark, reported as a new state of the art (Li et al., 20 Nov 2025). A plausible implication is that adversarial test–patch co-evolution substantially mitigates the risk of overfitting to insufficient tests.

3. Intent-Guided Retrieval and Structured Fault Localization in C++ (InfCode-C++)

InfCode-C++ embodies a specialized instantiation of the InfCode approach for statically typed, structurally complex C++ codebases (Dong et al., 20 Nov 2025):

Repository Parsing: Constructs an AST-based structural index $\mathcal{T}_C = (\mathcal{N}, \mathcal{E})$ (nodes: classes, functions, templates; edges: containment, inheritance, calls) and a semantic code-intent index $\mathcal{I}_\text{intent}$, mapping high-level feature intents (e.g., "serialize data") to code artifacts.
Semantic Code-Intent Retrieval: The issue description $D$ is embedded as a vector $q\in\mathbb{R}^d$ and compared (via cosine similarity) to code artifact embeddings to yield a context subset $C_\text{intent}$.
AST-Structured Querying: Deterministic queries over the AST enable precise localization, resolving overloaded identifiers, deep inheritance, and template instantiations. Example queries: FindClass(name), FindFunction(spec, name), GetInheritanceChain(name), GetFunctionCalls(className, functionName).
Context Construction & Fault Localization: The defect region $L$ is the intersection $L = \mathcal{I}_\text{intent}(D) \cap G_\text{bug}$, where $\mathcal{I}_\text{intent}(D)$ denotes the intent-derived candidates for issue $D$ and $G_\text{bug}$ the structure-derived candidates.
Patch Synthesis: Multi-section prompts incorporating $D$, localized AST snippets, and test errors are presented to an LLM with iterative refinement, yielding up to 10 diverse candidate patches per round.
Evaluation: On MultiSWE-bench-CPP (129 issues, 5 repositories), InfCode-C++ with GPT-5 achieved 25.58% end-to-end resolution, 10.85 percentage points (pp) above the best prior system and more than double the rate of MSWE-agent. Ablations show drops of up to 8.53pp when core components are disabled, indicating that both semantic and structural retrieval contribute materially.
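
The retrieval and localization steps above can be sketched as follows. This is a toy illustration under stated assumptions: the three-dimensional embeddings, the artifact names, and the Python helpers (`retrieve_by_intent`, `get_inheritance_chain`, `structural_candidates`) are hypothetical stand-ins for learned embeddings and for queries such as GetInheritanceChain and GetFunctionCalls; they are not the system's actual API.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical intent index: code artifact -> toy embedding.
INTENT_INDEX = {
    "Serializer::write": [1.0, 0.0, 0.0],
    "JsonSerializer::write": [0.9, 0.1, 0.0],
    "Logger::log": [0.0, 1.0, 0.0],
    "Socket::send": [0.0, 0.0, 1.0],
}

# Hypothetical structural index: inheritance edges and call edges.
BASES = {"JsonSerializer": "Serializer", "Serializer": None}
CALLS = {"JsonSerializer::write": ["Buffer::append"]}


def retrieve_by_intent(query, top_k=3):
    """Rank artifacts by cosine similarity to the issue embedding q."""
    ranked = sorted(INTENT_INDEX,
                    key=lambda name: cosine(query, INTENT_INDEX[name]),
                    reverse=True)
    return ranked[:top_k]


def get_inheritance_chain(class_name):
    """Walk base-class edges from class_name up to the root."""
    chain = []
    while class_name is not None:
        chain.append(class_name)
        class_name = BASES.get(class_name)
    return chain


def structural_candidates(class_name, method):
    """The method in every class of the chain, plus its direct callees."""
    found = {f"{cls}::{method}" for cls in get_inheritance_chain(class_name)}
    found.update(CALLS.get(f"{class_name}::{method}", []))
    return found


q = [1.0, 0.2, 0.0]                       # toy embedding of the issue text D
intent_set = set(retrieve_by_intent(q))   # intent-derived candidates
graph_set = structural_candidates("JsonSerializer", "write")
defect_region = intent_set & graph_set    # L = intent candidates ∩ structural candidates
print(sorted(defect_region))
```

In this toy case the intersection discards `Logger::log` (semantically nearby but structurally unrelated) and `Buffer::append` (structurally reachable but not matched by intent), leaving only the two `write` implementations.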

Behavioral metrics include 28.81% reproduction success, 55.10% file-level localization accuracy, and 42.10% function-level localization accuracy. This suggests that precise context extraction in statically typed settings materially improves automated defect repair (Dong et al., 20 Nov 2025).
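
As a quick arithmetic check, the headline percentages are consistent with whole-issue counts out of the 129 benchmark issues. The counts below are inferred here from the percentages and are not stated in the text above, so they should be read as an assumption rather than a reported result.

```python
TOTAL_ISSUES = 129  # size of MultiSWE-bench-CPP as stated above


def pct(count: int, total: int = TOTAL_ISSUES) -> float:
    """Percentage of the benchmark, rounded to two decimals."""
    return round(100 * count / total, 2)


# Inferred counts (assumption): chosen so that the rounded
# percentages match the figures quoted in the text.
resolved_issues = 33        # matches the 25.58% resolution rate
margin_over_prior = 14      # matches the 10.85pp margin
largest_ablation_drop = 11  # matches the 8.53pp ablation drop

print(pct(resolved_issues), pct(margin_over_prior), pct(largest_ablation_drop))
```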

4. Infrastructure in Code: Automated Infrastructure Synthesis

The InfCode paradigm for infrastructure automation dispenses with standalone infrastructure DSLs (as found in Infrastructure as Code, IaC), instead deducing complete cloud infrastructure configurations from application code (Tankov et al., 2021). The Kotless framework operationalizes this pipeline:

Parser: A Kotlin compiler plugin extracts annotation-driven, framework-level, or DSL-based declarations for routes, handlers, events, permissions directly from code.
Cloud-Agnostic Schema: Forms an intermediate representation encapsulating dynamic/static routes, lambdas, storage, permissions, and extension points.
Engine & Code Generation: The engine layer invokes Terraform.kt generators to produce HCL code for Terraform, supporting AWS and Azure targets; generator dependencies guarantee correct resource composition.
Deployer: Terraform CLI applies the resulting configuration for provisioning.
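
The idea of deriving infrastructure from code, $I = f(C)$, can be illustrated with a small language-neutral sketch. It is written in Python rather than Kotlin and does not reproduce the Kotless API: the `App` class, its decorators, and the shape of the resulting specification (including the `read_write` access level) are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class App:
    """Collects declarations made in application code (the 'parser' role)."""
    routes: Dict[str, Callable] = field(default_factory=dict)
    tables: Dict[str, List[str]] = field(default_factory=dict)

    def get(self, path: str):
        def register(fn):
            self.routes[path] = fn
            return fn
        return register

    def uses_table(self, table: str):
        def register(fn):
            self.tables.setdefault(fn.__name__, []).append(table)
            return fn
        return register


app = App()


@app.get("/users")
@app.uses_table("users")
def list_users():
    return []


@app.get("/health")
def health():
    return "ok"


def synthesize(app: App) -> dict:
    """I = f(C): build a provider-neutral spec from the declarations."""
    return {
        "functions": sorted({fn.__name__ for fn in app.routes.values()}),
        "api_routes": [
            {"method": "GET", "path": path, "handler": app.routes[path].__name__}
            for path in sorted(app.routes)
        ],
        "tables": sorted({t for ts in app.tables.values() for t in ts}),
        # Assumed policy: a handler gets read/write access to tables it names.
        "permissions": [
            {"function": name, "table": table, "access": "read_write"}
            for name in sorted(app.tables)
            for table in app.tables[name]
        ],
    }


spec = synthesize(app)
print(spec)
```

A generator stage would then translate `spec` into provider-specific configuration; in the pipeline described above that role is played by the Terraform code generators.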

Supported scenarios span three DSLs (Kotless DSL, Ktor, Spring Boot) and two runtimes (JVM, GraalVM native image). Quantitative case study: migration of a live Kotlin REPL replaced more than 600 lines of hand-written infrastructure code with 989 auto-generated lines, cut developer effort from ≈4–6 hours to 15 minutes, and reduced monthly costs by 80%, with unchanged throughput.

Notable limitations include difficulty inferring dynamic NoSQL schemas, partial incompatibility with reflection-heavy frameworks under native compilation, and the necessity of manual management for global infrastructure artifacts (e.g., VPCs, DNS). Extensibility targets new cloud providers, DSLs, runtimes, and richer schema constructs (Tankov et al., 2021).

5. InfCode Logic: Control Code Production, Inference, and Risk Analysis

As developed by Bergstra, the Informal Control Code Logic (ICCL) branch of InfCode formalizes control code reasoning for code-driven devices ("boxes") (Bergstra, 2010):

Core Elements:
- Control codes ($C$): bit-sequences executed by a box $B$.
- Production: Typically by compilation of instruction sequences into $C$.
- Distribution/Deployment: Transmitted via media, networks, or updates; deployed by loading $C$ into $B$.
- Usage and Testing: $B(C)$ denotes execution; usage delivers service $S$ (utility $UT$), testing reveals behavioral knowledge, simulation/experimentation facilitates hypothetical reasoning.
Inference Schemes:
- Acceptance Test Rule (ATR): User-side; if a set of informal premises (trusted origins, consistent test results, acceptable risk/utility) is met, the inference permits use of $C$ for $S$.
- Release Test Rule (RTR): Producer-side; formalizes when release, licensing, and recommendation are justified based on specification, verification, structural testing, and risk.
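
The shape of the Acceptance Test Rule can be sketched as a premise check. ICCL is explicitly informal, so the numeric encoding below is an assumption made for illustration: the `Evidence` fields, the `risk_tolerance` threshold, and the reading of "acceptable risk/utility" as "risk within tolerance and utility exceeding risk" are hypothetical and do not come from the source.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Evidence:
    """What a user knows about control code C for box B and service S."""
    trusted_origin: bool    # C comes from a producer the user trusts
    tests_consistent: bool  # observed runs of B(C) agree with expectations
    risk: float             # assumed: estimated loss if C misbehaves
    utility: float          # assumed: estimated value UT of service S


def unmet_premises(e: Evidence, risk_tolerance: float) -> List[str]:
    """List the ATR premises that the evidence fails to support."""
    unmet = []
    if not e.trusted_origin:
        unmet.append("trusted origin")
    if not e.tests_consistent:
        unmet.append("consistent test results")
    if e.risk > risk_tolerance or e.utility <= e.risk:
        unmet.append("acceptable risk/utility")
    return unmet


def acceptance_test_rule(e: Evidence, risk_tolerance: float) -> bool:
    """User-side inference: use of C for S is permitted if no premise fails."""
    return not unmet_premises(e, risk_tolerance)


good = Evidence(trusted_origin=True, tests_consistent=True, risk=0.1, utility=5.0)
risky = Evidence(trusted_origin=True, tests_consistent=True, risk=3.0, utility=5.0)

print(acceptance_test_rule(good, risk_tolerance=1.0))
print(unmet_premises(risky, risk_tolerance=1.0))
```

Returning the list of unmet premises, rather than only a boolean, mirrors the informal character of the rule: a user can see which piece of evidence (trust, testing, or risk) is missing before deciding how to proceed.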
Role of Testing, Trust, and Confidence:

Since analytic models of $B(C)$ often lack completeness, empirical testing is central; but trust in producers/testers and confidence in generalization to untested scenarios are equally necessary. Risk is categorized by potential for operational, asset, or competitive loss and is engaged through scenario-specific assessment.

Semi-Formal Models:
- Synthetic vs. Analytic Architectures: Distinguishing real (black-box) device execution from modeled (disassembler plus operational rules) behavior; equivalence conditions form a basis for analytic validation, though empirical discrepancies persist.
- Intrinsic Universality: Generalized analytic models enable reasoning about Turing-completeness.
- Safety-Critical Predicate: Determines safety-criticality via the existence of code variants which do or do not cause harm in a robotic system context.
Significance:

ICCL underscores that empirical embedded testing remains essential for confidence in safety-critical domains, and blanket replacement with formal verification is not viable in practice. A plausible implication is that risk management and trust inference are as foundational as technical analysis for high-stakes software deployment (Bergstra, 2010).

6. Comparative Impact, Evaluation, and Limitations

The InfCode frameworks have demonstrated quantifiable advances across multiple software engineering subdomains:

Autonomous Bug Repair: InfCode's adversarial loop and LLM-based strategies outperform pipeline- and agent-based baselines, with improvements in both solved rates and coverage of unique bug classes (Li et al., 20 Nov 2025).
Infrastructure Synthesis: Application-driven infrastructure reduces manual effort and operational costs, at the cost of some expressiveness for dynamic or global artifacts (Tankov et al., 2021).
Control Code Reasoning: The ICCL framework fills a conceptual gap in connecting empirical behavioral testing with informal logic for risk, trust, and deployment—especially relevant in contexts lacking analytic models (Bergstra, 2010).

Identified limitations include failures of specialized tools (e.g., editor and bash command errors in repair pipelines), potential over-specialization or semantic drift in generated tests, and challenges in extending infrastructure inference to arbitrarily complex, runtime-computed artifacts. Major avenues for development comprise improved test generation fidelity, more resilient tool scripts, extension to more language/ecosystem targets, and richer schema capability.

InfCode Variant	Domain	Core Mechanism	Benchmark / Case Study	Reported Result
InfCode Adversarial	Issue resolution	Test–patch adversarial co-evolution	SWE-bench Verified	79.4% solved rate
InfCode-C++	C++ patch synthesis	Intent retrieval + AST-structured search	MultiSWE-bench-CPP	25.58% resolution rate
Kotless (InfCode)	Cloud infrastructure	Code-based infra inference (Kotlin)	Play KotlinLang migration	80% monthly cost reduction

These results support the efficacy of integrating multi-modal reasoning—combining semantic, structural, empirical, and risk-aware inference—for software reliability and infrastructure management at scale.