SecureFixAgent: Secure Code Repair Agent

Updated 13 March 2026

SecureFixAgent is a class of LLM-driven software agents that automate vulnerability detection and repair with principled security guarantees through isolation and validation techniques.
It employs explicit hierarchical memory isolation, schema-driven mediation, and dynamic exploit-driven patch validation to prevent injection attacks and functionally-correct yet vulnerable patches.
Empirical results show significant reductions in attack success rates and high patch accuracy across benchmarks, enhancing developer trust in automated security workflows.

SecureFixAgent is a class of LLM-driven software agents engineered for automated vulnerability detection and repair with principled security guarantees. SecureFixAgent achieves robust resistance to indirect prompt injection, context poisoning, and Functionally Correct yet Vulnerable (FCV) code generation via an overview of explicit hierarchical memory isolation, schema-driven mediation of untrusted tool outputs, dynamic exploit-driven patch validation, and layered program analysis defenses. SecureFixAgent architectures extend the security primitives of AgentSys to code repair, yielding demonstrable mitigations against a wide array of attack vectors while maintaining or improving on baseline utility compared to both monolithic LLM agents and rigid sandboxes (Wen et al., 7 Feb 2026, Gajjar et al., 18 Sep 2025).

1. Explicit Hierarchical Memory Isolation

SecureFixAgent organizes its agentic structure according to a formal memory isolation hierarchy reminiscent of operating systems process isolation (Wen et al., 7 Feb 2026). The core abstraction is the strict segregation of working memory ( $M_{main}$ for the main agent, $M_{worker}^i$ for each spawned worker, and $M_{nested}^{i,j}$ for recursive subtasks), enforced with the invariant:

$M_{main} \cap M_{worker}^i = \emptyset,\quad M_{worker}^i \cap M_{nested}^{i,j} = \emptyset$

As a result, external tool outputs, user content, and sub-agent reasoning traces never flow raw across context boundaries. Each tool invocation results in a worker agent operating solely over the tool output, an a priori intent schema, and relevant call metadata; only structured, schema-validated summaries may cross upward to $M_{main}$ . This eliminates persistence of adversarial payloads and precludes multi-round attack amplification.

Memory Isolation Components:

Main Agent: Maintains the global task state and orchestrates workflow, declaring intent schemas before observing tool output.
Worker/Nested Agents: Isolated per tool/subtask, holding
- raw tool output $y$ ,
- intent schema $I$ ,
- compact trace stack.
Boundary Crossing Protocols: Only JSON-validated, schema-conforming return objects can be written to the parent context.

Pseudocode Excerpt:

$M_{nested}^{i,j}$ 7 Here, extractStructured may invoke further nested workers, subject to recursive validator checks.

2. Schema Validation and Sanitization Mechanisms

All data entering through tool outputs or sub-agent returns must pass through a deterministic two-stage filtration:

Syntactic Gate: Only parseable JSON objects are admitted.
Schema Validation: The return is validated against the pre-declared grammar

$G:\quad S \rightarrow \{\;F_1: T_1,\,F_2: T_2,\dots\} \qquad T \in \{\texttt{string},\;\texttt{number},\;\texttt{bool},\;\texttt{List}[T],\;\{\dots\}\}$

Failing objects (missing/incorrect type fields) are rejected outright.

Sanitization: For command-type tool calls denied by the validator, an automated sanitizer $\sigma(y)$ is applied to strip "instruction-like spans" (imperatives, role directives) before retrying extraction under a bounded retry budget $B$ . Exhaustion of this budget results in a non-executable error object.

This mechanism ensures only well-typed, intent-relevant, and non-instructional data cross context boundaries, preventing both instruction laundering and overflow of non-essential content.

3. Threat Model, Security Guarantees, and Attack Surface

The SecureFixAgent threat model assumes an attacker can fully control environmental tool outputs but cannot modify agent code, weights, or infrastructure (Wen et al., 7 Feb 2026, Peng et al., 15 Oct 2025). The adversary's principal goal is persistent injection—embedding malicious instructions that influence future decision rounds.

Security Metrics:

Attack Success Rate: For benchmarked indirect prompt injection,

$M_{worker}^i$ 0

Empirical Results:

Benchmark	Baseline (undefended)	Isolation Only	Full SecureFixAgent (full stack)
AgentDojo	30.66%	2.19%	0.78%
ASB	45.23%	—	4.25%

This demonstrates a $M_{worker}^i$ 1 further reduction over isolation alone and orders-of-magnitude improvement over baseline architectures. Overhead complexity is minimized: classic approaches validate over the entire context ( $M_{worker}^i$ 2 per check), while SecureFixAgent only validates per command action ( $M_{worker}^i$ 3, $M_{worker}^i$ 4).

Complementary Guarantees (Agent-Fence):

Agent-Fence expands the attack model to deep agent trust boundaries (planning, state, retrieval, tool-API, and delegation), mapping classes such as state injection, objective hijacking, tool-use hijack, and authorization confusion. SecureFixAgent's strict trust-boundary enforcement, principal-signed tool invocation, write-once memory ledgers, and continuous auditing further mitigate the highest-probability break classes—denial-of-wallet ( $M_{worker}^i$ 5), authorization confusion ( $M_{worker}^i$ 6), and retrieval poisoning ( $M_{worker}^i$ 7) (Puppala et al., 7 Feb 2026).

4. Automated Repair Pipeline and Dynamic Validation

In the context of code vulnerability repair, SecureFixAgent tightly integrates static and dynamic oracles to guarantee both functional and security integrity:

Initialization: The agent ingests a sanitized bug report, augmented call graph, and available exploit input.
Context Retrieval: Progressive code search and retrieval isolate the minimal fix locus via AST traversal and type introspection.
Patch Proposal: The LLM generates candidate patches, guided by in-scope types and developer constraints.
Compilation and Dynamic Test Loop: Candidates are compiled and executed against the triggering exploit. Only patches that both compile and eliminate the exploit-induced crash are returned as "plausible".

Dynamic validation is paramount: similarity metrics (e.g., CodeBLEU) show no correlation with exploit-based correctness (Zhang et al., 2024); only runtime exploit pass/fail discriminates true fixes.

5. Proactive Risk Modeling and Contextual Security Triage

SecureFixAgent incorporates a proactive risk estimation pipeline leveraging empirical insights from large-scale agentic patch security studies (Sajadi et al., 30 Jun 2025). Observed risk amplifiers include:

Number of files modified
Lines of code generated
Absence of code snippets or reproduction steps in the issue

A logistic model estimates patch-level risk:

$M_{worker}^i$ 8

Patches above risk threshold $M_{worker}^i$ 9 are routed to deeper static analysis (e.g., Bandit, Semgrep, CodeQL) and optional refinement. This targeted triage reduces review cost and surfaces high-risk edits before deployment.

High-level triage algorithm:

$M_{nested}^{i,j}$ 8

6. Defenses Against Functionally-Correct yet Vulnerable Patches (FCVs)

A documented and highly impactful threat is the generation of FCV patches—changes that pass available functional test suites but are vulnerable to standard security checks (e.g., CWE classes) (Peng et al., 15 Oct 2025). The FCV-Attack requires only black-box, single-query access via issue submission.

Key mathematical property:

Given a patch $M_{nested}^{i,j}$ 0,

$M_{nested}^{i,j}$ 1

i.e., functional correctness and vulnerability coexist.

Defense architecture:

Dual-oracle Validation: Accept only patches that satisfy both a) $M_{nested}^{i,j}$ 2: all functional tests pass, b) $M_{nested}^{i,j}$ 3: no security check triggers.
Integrated Stack: Patch generation $M_{nested}^{i,j}$ 4 static analysis $M_{nested}^{i,j}$ 5 LLM vulnerability judgment $M_{nested}^{i,j}$ 6 optional formal/SAT-based validation. Guardrails at each layer (regexes, AST linters, principal-bound APIs) provide systematic defense in depth.

Example defense loop:

$M_{nested}^{i,j}$ 9 Audit and logging at every pass enable re-training and forensic evaluation of emerging attack variants.

7. Empirical Performance and Developer Trust

Extensive experiments on Python, C/C++, and smart contract benchmarks indicate the following performance landscape:

Repair accuracy: SecureFixAgent attains 87.8% fix accuracy (Python) and 52.7% plausible patch success (C/C++) (Gajjar et al., 18 Sep 2025, Zhang et al., 2024).
False positive reduction: Up to 57% reduction vs. Bandit-alone baseline.
Iterative convergence: 3 iterations suffice to repair nearly 90% of vulnerabilities post-fine-tuning.
Developer confidence: Qualitative user studies rate explanation quality and Bandit-confirmed security at 4.5/5, highlighting both automated coverage and trustworthiness (Gajjar et al., 18 Sep 2025).

SecureFixAgent maintains high task utility (benign run accuracy 64%–71%) even as attack success rates drop by up to two orders of magnitude. In multi-agent frameworks, explicit modularization of detection, plan, synthesis, and validation further increases robustness relative to monolithic or purely retrieval-augmented LLMs (Karanjai et al., 22 Feb 2025).

By integrating hierarchical context isolation, schema validation, two-stage security triage, dynamic exploit validation, and dual-oracle correctness/security gating, SecureFixAgent constitutes a state-of-the-art platform for secure, dynamic, and trustworthy LLM-based vulnerability repair. Its design directly addresses the new attack surfaces exposed by agentic reasoning and persistent context workflows, offering a blueprint for future high-assurance AI-powered software automation (Wen et al., 7 Feb 2026, Peng et al., 15 Oct 2025, Zhang et al., 2024, Sajadi et al., 30 Jun 2025, Gajjar et al., 18 Sep 2025, Karanjai et al., 22 Feb 2025, Puppala et al., 7 Feb 2026).