
RepairAgent: Autonomous Software Repair

Updated 14 January 2026
  • RepairAgent is an autonomous software engineering system that leverages LLMs and specialized agents to plan, synthesize, and validate code repairs.
  • It integrates iterative context gathering, fault localization, and dynamic feedback from test suites and symbolic validators to refine patch generation.
  • Architectural paradigms span single-agent, multi-agent, and hybrid neuro-symbolic approaches, each enhancing robustness and generalizability in automated repair.

A RepairAgent is an autonomous or collaborative software engineering system, typically built around LLMs and supporting toolchains, whose core capability is the planning, synthesis, and validation of program repairs or patch suggestions. These agents leverage multi-step reasoning, dynamically orchestrated tool use, and execution feedback to localize faults, propose fixes, and verify them against one or more oracles such as test suites, formal specifications, or structured critiques. RepairAgent systems contrast with classical program repair pipelines by tightly interleaving action planning, contextual information gathering, and iterative patch refinement—often adopting explicit multi-agent or neuro-symbolic architectures for improved robustness and generalizability.

1. Architectural Paradigms and Agent Decomposition

RepairAgent frameworks can be classified by their internal structure and division of responsibilities among sub-agents, spanning single-agent, multi-agent, and hybrid neuro-symbolic designs.

Notably, RAMP for Ruby exemplifies a lightweight, feedback-centered multi-agent formulation optimized for rapid convergence, while MarsCode Agent and AIR illustrate deep integration of semantic program representations and planner-driven agent orchestration.

2. Repair Loop Dynamics and Feedback Integration

RepairAgents universally implement an iterative reasoning-and-action loop. At each iteration:

  1. The agent (or one of its sub-agents) observes the current artifact state—bug report, code, failed tests, execution traces.
  2. It synthesizes an evidence-informed next action: gather further context, generate tests, propose edits, or validate with oracles.
  3. The system applies or executes the chosen action, collects the outcome, and incorporates it into the evolving prompt or graph state.
  4. Termination occurs when a stopping criterion is met: all tests pass, oracles succeed, or iteration/cost budgets are exhausted.
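The four steps above can be sketched as a single control loop. This is a minimal illustration; the callback names (`observe`, `plan_action`, `execute`), the outcome dictionary, and the default budget are assumptions, not any particular system's API:

```python
def repair_loop(observe, plan_action, execute, budget=20):
    """Iterative reasoning-and-action loop: observe, plan, act, check.

    `observe()` returns the current artifact state (code, failing tests, traces);
    `plan_action(state, history)` picks the next action given prior outcomes;
    `execute(action)` applies it and reports whether all oracles now succeed.
    """
    history = []
    for step in range(budget):
        state = observe()                     # 1. observe artifact state
        action = plan_action(state, history)  # 2. synthesize next action
        outcome = execute(action)             # 3. apply and collect outcome
        history.append((action, outcome))     # feed outcome back into context
        if outcome.get("all_tests_pass"):     # 4. stopping criterion
            return {"fixed": True, "steps": step + 1}
    return {"fixed": False, "steps": budget}  # budget exhausted
```

Real systems differ mainly in how `plan_action` is realized (FSM transitions, plan-execute graphs, or tree search) and in which oracles `execute` consults.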

This dynamic is formalized in various frameworks via finite state machines (FSMs) (Bouzenia et al., 2024), plan-execute graphs (Chen et al., 2024), or recursive tree-of-thoughts search (Luo et al., 25 Nov 2025). Central to efficacy is the real-time injection of execution feedback (test verdicts, error traces, state diffs) and the corresponding realignment of the agent's reasoning (refined hypotheses, self-reflections, patch adjustments), as observed in RAMP's iterative Reflector loop and AdverIntent-Agent’s adversarial feedback cycles (Akbarpour et al., 6 Nov 2025, Ye et al., 19 May 2025).
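FSM-style control can be made concrete with a small transition table. The state and event names below are illustrative assumptions, not the actual states used by the cited systems:

```python
# Hypothetical FSM for a repair controller: each state maps observed
# events to successor states; "done" is the accepting state.
TRANSITIONS = {
    "gather_context": {"enough_context": "propose_patch",
                       "need_more": "gather_context"},
    "propose_patch":  {"patch_ready": "validate",
                       "stuck": "gather_context"},
    "validate":       {"tests_pass": "done",
                       "tests_fail": "propose_patch"},
}

def run_fsm(observe, max_steps=10):
    """Drive the loop until 'done' or the step budget is exhausted.

    `observe(state)` returns an event string valid for the current state.
    """
    state = "gather_context"
    for _ in range(max_steps):
        if state == "done":
            break
        event = observe(state)
        state = TRANSITIONS[state][event]
    return state
```

The table makes the agent's legal action sequences explicit, which is the main appeal of FSM formalizations over free-form prompting.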

3. Information Gathering, Fault Localization, and Context Management

Robust repair requires identifying relevant context and precisely localizing the fault:

  • Code retrieval and fault localization: Spectrum-based fault localization (SBFL, Ochiai), code knowledge graphs (CKG), blame heuristics, and RL-guided data provenance tracing are prevalent for narrowing down probable bug regions (Chen et al., 2024, Kaliutau, 9 Dec 2025, Shi et al., 2 Nov 2025). SBFL algorithms compute suspiciousness scores; for example, Ochiai's formula for a code element $e$:

$$\mathrm{suspiciousness}(e) = \frac{\mathit{failed}(e)}{\sqrt{\mathit{TotalFailed} \times (\mathit{failed}(e) + \mathit{passed}(e))}}$$

  • Prompt construction and context assembly: Prompts typically combine non-historical metadata (bug description, failing tests, localization lines) and history-derived context (blame diffs, function history) to guide LLM synthesis (Shi et al., 2 Nov 2025).
  • Avoiding the "Semantic Trap": DTG-based representations in AIR enable causal rather than semantically-near retrieval, ensuring that only code with direct data lineage to the buggy state is traversed (Kaliutau, 9 Dec 2025).
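The Ochiai suspiciousness score above can be computed directly from per-element coverage counts. This is a minimal sketch; the argument names and input format are assumptions:

```python
import math

def ochiai(failed_e, passed_e, total_failed):
    """Ochiai suspiciousness for one code element.

    failed_e / passed_e: number of failing / passing tests covering the
    element; total_failed: total failing tests in the suite. Returns 0.0
    when the element is uncovered (avoiding division by zero).
    """
    denom = math.sqrt(total_failed * (failed_e + passed_e))
    return failed_e / denom if denom else 0.0
```

An element covered by all 3 failing tests and 1 passing test scores `3 / sqrt(3 * 4) ≈ 0.87`, while elements covered only by passing tests score 0.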

Table 1: Core Context Sources Employed by RepairAgents

Method/Agent        Contextual Signal            Mechanism
RAMP                Sample I/O, error traces     Self-generated tests, reflection
MarsCode, SemAgent  Dynamic traces, SBFL         Execution feedback, entity extraction
HAFixAgent          Blame, historic diffs        Prompt history injection
AIR                 Data transformation graph    RL-guided causal tracing

4. Patch Generation, Review, and Validation Protocols

Patch synthesis in RepairAgent systems is informed by the available context and uses targeted LLM prompt strategies.

Formally, patch scoring may be expressed as:

$$\mathrm{Score}(p) = w_1 \cdot \mathrm{plausibility}(p) + w_2 \cdot \mathrm{correctness}(p, i)$$

where the weights $w_1, w_2$ prioritize passing the original and adversarial tests, respectively (Ye et al., 19 May 2025).
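A direct transcription of this scoring rule (the default weights and the `plausibility`/`correctness` callables are illustrative assumptions):

```python
def score_patch(patch, intent, plausibility, correctness, w1=0.4, w2=0.6):
    """Weighted patch score: plausibility on the original test suite plus
    correctness against the inferred intent / adversarial tests.

    `plausibility(p)` and `correctness(p, i)` are assumed to return
    fractions of passing tests in [0, 1]; w1 and w2 set their priority.
    """
    return w1 * plausibility(patch) + w2 * correctness(patch, intent)
```

Candidate patches can then be ranked by this score, with the top candidate forwarded to review or validation.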

5. Evaluation Benchmarks and Performance Metrics

RepairAgent effectiveness is benchmarked on community-recognized datasets such as Defects4J, SWE-bench, and XCodeEval (Table 2).

Table 2: Selected Published Solve Rates (pass@1)

Agent/System    Benchmark                  Solve Rate
RepairAgent     Defects4J                  164/835 (19.6%)
MarsCode Agent  SWE-bench Lite (Python)    34.0%
RAMP            XCodeEval (Ruby, subset)   67.0%
AIR             SWE-Verified               87.1%

Agentic systems often outperform prompt-only and single-agent baselines, with multi-agent partitioning, history-aware context, and explicit feedback loops being major contributors to increased success rates.

6. Insights, Limitations, and Domain Generality

Behavioral analyses highlight several recurring themes in RepairAgent research, most prominently the open challenges below.

7. Open Challenges and Future Directions

Strategic priorities for future RepairAgent research include:

  • Improving test and oracle reliability: High false-negative rates in self-generated tests slow convergence but can be mitigated by more precise test generation, self-critique, or hybrid oracles (Akbarpour et al., 6 Nov 2025).
  • Scaling to complex project structures: Extensions beyond single-file tasks require cross-file harnesses, architectural reasoning, and potentially more advanced retrieval or RL-based navigation (Kaliutau, 9 Dec 2025, Shi et al., 2 Nov 2025).
  • Semantic and symbolic integration: Causal context management (DTGs), agentic review, and neuro-symbolic loops are emerging routes for enhanced generality and trustworthiness (Kaliutau, 9 Dec 2025, Maddila et al., 24 Jul 2025).
  • Real-world deployment: Realistic production deployments demand cost-aware iteration control, model-agnostic toolsets, robust reviewer integration, and human-in-the-loop patch vetting (Maddila et al., 24 Jul 2025, Kaliutau, 9 Dec 2025).
  • Agent-system self-repair: Repairing LLM-agent systems themselves is notably harder due to volatile external resources, semantic complexity, and nondeterminism of LLM outputs—current resolution rates are markedly lower than for traditional APR (Rahardja et al., 27 May 2025).

In summary, RepairAgent frameworks operationalize automated debugging and patching as a sequence (or coordination) of reasoning, action, and validation steps, grounded in dynamic feedback and refined by explicit agentic structures. These systems demonstrate accelerating effectiveness on challenging software repair tasks and are at the core of next-generation autonomous software maintenance pipelines (Bouzenia et al., 2024, Chen et al., 2024, Kaliutau, 9 Dec 2025, Akbarpour et al., 6 Nov 2025).
