Multi-Agent Critique–Repair Loops

Updated 28 May 2026

Multi-agent critique–repair loops are architectures that partition roles into planning and validation agents to iteratively detect errors and execute localized repairs.
They employ formal invariant checks, bounded log analysis, and policy-driven repair protocols to minimize error propagation and reduce work loss.
Empirical evaluations demonstrate improved robustness, efficiency, and fault containment in domains such as code synthesis, data curation, and workflow planning.

Multi-agent critique–repair loops are a class of closed-loop architectures in which multiple specialized agents—typically LLMs or tool-augmented role-specialists—interact to iteratively detect faults (“critique”), assess proposals against objective or procedural criteria, and then orchestrate targeted repair actions. This formalism provides a principled substrate for robust, auditable, and efficient planning, reasoning, code synthesis, or data processing in settings where isolated monolithic models exhibit fragility, circular verification, or poor fault containment. Such loops are implemented in diverse domains, from LLM-driven workflow planning and auto-ML, to repository-level program repair, data curation, and reasoning or summarization tasks, and are empirically shown to improve both correctness and efficiency relative to single-agent or global recompute baselines.

1. Core Architecture and Formal Definition

Multi-agent critique–repair loops partition roles into at least two decoupled agent types: planning/proposal agents that produce candidate actions or outputs, and validation/critique agents that independently assess those candidates, flag faults, and trigger repairs. The canonical interaction in ALAS (“Transactional and Dynamic Multi-Agent LLM Planning”) (Geng et al., 5 Nov 2025) is formalized as follows:

Let $\{P_1,\dots,P_n\}$ be planning agents, each mapping a state $s \in S$ and a bounded input $I_i$ to an action in $A_i$:

$P_i: S \times I_i \to A_i$

Let $V$ be the validator, which reads a bounded slice $L_V$ of the versioned execution log $L$:

$V: L_V \rightarrow \{\,\textsf{pass},\,\textsf{fail}\}$

The execution proceeds in phased stages:

Planning: Each $P_i$ proposes and logs its action.
Validation: $V$ inspects the recent log and issues a verdict.
Repair: If $V$ returns fail, localized repair is triggered on the minimal affected region of the history; otherwise, the state is committed.

This division ensures non-circularity—the critiquing agent never operates on the same context or prompt as the original proposer, avoiding self-approval fallacies. The interaction is mediated by a versioned log of state/action trajectories, which underpins both replayable validation and localized rollback/replay for repair.
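The three phases can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the ALAS implementation: the names `VersionedLog`, `LogEntry`, and `run_round`, the integer-valued toy state and actions, and the per-entry repair rule are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class LogEntry:
    version: int
    agent: str
    action: int  # toy action type (assumption): an integer delta


@dataclass
class VersionedLog:
    entries: List[LogEntry] = field(default_factory=list)

    def append(self, agent: str, action: int) -> None:
        self.entries.append(LogEntry(len(self.entries), agent, action))

    def tail(self, k: int) -> List[LogEntry]:
        # Bounded slice L_V: the validator only sees the last k entries.
        return self.entries[-k:]

    def rollback(self, version: int) -> None:
        # Drop every entry from `version` onward.
        del self.entries[version:]


Planner = Callable[[int], int]               # state -> proposed action
Validator = Callable[[List[LogEntry]], bool]  # log slice -> pass (True) / fail


def run_round(state: int, planners: Dict[str, Planner], validate: Validator,
              repair: Callable[[LogEntry], int], log: VersionedLog, k: int) -> int:
    # Planning: every planner proposes and logs its action.
    first = len(log.entries)
    for name, plan in planners.items():
        log.append(name, plan(state))
    # Validation: the validator reads only the log, never the planner's prompt.
    if not validate(log.tail(k)):
        # Repair: edit only entries from this round that fail on their own.
        for entry in log.entries[first:]:
            if not validate([entry]):
                entry.action = repair(entry)
        if not validate(log.tail(k)):
            log.rollback(first)  # local repair failed: discard this round
            return state
    # Commit: fold this round's logged actions into the state.
    return state + sum(e.action for e in log.entries[first:])
```

With planners proposing `2` and `-5`, a validator that rejects negative actions, and a repair rule that replaces a bad action with `0`, a round starting from state `10` commits state `12` and leaves the first planner's entry untouched.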

2. Design of Validation and Critique Mechanisms

Validation/critique agents perform grounded checks on system state transitions, outputs, or code artifacts. In ALAS, the validator applies invariant conditions and pre/post-checks over a bounded $K$-window of log entries (e.g., job precedence, no-overlap constraints). For a transition from state $s_{t-1}$ to state $s_t$ under action $a_t$:

$\phi(s_{t-1},\,a_t) = \left(\text{pre}(s_{t-1}) \wedge \text{post}(s_t) \wedge \text{invariant}(s_{t-1}, s_t)\right)$
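As an illustration, the check $\phi$ can be written for a toy job-shop setting. This is a sketch under assumptions: the operation format `(machine, start, end)`, the specific `pre`, `post`, and `invariant` predicates, and the function `validate_window` are hypothetical stand-ins chosen to mirror the no-overlap example, not the ALAS code.

```python
from typing import FrozenSet, List, Tuple

Op = Tuple[str, int, int]   # hypothetical action format: (machine, start, end)
State = FrozenSet[Op]       # a state is the set of operations scheduled so far


def apply_op(state: State, op: Op) -> State:
    return state | {op}


def well_formed(state: State) -> bool:
    return all(0 <= start < end for _, start, end in state)


def overlaps(a: Op, b: Op) -> bool:
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def pre(prev: State) -> bool:
    return well_formed(prev)


def post(cur: State) -> bool:
    return well_formed(cur)


def invariant(prev: State, cur: State) -> bool:
    # History is preserved, and nothing newly added overlaps an existing
    # operation on the same machine.
    added = cur - prev
    return prev <= cur and not any(overlaps(n, o) for n in added for o in prev)


def phi(prev: State, op: Op) -> bool:
    cur = apply_op(prev, op)
    return pre(prev) and post(cur) and invariant(prev, cur)


def validate_window(initial: State, actions: List[Op], k: int) -> str:
    """Replay the log, evaluating phi only on the last k transitions."""
    state = initial
    first_checked = max(0, len(actions) - k)
    for t, op in enumerate(actions):
        if t >= first_checked and not phi(state, op):
            return "fail"
        state = apply_op(state, op)
    return "pass"
```

Because only the last `k` transitions are checked, the cost of validation stays bounded as the log grows; the trade-off in this sketch is that a fault older than the window is not re-examined.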

Repair is triggered if $\phi$ fails for any transition in the window. In SelfHeal (Islam et al., 20 Apr 2026), a two-agent ReAct architecture is used in which a Fix Agent proposes patches and a Critic Agent assigns each patch a weighted score over its evaluation criteria. The loop terminates only when that score meets a required threshold. Critique intermediaries can draw on both internal fix rules and external context (e.g., web search or tool validation).
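A threshold-gated fix/critic loop of this kind might look as follows. The sketch is generic rather than a reproduction of SelfHeal: the criterion names, the weights, the threshold, the `max_rounds` guard, and the choice to feed back the weakest criterion are all assumptions made for illustration.

```python
from typing import Callable, Dict, Optional, Tuple


def critic_score(checks: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-criterion scores, each assumed to lie in [0, 1]."""
    total = sum(weights.values())
    return sum(weights[name] * checks[name] for name in weights) / total


def fix_critic_loop(
    propose: Callable[[Optional[str]], str],        # feedback -> candidate patch
    evaluate: Callable[[str], Dict[str, float]],    # patch -> criterion scores
    weights: Dict[str, float],
    threshold: float,
    max_rounds: int,
) -> Tuple[Optional[str], float, int]:
    feedback: Optional[str] = None
    score = 0.0
    for round_no in range(1, max_rounds + 1):
        patch = propose(feedback)            # Fix Agent proposes
        checks = evaluate(patch)             # Critic Agent scores each criterion
        score = critic_score(checks, weights)
        if score >= threshold:               # termination condition
            return patch, score, round_no
        # Feed the weakest criterion back to the fixer for the next round.
        feedback = min(checks, key=checks.get)
    return None, score, max_rounds           # no patch accepted within the budget
```

The `max_rounds` bound keeps the loop from cycling indefinitely when the fixer cannot satisfy the critic.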

In table reasoning or generation settings (Table-Critic (Yu et al., 17 Feb 2025), MAMM-Refine (Wan et al., 19 Mar 2025)), critiquing agents analyze reasoning chains or model outputs and localize faults. Their findings drive targeted refinement by auxiliary refiner agents, often supported by structured pattern libraries or self-evolving templates.

3. Localized and Policy-Governed Repair Protocols

Multi-agent critique–repair frameworks emphasize localized interventions, editing only the minimal “blast radius” affected by detected faults. In ALAS, this is formalized as a minimization over the edit radius in the log: the repair should touch the smallest neighborhood of log entries around the fault that still restores a valid history.

The repair operation is governed by an explicit policy that specifies retries, compensation, backoff, timeouts, idempotency, and loop guards.
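One way to make such a policy concrete is a small record plus an executor that enforces it. The field names, defaults, and the `run_with_policy` helper below are hypothetical and only illustrate how the listed ingredients could fit together.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional, Set


@dataclass
class RepairPolicy:
    max_retries: int = 3            # loop guard: hard cap on retry attempts
    backoff_base_s: float = 0.0     # delay before retry n is base * 2**(n - 1)
    timeout_s: float = 5.0          # wall-clock budget for the whole repair
    compensate: Optional[Callable[[], None]] = None   # undo action on give-up
    seen_keys: Set[str] = field(default_factory=set)  # idempotency record


def run_with_policy(step: Callable[[], bool], key: str, policy: RepairPolicy) -> str:
    if key in policy.seen_keys:
        return "skipped"            # idempotency: this repair was already applied
    deadline = time.monotonic() + policy.timeout_s
    for attempt in range(policy.max_retries + 1):
        if time.monotonic() > deadline:
            break                   # timeout
        if attempt:
            time.sleep(policy.backoff_base_s * 2 ** (attempt - 1))  # backoff
        if step():
            policy.seen_keys.add(key)
            return "committed"
    if policy.compensate is not None:
        policy.compensate()         # compensation after retries are exhausted
        return "compensated"
    return "failed"
```

Keeping the policy as data, separate from the repair step itself, lets the same executor apply different retry and compensation rules per agent or per task.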

This policy-driven approach extends across application domains—ranging from workflow engines (Amazon States Language, Argo) to autonomous agents (Meta-Agent (Xu et al., 24 May 2026)) and code repair pipelines—supporting both transactional integrity and computational efficiency.

Repair algorithms typically proceed by:

Identifying the earliest failure,
Determining the smallest neighborhood that encloses all impacted actions,
Applying minimal edits under explicit constraints,
Selectively replaying unaffected segments,
Validating and either committing or escalating to global recompute if local repair fails.
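The steps above can be sketched on a toy log whose only invariant is that entries must be non-decreasing (a stand-in for job precedence). Everything here is an assumption for illustration: the invariant, the use of sorting as the "minimal edit", and the names `earliest_failure` and `local_repair`.

```python
from typing import Callable, List, Optional, Tuple


def earliest_failure(log: List[int]) -> Optional[int]:
    """Index of the first entry that violates the toy precedence invariant."""
    for i in range(1, len(log)):
        if log[i] < log[i - 1]:
            return i
    return None


def local_repair(
    log: List[int],
    max_radius: int,
    recompute: Callable[[List[int]], List[int]],
) -> Tuple[List[int], Optional[int]]:
    fail = earliest_failure(log)                  # 1. earliest failure
    if fail is None:
        return list(log), 0
    for radius in range(max_radius + 1):          # 2. grow the neighborhood
        lo = max(0, fail - radius)
        hi = min(len(log), fail + radius + 1)
        # 3. edit only the window; 4. reuse entries outside it verbatim
        candidate = log[:lo] + sorted(log[lo:hi]) + log[hi:]
        if earliest_failure(candidate) is None:   # 5. validate, then commit
            return candidate, radius
    return recompute(log), None                   # 5. escalate to global recompute
```

For the log `[1, 5, 3, 6, 7]`, radius 0 cannot fix the fault at index 2, but radius 1 reorders only entries 1 to 3 and yields a valid history. When no window up to `max_radius` suffices, the function falls back to `recompute` and reports a radius of `None`.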

4. Empirical Efficiency and Robustness

Multi-agent critique–repair loops are empirically shown to deliver robust improvements in efficiency, success rate, and scalability. On job-shop scheduling benchmarks (ALAS (Geng et al., 5 Nov 2025)), localized repair matches or beats baseline planners at 83.7% success, reduces token usage by 60%, and achieves 1.82× end-to-end speedup over recompute. In SelfHeal (Islam et al., 20 Apr 2026), dual-agent patching achieves a 59.5% repair rate versus 35.1–37.8% for single-agent or zero-shot baselines, with a 10% reduction in spurious acceptances due to the critic loop.

In reasoning and data synthesis tasks (EigenData (Chen et al., 5 Mar 2026), Table-Critic (Yu et al., 17 Feb 2025)), multi-agent critique–repair architectures outperform strong single-agent and monolithic baselines, yielding net accuracy improvements, lower degradation, and more consistent convergence.

These effects are attributed to the following factors:

Decoupling of validation prevents context attrition and self-approval errors.
Bounded, policy-governed repair avoids the costly work loss characteristic of global recompute.
Auditability and versioning facilitate safe rollback and replay.

5. Extensions Across Domains

The critique–repair loop formalism has been generalized and adapted across a wide range of multi-agent system domains:

Graph-theoretic multi-agent LLM networks encode critique–repair as spectral graph dynamics, formalizing convergence/oscillation in signed Laplacian cycles and enabling polynomial-time deadlock repair via rank-one spectral perturbation and chordal decomposition (Javed, 23 Feb 2026).
Self-evolving data platforms (EigenData (Chen et al., 5 Mar 2026)) coordinate specialized sub-agents in schema/coding/trajectory audit and targeted repair, with structured cross-component feedbacks, self-optimizing prompt refinement, and outcome-aware evaluation.
Automated program repair and dynamic evidence integration (TraceRepair (Wu et al., 3 Apr 2026), SGAgent (Zhang et al., 27 Feb 2026), LANTERN (Luo et al., 28 Mar 2025)) employ committee-based agent debate, decision-guided language hopping, and analyzer-driven repair, leveraging both dynamic traces and cross-language translation.
Long-form generation and summarization (MAMM-Refine (Wan et al., 19 Mar 2025)) demonstrate gains through multi-agent, multi-model reranking applied to both critique and repair, with systematic ablation showing that task-specific agent diversity enhances reliability on both detection and refinement subtasks.

6. Lessons, Theoretical Insights, and Open Frontiers

The effectiveness of multi-agent critique–repair loops is underpinned by several robust principles:

Verification–generation asymmetry: Models are empirically more reliable at detecting errors than avoiding them, justifying strict separation and specialization of roles (as in ActuBench (Schmidt, 22 Apr 2026)).
Role specialization and swapability: Decoupled agents (separate adapters/models) allow upgrades to verification, generation, or repair modules without entangling the entire system.
Fault containment and bounded perturbation: Localized, rule-governed repair restricts the “blast radius”, limits work loss, and prevents brittle escalation cascades.
Scalability and compositionality: These architectures scale to deep/decentralized workflows (Meta-Agent (Xu et al., 24 May 2026)), large data pipelines (EigenData), and broad code repositories (SGAgent).
Algorithmic guarantees: Formal results on convergence in critique cycles, spectral stability, and minimal edit distance under versioned execution logs serve as foundational correctness and efficiency guarantees.

The theoretical and empirical literature converges on the conclusion that critique–repair loop design is a load-bearing element in attaining reliability, interpretability, and cost-efficiency for multi-agent LLM systems. Ablations consistently show significant drops in accuracy, robustness, and repair containment when explicit decoupled critique, versioned logging, or bounded repair is removed.

Open directions include integrating more dynamic execution feedback, generalizing cross-agent communication to richer interaction topologies, extending to multi-level or hierarchical critique (e.g., three-level error attribution [Meta-Agent]), and bridging automatic and human-informed verification regimes.