Critique Agent: Modular Feedback System
- A Critique Agent is a computational entity that uses modular decomposition and multi-agent dialogue to generate, evaluate, and refine feedback in complex domains.
- It employs specialized sub-agents like leader agents, critics, and refiners to systematically identify errors and enhance review specificity in domain-specific tasks.
- Structured protocols, adversarial feedback, and iterative refinement in these systems improve performance in scientific reviews, code analysis, and creative ideation.
A Critique Agent is a computational entity—typically instantiated as an LLM or a specialized multi-agent system—designed to generate, evaluate, or refine feedback on the outputs, reasoning, or proposals of other agents or humans in complex, open-ended domains such as scientific peer review, code analysis, reinforcement learning, or creative ideation. Contemporary research demonstrates that Critique Agents play an essential role in enhancing the specificity, helpfulness, diversity, and safety of autonomous workflows by implementing structured, role-specialized critique generation, adversarial feedback, and multi-agent validation protocols.
1. Architectures and Modular Decomposition
Recent Critique Agent systems commonly employ agent modularization, with specialized sub-agents tailored to distinct dimensions of the critique process. The MARG architecture, for example, partitions the review workflow into a leader agent, distributed worker agents (each handling a text chunk for full-document coverage), and expert agents focusing independently on experiments, clarity, and impact. Refinement agents are then tasked with pruning, consolidating, and rephrasing raw feedback. The inter-agent protocol is leader-driven, featuring explicit broadcast and aggregation cycles, enabling complex collaborative reasoning beyond monolithic LLM reviewers (D'arcy et al., 2024).
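The chunked worker/leader decomposition described above can be sketched as follows. This is a minimal illustration, not MARG's actual implementation; `run_llm` is a hypothetical stand-in for a real LLM API call, and the prompts are illustrative.

```python
# Minimal sketch of a leader/worker critique pipeline (MARG-style).
from typing import Callable, List

def run_llm(prompt: str) -> str:
    # Placeholder: a real system would call an LLM API here.
    return f"[critique of: {prompt[:30]}...]"

def chunk(text: str, size: int) -> List[str]:
    # Partition the document so each worker stays within context limits.
    return [text[i:i + size] for i in range(0, len(text), size)]

def worker_critique(chunk_text: str, axis: str, llm: Callable[[str], str]) -> str:
    # Each worker reviews one chunk along one expert axis.
    return llm(f"As a {axis} expert, critique this passage:\n{chunk_text}")

def leader_aggregate(critiques: List[str], llm: Callable[[str], str]) -> str:
    # Leader collects worker outputs into one consolidation/refinement pass.
    joined = "\n".join(critiques)
    return llm(f"Merge, deduplicate, and rank these critiques:\n{joined}")

def marg_review(paper: str, axes: List[str],
                llm: Callable[[str], str] = run_llm,
                chunk_size: int = 500) -> str:
    critiques = [worker_critique(c, axis, llm)
                 for c in chunk(paper, chunk_size)
                 for axis in axes]
    return leader_aggregate(critiques, llm)
```

The broadcast/aggregation cycle is reduced here to a single consolidation prompt; a faithful implementation would iterate between the leader and workers.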
Table-Critic separates judgment, criticism, refining, and pattern curation into distinct agents. The Critic agent in this workflow identifies the first erroneous reasoning step using a self-evolving template tree, produces targeted natural-language critiques, and guides the Refiner towards iterative correction. This design achieves substantial net error correction with minimal solution degradation (Yu et al., 17 Feb 2025).
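The critic/refiner interaction described above amounts to a loop that repeatedly locates the first faulty step and rewrites from there. A hedged sketch, with toy critic and refiner functions standing in for the paper's LLM agents and template-tree machinery:

```python
# Sketch of a Table-Critic-style critique/refine loop (names illustrative).
from typing import Callable, List, Optional

def critique_refine(steps: List[str],
                    critic: Callable[[List[str]], Optional[int]],
                    refiner: Callable[[List[str], int], List[str]],
                    max_rounds: int = 6) -> List[str]:
    for _ in range(max_rounds):
        bad = critic(steps)          # index of first erroneous step, or None
        if bad is None:
            break                    # critic accepts the reasoning chain
        steps = refiner(steps, bad)  # rewrite from the flagged step onward
    return steps

# Toy critic/refiner: flag any step containing "ERROR" and fix it in place.
toy_critic = lambda s: next((i for i, x in enumerate(s) if "ERROR" in x), None)
toy_refiner = lambda s, i: s[:i] + [s[i].replace("ERROR", "fixed")] + s[i + 1:]

result = critique_refine(["a", "ERROR b", "c"], toy_critic, toy_refiner)
```

The `max_rounds` bound mirrors the reported 3–6 iteration convergence window.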
Peer review and proposal assessment frameworks like ScholarPeer and AstroReview extend this modularity with sub-agents for contextualization (e.g., summary, historian, baseline scout), verification (questioning and claim checking), iterative group review (multi-reviewer and meta-review), and reliability auditing, yielding human-parity or superior review quality on domain benchmarks (Goyal et al., 30 Jan 2026, Wang et al., 31 Dec 2025).
2. Multi-Agent Feedback, Specialization, and Aggregation
Multi-agent critique, wherein multiple independently parameterized LLMs or role-tuned models contribute feedback, underpins advances in critique specificity and quality. The MultiCritique pipeline aggregates Analytical Critique Units (ACUs) from diverse LLMs such as GPT-4, Claude, Qwen, and InternLM, with a powerful meta-critic (GPT-4) classifying, merging, or repairing critiques based on error taxonomy and severity. This data pipeline supports both SFT and RL stages, dramatically surpassing single-agent approaches for both subjective and objective feedback alignment and task performance (Lan et al., 2024).
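The meta-critic's classify/merge/repair step over ACUs can be sketched as deduplication plus severity ranking. The `ACU` schema and field names below are illustrative assumptions, not the paper's data format:

```python
# Hedged sketch of ACU aggregation with a meta-critic merge pass
# (MultiCritique-style; schema is illustrative, not the paper's).
from dataclasses import dataclass
from typing import Dict, List

@dataclass(frozen=True)
class ACU:
    claim: str       # what the critique unit asserts
    error_type: str  # taxonomy label, e.g. "factual", "clarity"
    severity: int    # higher = more severe

def meta_critic_merge(acus: List[ACU]) -> List[ACU]:
    # Merge duplicate claims, keeping the highest-severity instance,
    # then rank the surviving units by severity.
    best: Dict[str, ACU] = {}
    for acu in acus:
        if acu.claim not in best or acu.severity > best[acu.claim].severity:
            best[acu.claim] = acu
    return sorted(best.values(), key=lambda a: -a.severity)

# ACUs as contributed by different critic models.
pool = [ACU("missing baseline", "method", 3),
        ACU("typo in eq. 2", "clarity", 1),
        ACU("missing baseline", "method", 2)]
merged = meta_critic_merge(pool)
```

In the actual pipeline the merge decision is itself made by a meta-critic LLM; a rule-based merge is used here only to make the aggregation structure concrete.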
Within ideation or design refinement loops, increasing the number of parallel critics and diversifying their persona or disciplinary focus each independently boost the novelty and feasibility of generated outputs; three parallel, persona-injected critics are empirically optimal for balancing quality and diversity, with marginal returns diminishing beyond that cohort size (Ueda et al., 11 Jul 2025).
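The parallel, persona-injected critic cohort can be sketched as below; the personas listed are illustrative, and deduplication stands in for the novelty filtering an actual system would perform with embeddings or an LLM judge:

```python
# Sketch: parallel persona-injected critics (personas illustrative).
from typing import Callable, List

PERSONAS = ["industrial designer", "safety engineer", "end user"]

def persona_critics(idea: str, llm: Callable[[str], str],
                    personas: List[str] = PERSONAS) -> List[str]:
    # Each critic sees the same idea through a different disciplinary lens;
    # deduplicate while preserving order so only novel feedback survives.
    seen, out = set(), []
    for p in personas:
        feedback = llm(f"As a {p}, critique this idea: {idea}")
        if feedback not in seen:
            seen.add(feedback)
            out.append(feedback)
    return out
```

Three personas matches the empirically optimal cohort size reported above.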
For code review, the RevAgent framework delegates to five category-specialist commentator agents (covering refactoring, bugfix, testing, logging, documentation), with a discriminative critic agent reranking and selecting the most semantically precise, issue-oriented review comment. This structure resolves pathological blending and class bias found in monolithic LLMs, providing substantial and statistically significant gains across BLEU, ROUGE-L, METEOR, and SBERT metrics (Li et al., 1 Nov 2025).
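The specialist-then-critic structure can be sketched as follows; the five category names come from the text, while the commentator and scoring functions are hypothetical stand-ins for the trained agents:

```python
# Sketch of a RevAgent-style specialist commentator + critic reranker.
from typing import Callable, List, Tuple

CATEGORIES = ["refactoring", "bugfix", "testing", "logging", "documentation"]

def rev_agent(diff: str,
              commentator: Callable[[str, str], str],
              critic_score: Callable[[str, str], float]) -> Tuple[str, str]:
    # Each specialist drafts a category-specific comment; the critic agent
    # reranks the candidates and returns the best (category, comment) pair.
    candidates = [(cat, commentator(diff, cat)) for cat in CATEGORIES]
    return max(candidates, key=lambda c: critic_score(diff, c[1]))

# Toy stand-ins: the critic strongly prefers the bugfix comment.
toy_comment = lambda diff, cat: f"{cat}: consider revising"
toy_score = lambda diff, c: 1.0 if c.startswith("bugfix") else 0.1
best = rev_agent("--- a/x.py\n+++ b/x.py", toy_comment, toy_score)
```

Keeping generation per-category and selection in a separate critic is what avoids the "pathological blending" the text attributes to monolithic reviewers.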
3. Protocols for Critique Generation, Selection, and Validation
Standard critique generation protocols follow a distributed process: worker agents partition texts (to bypass context-length limits); specialists generate targeted feedback on assigned axes; and a leader agent or central critic aggregates, deduplicates, and refines responses. In reinforcement learning and open-ended agent learning, critique-guided improvement (CGI) and ECHO (Evolving Critic for Hindsight-Guided Optimization) formalize the interaction as a two-player or dual-track optimization, with the critic generating actionable, graded feedback on partial trajectories or actions and the actor refining its strategy conditionally. The ECHO framework advances this by co-evolving actor and critic with saturating reward shaping and group-structured advantage estimation, ensuring adaptation to non-stationary error patterns and long-horizon task progress (Li et al., 11 Jan 2026, Yang et al., 20 Mar 2025).
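A toy version of the critique-guided actor loop can make the shape of the interaction concrete. Everything below is illustrative: `tanh` stands in for an unspecified saturating shaping function, and the additive update is a deliberately simplified stand-in for policy optimization.

```python
# Toy sketch of a critique-guided actor loop with saturating reward
# shaping (ECHO-flavored; shaping and update rule are illustrative).
import math
from typing import Callable, List

def shape(reward: float) -> float:
    # Saturating shaping: bounded in (-1, 1), so no single critique
    # signal can dominate the update.
    return math.tanh(reward)

def critique_guided_steps(n_steps: int,
                          critic: Callable[[float], float],
                          start: float = 0.0,
                          lr: float = 0.5) -> List[float]:
    # Actor proposes an action; critic returns a graded, signed score;
    # the actor nudges its next proposal by the shaped feedback.
    trace, a = [], start
    for _ in range(n_steps):
        score = shape(critic(a))
        a = a + lr * score
        trace.append(a)
    return trace

# Toy critic: prefers actions near 1.0 (positive score below, negative above).
trace = critique_guided_steps(10, critic=lambda a: 1.0 - a)
```

Even this scalar toy shows the intended dynamic: graded critic feedback steers the actor monotonically toward the critic's preferred region.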
A key innovation is the construction of critique dialog protocols, such as DIAGPaper's Rebuttal module, where criterion-specific reviewers produce targeted weaknesses and an author agent engages in bounded, multi-round debate over each, validating, refining, or discarding weaknesses based on evidence and expert criteria. Surviving weaknesses are then prioritized for severity using learned impact scores, evidence, and validity signals (Zou et al., 12 Jan 2026).
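The bounded debate-and-prioritize flow can be sketched as a filter over weaknesses followed by a severity sort. The survival and severity functions below are hypothetical stand-ins for the paper's author-agent debate and learned impact scores:

```python
# Sketch of a bounded rebuttal loop over reviewer weaknesses
# (DIAGPaper-style; rebuttal and severity functions are stand-ins).
from typing import Callable, List

def rebuttal_filter(weaknesses: List[str],
                    author_rebuts: Callable[[str, int], bool],
                    severity: Callable[[str], float],
                    max_rounds: int = 3) -> List[str]:
    surviving = []
    for w in weaknesses:
        # A weakness is discarded as soon as the author agent successfully
        # rebuts it within the round budget; otherwise it survives.
        if not any(author_rebuts(w, r) for r in range(max_rounds)):
            surviving.append(w)
    # Surviving weaknesses are prioritized by (learned) severity scores.
    return sorted(surviving, key=severity, reverse=True)

weak = ["no ablation", "minor typo", "wrong baseline"]
rebuts = lambda w, r: w == "minor typo"          # only the typo is rebutted
sev = lambda w: {"no ablation": 0.8, "wrong baseline": 0.9}.get(w, 0.0)
ranked = rebuttal_filter(weak, rebuts, sev)
```

The `max_rounds` budget enforces the "bounded, multi-round debate" constraint, preventing unbounded back-and-forth per weakness.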
4. Evaluation, Metrics, and Empirical Outcomes
Critique Agent systems are evaluated with a spectrum of both automatic and manual metrics, with rigor attuned to target application:
- For scientific review, generic-comment rate and good-comment rate quantify feedback specificity and value, with MARG reducing the generic-comment rate from 60% (GPT-4 baseline) to 29% and more than doubling good comments per paper (D'arcy et al., 2024).
- On critique alignment, F_{sub} (subjective feedback score, rated by GPT-4 or equivalent) and F1 on binary flaw-detection serve as gold standards. MultiCritique-trained models outperform all 7B–13B open-source baselines by substantial margins and approach advanced proprietary models (Lan et al., 2024).
- In multi-agent table reasoning, the introduced error correction rate (Δ_{i→c}), solution degradation rate (Δ_{c→i}), and net gain are the principal outcomes. Table-Critic achieves +9.6% correction on WikiTQ and highly efficient convergence within 3–6 critique–refiner iterations (Yu et al., 17 Feb 2025).
- For ideation, Non-Duplicate Ratio and LLM-based Precision@N measure novelty and feasibility, evidencing that agent parallelism and depth systematically improve creative exploration (Ueda et al., 11 Jul 2025).
- In adversarial judge settings, WAFER-QA reports Acc@R, S_rec@R, and Acknowledgment Rate, demonstrating that most generator models remain highly vulnerable to well-crafted, factually supported adversarial critique, with accuracy drops exceeding 50 points for even the strongest agents under grounded malicious feedback (Ming et al., 3 Jun 2025).
- For code review, lexical and embedding-based scores (BLEU, ROUGE-L, METEOR, SBERT), as well as per-class prediction accuracy, are essential for capturing comment utility and issue identification (Li et al., 1 Nov 2025).
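Two of the headline quantities above are simple to state precisely. A sketch on toy labels, with definitions inferred from the text (the papers' exact formulas may differ in detail):

```python
# Sketch of two metrics from the list above, on toy labels.
from typing import List

def generic_comment_rate(labels: List[str]) -> float:
    # Fraction of review comments judged "generic" rather than
    # paper-specific (lower is better).
    return labels.count("generic") / len(labels)

def net_gain(before: List[bool], after: List[bool]) -> float:
    # Error correction rate minus solution degradation rate:
    # Δ_{i→c} counts wrong→right flips, Δ_{c→i} counts right→wrong flips.
    n = len(before)
    fixed = sum(1 for b, a in zip(before, after) if not b and a)
    broken = sum(1 for b, a in zip(before, after) if b and not a)
    return (fixed - broken) / n

rate = generic_comment_rate(["generic", "good", "good", "generic"])
gain = net_gain(before=[False, True, False, True],
                after=[True, True, True, False])
```

On the toy data, two wrong answers are fixed and one correct answer is broken, so the net gain is (2 − 1)/4 = 0.25.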
5. Vulnerabilities, Failure Modes, and Robustness
Despite their efficacy, critique-based workflows exhibit multiple vulnerabilities. Critique agents (“judges”) can hallucinate evidence, exhibit bias, or act adversarially—undermining the stability of feedback-driven refinement. A two-dimensional taxonomy along intent (constructive, hypercritical, malicious) and knowledge base (none, parametric, grounded) reveals that agents are acutely sensitive to malicious, grounded critique, often reversing correct predictions after a single misleading feedback round, with oscillatory instability persisting over multiple rounds for non-reasoning models (Ming et al., 3 Jun 2025).
Failure mode taxonomies developed for regulated domains (e.g., commercial insurance) reveal that adversarial self-critique significantly reduces hallucination rates (from 11.3% to 3.8%) and increases decision accuracy (+4% absolute), but some systemic vulnerabilities persist, particularly on edge cases, false alarms, and integration errors (Roy et al., 21 Jan 2026).
6. Design Recommendations and Future Directions
Empirical studies and ablation analyses motivate concrete design strategies for Critique Agents:
- Decompose the critique process into role-specialized, parallel expert agents, with a central or meta-critic for aggregation and severity ranking (D'arcy et al., 2024, Lan et al., 2024, Li et al., 1 Nov 2025).
- Use structured, categorical critique prompts and multi-agent feedback aggregation for improved coverage and actionable feedback (Ueda et al., 11 Jul 2025).
- Jointly fine-tune generators and critics or employ ensemble/meta-critic approaches to resist adversarial and hallucinated feedback (Ming et al., 3 Jun 2025, Lan et al., 2024).
- Integrate debate or rebuttal modules with bounded multi-turn exchange to verify the validity and evidentiary support of weaknesses before surfacing them to users (Zou et al., 12 Jan 2026).
- Employ real-time or co-evolving critic–actor loops (e.g., ECHO) to synchronize feedback with rapidly adapting policy distributions in reinforcement learning (Li et al., 11 Jan 2026).
- In regulated or safety-critical domains, enforce bounded inference (single critique–revise steps), explicit flagging, and mandatory human authorization for final decisions (Roy et al., 21 Jan 2026).
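The last recommendation, bounded inference with explicit flagging, can be sketched as a single critique-revise step that routes flagged outputs to a human gate. Function names are illustrative, not from any cited system:

```python
# Sketch of a bounded critique-revise step with explicit flagging and a
# human-authorization gate (names illustrative).
from typing import Callable, Tuple

def bounded_decision(draft: str,
                     critic: Callable[[str], str],
                     revise: Callable[[str, str], str],
                     needs_flag: Callable[[str], bool]) -> Tuple[str, bool]:
    # Exactly one critique-revise step: no unbounded self-refinement loop.
    revised = revise(draft, critic(draft))
    flagged = needs_flag(revised)
    # flagged=True means the output must be routed to a human for
    # authorization before any final decision is issued.
    return revised, flagged

out, flagged = bounded_decision(
    "approve claim",
    critic=lambda d: "check coverage limits",
    revise=lambda d, c: d + " (pending: " + c + ")",
    needs_flag=lambda d: "pending" in d)
```

Restricting the loop to a single bounded step is what distinguishes this pattern from the open-ended refinement loops used in less safety-critical settings.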
By adhering to these principles and leveraging explicit modularization, multi-agent dialogue, and adversarial as well as consensus protocols, Critique Agents deliver demonstrably more specific, diverse, and reliable review and refinement in both human-in-the-loop and autonomous agentic workflows. The multifaceted design patterns and robust empirical gains point toward a convergence in best practices centered around modular specialization, aggregation, debiasing, and human oversight.