- The paper introduces a novel closed-loop agentic framework that uses multi-turn tool invocation and reinforcement learning to synthesize realistic industrial anomalies.
- The method achieves superior semantic consistency and localization performance, with quantitative improvements in IS, IC-L, pixel AP, and image AP over state-of-the-art baselines.
- The framework adopts a two-stage training process, combining supervised fine-tuning with group relative policy optimization to enable iterative defect refinement and diverse anomaly generation.
Motivation and Existing Limitations
Industrial anomaly synthesis addresses the scarcity and limited diversity of defect samples in anomaly detection datasets, both of which directly limit the generalization of industrial anomaly detection (IAD) models. Existing approaches fall into few-shot and zero-shot paradigms: few-shot methods model the defect distribution from limited defect data, while zero-shot approaches rely on heuristic perturbations or generative priors. Both paradigms, however, primarily use open-loop, single-step generation pipelines that lack iterative reasoning and tool-based refinement, which often leads to structural and semantic inconsistencies in the synthesized anomalies.
Figure 1: Motivation of AnomalyAgent—closed-loop agentic framework addresses lack of semantic realism in open-loop few-shot and zero-shot pipelines.
Agentic Framework: Architecture and Closed-Loop Optimization
AnomalyAgent reframes industrial anomaly synthesis as a sequential, tool-guided decision process built on multimodal large language models (MLLMs) and structured tool calls. The agent operates in a thought–action–observation loop, dynamically invoking five tools: Prompt Generation (PG), Image Generation (IG), Quality Evaluation (QE), Knowledge Retrieval (KR), and Mask Generation (MG). Tool selection is determined adaptively by the agent's reasoning rather than by a fixed call sequence, which enables iterative refinement and self-reflection.
Given a normal image, an object category, and an anomaly type, the agent first generates a defect prompt (PG), edits the image accordingly (IG), evaluates the result (QE), and decides whether further refinement is needed, optionally invoking KR for domain guidance. The loop continues until the quality criteria are satisfied or the maximum number of iterations is reached, after which mask generation (MG) completes the trajectory.
Figure 2: AnomalyAgent overview—iterative tool invocation and adaptive tool selection via agentic reasoning.
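The control flow described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function `run_closed_loop`, the tool signatures, the `quality_threshold` and `max_iterations` defaults, and the rule for when KR is called are all assumptions made for the example. In particular, the fixed "call KR after the second failed attempt" rule stands in for the agent's learned reasoning, which the summary describes as adaptive.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class SynthesisResult:
    image: str                                   # placeholder for an image handle
    mask: str                                    # placeholder for a mask handle
    tool_calls: List[str] = field(default_factory=list)


def run_closed_loop(
    normal_image: str,
    category: str,
    anomaly_type: str,
    pg: Callable[[str, str, str], str],          # (category, anomaly type, hint) -> prompt
    ig: Callable[[str, str], str],               # (normal image, prompt) -> edited image
    qe: Callable[[str, str], Tuple[float, str]], # (edited image, prompt) -> (score, feedback)
    kr: Callable[[str, str], str],               # (category, anomaly type) -> guidance
    mg: Callable[[str, str], str],               # (normal image, edited image) -> mask
    quality_threshold: float = 0.8,              # assumed value, not from the paper
    max_iterations: int = 3,                     # assumed value, not from the paper
) -> SynthesisResult:
    calls: List[str] = []
    hint = ""
    edited = normal_image
    for step in range(max_iterations):
        prompt = pg(category, anomaly_type, hint)
        calls.append("PG")
        edited = ig(normal_image, prompt)
        calls.append("IG")
        score, feedback = qe(edited, prompt)
        calls.append("QE")
        # Stop when quality is sufficient or the iteration budget is spent.
        if score >= quality_threshold or step == max_iterations - 1:
            break
        hint = feedback
        # Hypothetical rule: consult KR only after a second failed attempt.
        if step >= 1:
            hint = f"{hint} {kr(category, anomaly_type)}"
            calls.append("KR")
    mask = mg(normal_image, edited)
    calls.append("MG")
    return SynthesisResult(image=edited, mask=mask, tool_calls=calls)


# Usage with stub tools: the first attempt scores low, the second passes.
scores = iter([0.4, 0.9])
result = run_closed_loop(
    "normal.png", "bottle", "crack",
    pg=lambda cat, kind, hint: f"{kind} on {cat}. {hint}".strip(),
    ig=lambda image, prompt: f"edited({image})",
    qe=lambda image, prompt: (next(scores), "make the crack sharper"),
    kr=lambda cat, kind: f"typical {kind} appearance on {cat}",
    mg=lambda normal, edited: f"mask({edited})",
)
print(result.tool_calls)  # ['PG', 'IG', 'QE', 'PG', 'IG', 'QE', 'MG']
```

Passing the tools in as callables keeps the loop independent of any particular image editor or evaluator, so the same skeleton covers the single-call and multi-call cases.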
Trajectory Construction and Hierarchical Taxonomy
To support robust supervised fine-tuning (SFT), the authors construct multi-turn trajectories from real anomaly images via reverse synthesis, with complexity staged from single- to triple-generation paths. Each trajectory records the PG, IG, QE, optional KR, and MG calls in a structured format, explicitly modeling the agent's reasoning and its handling of tool feedback. This taxonomy provides curriculum-like staged supervision and scaffolds the subsequent RL phase, which uses the same trajectories for further optimization.
Figure 3: Multi-turn trajectory pipeline—hierarchical taxonomy by IG call count enables progressive task difficulty.
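One way to picture such a trajectory record and its place in the taxonomy is the sketch below. The class `ToolStep`, the function `trajectory_level`, and the level names are hypothetical; the summary only states that trajectories are grouped by IG call count, from single- to triple-generation.

```python
from dataclasses import dataclass
from typing import List

TOOLS = {"PG", "IG", "QE", "KR", "MG"}


@dataclass
class ToolStep:
    thought: str       # the agent's reasoning before the call
    tool: str          # one of PG, IG, QE, KR, MG
    observation: str   # what the tool returned


def trajectory_level(steps: List[ToolStep]) -> str:
    """Classify a trajectory by how many IG calls it contains."""
    for step in steps:
        if step.tool not in TOOLS:
            raise ValueError(f"unknown tool: {step.tool}")
    ig_calls = sum(1 for step in steps if step.tool == "IG")
    # Level names are illustrative labels, not the paper's terminology.
    levels = {1: "single-generation", 2: "double-generation", 3: "triple-generation"}
    if ig_calls not in levels:
        raise ValueError(f"expected 1 to 3 IG calls, got {ig_calls}")
    return levels[ig_calls]


# A two-attempt trajectory: the first image is rejected by QE, the second accepted.
example = [
    ToolStep("Describe the defect.", "PG", "prompt v1"),
    ToolStep("Render the defect.", "IG", "image v1"),
    ToolStep("Check realism.", "QE", "defect too faint"),
    ToolStep("Strengthen the prompt.", "PG", "prompt v2"),
    ToolStep("Render again.", "IG", "image v2"),
    ToolStep("Check realism.", "QE", "acceptable"),
    ToolStep("Localize the defect.", "MG", "mask"),
]
print(trajectory_level(example))  # double-generation
```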
Two-Stage Training: Supervised Fine-Tuning and Agentic Reinforcement Learning
AnomalyAgent first uses cold-start SFT to instill tool-use semantics and adherence to the trajectory format. It then applies group relative policy optimization (GRPO), a PPO variant that normalizes rewards within a group of sampled trajectories, to optimize long-horizon strategies and self-reflection. The composite reward has three parts: a task reward for anomaly realism and localization, a reflection reward for progressive improvement across synthesis steps, and a behavioral reward that regularizes tool invocation, output format, and trajectory efficiency.
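The group-wise normalization and the composite reward can be illustrated with a short sketch. The standard GRPO advantage for a trajectory is its reward minus the group mean, divided by the group standard deviation. Everything else here is an assumption for illustration: the weighted-sum form of `composite_reward`, the weight values, the `eps` constant, and the use of the population standard deviation (implementations differ on this point).

```python
from statistics import mean, pstdev
from typing import List


def composite_reward(
    task: float,
    reflection: float,
    behavior: float,
    w_task: float = 1.0,       # assumed weight
    w_reflect: float = 0.5,    # assumed weight
    w_behavior: float = 0.2,   # assumed weight
) -> float:
    """Combine the three reward terms as a weighted sum (an assumed form)."""
    return w_task * task + w_reflect * reflection + w_behavior * behavior


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize rewards within one group of trajectories sampled for the same input."""
    mu = mean(rewards)
    sigma = pstdev(rewards)    # population std; sample std is a common alternative
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four trajectories sampled for the same (image, category, anomaly type) input.
group = [
    composite_reward(task=0.9, reflection=0.2, behavior=1.0),
    composite_reward(task=0.4, reflection=0.0, behavior=1.0),
    composite_reward(task=0.7, reflection=0.3, behavior=0.5),
    composite_reward(task=0.2, reflection=-0.1, behavior=0.0),
]
advantages = group_relative_advantages(group)
print([round(a, 2) for a in advantages])
```

Because advantages are computed relative to the group, a trajectory is rewarded for being better than its siblings on the same input, which removes the need for a separately learned value baseline.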
Training curves demonstrate rapid convergence during SFT, followed by steady reward improvement in RL, confirming the effectiveness of the staged paradigm.
Figure 4: Training dynamics—SFT loss convergence and RL reward progression.
Quantitative and Qualitative Evaluation
On MVTec-AD, AnomalyAgent achieves an IS of 2.10 and an IC-L of 0.33 for anomaly generation, outperforming baselines including Gemini-img, GPT-img, Grok-img, AnoHybrid, and AnoStyler. With downstream ResNet34 and UNet models, its classification accuracy (57.0%), pixel-level AP (74.2%), and image-level AP (99.3%) are consistently higher than those of the compared zero-shot SOTA methods. The framework produces anomalies with stronger semantic consistency and defect localization, and qualitative comparisons show marked improvements over single-step and prompt-only methods.
Figure 5: Visual comparison—AnomalyAgent achieves superior semantic consistency and anomaly localization across generative baselines.
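For readers unfamiliar with the AP figures quoted above, the sketch below computes average precision from binary labels and anomaly scores using the common ranking definition: the mean of the precision values at each rank where a positive appears. It is a generic illustration, not the paper's evaluation code; the function name `average_precision` is chosen for this example, ties are broken by input order, and the exact protocol behind the reported pixel- and image-level AP may differ.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Mean of precision@k over the ranks k at which a positive label occurs."""
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("at least one positive label is required")
    # Rank items from the highest anomaly score to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / total_positives


# Image-level example: 1 = anomalous, 0 = normal.
ap = average_precision(labels=[1, 0, 1, 0], scores=[0.9, 0.8, 0.7, 0.1])
print(round(ap, 4))  # 0.8333
```

For pixel-level AP, the same computation applies with every pixel treated as one item, using the ground-truth mask as labels and the predicted anomaly map as scores.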
Case Studies: Iteratively Refined Synthesis
Case studies demonstrate the agent's ability to produce satisfactory defects with single IG calls, handle low-quality initial generations through prompt refinement based on QE feedback, and further boost realism and localization using both KR and QE feedback iteratively.
Figure 6: Case Study 1—high-quality anomaly with single IG call.
Figure 7: Case Study 2—prompt refinement via QE yields improved anomaly and mask.
Figure 8: Case Study 3—joint prompt refinement using KR and QE produces optimal synthesis.
Defect Diversity and Mask Generation
AnomalyAgent successfully generates diverse defect types with corresponding masks on multiple categories in MVTec-AD, enabling comprehensive evaluation and precise downstream anomaly localization.
Figure 9: Generated anomalies and masks for various defect types.
Component and Reward Ablation
Ablation studies validate the cumulative benefit of PG, QE, KR, SFT, and RL for both anomaly realism/diversity and downstream accuracy. Task reward, reflection reward, and behavioral reward contribute synergistically, confirming the necessity of each for robust agentic optimization.
Practical and Theoretical Implications
Practically, AnomalyAgent achieves higher defect realism, semantic diversity, and localization accuracy, along with a better efficiency-cost trade-off, than all evaluated baselines. Theoretically, the work reframes industrial anomaly synthesis as a closed-loop, agentic RL task with structured tool integration. This formulation enables complex reasoning, adaptive planning, and iterative improvement, which benefits downstream industrial anomaly detection and segmentation and may generalize to broader multimodal synthesis scenarios.
Conclusion
AnomalyAgent introduces the first closed-loop agentic framework for industrial anomaly synthesis, leveraging tool-augmented reasoning and multi-turn RL optimization. The system exceeds previous zero-shot SOTA on both synthesis quality and downstream detection tasks, enabling controllable, semantically consistent, and diverse anomaly generation. This research lays groundwork for extending agentic RL paradigms to more complex multimodal generation challenges and scalable industrial data pipelines.