AutoIAD: Automated Visual Anomaly Detection

Updated 5 April 2026

AutoIAD is a multi-agent framework that automates the complete development of industrial visual anomaly detection systems using a central Manager Agent and specialized sub-agents.
It employs a review and refine mechanism where the Manager Agent iterates over sub-tasks, ensuring robust error correction and improved performance on benchmarks like MVTec AD.
The framework integrates a domain-specific knowledge base to guide data preparation, model design, and training, significantly outperforming traditional AutoML approaches.

AutoIAD is a multi-agent collaboration framework designed to automate the end-to-end development of industrial visual anomaly detection (IAD) systems. It orchestrates domain-specialized sub-agents using a central Manager Agent and integrates a structured domain-specific knowledge base to guide decision making and pipeline execution. AutoIAD demonstrates substantial improvements over both traditional AutoML and general agentic frameworks in industrial anomaly detection tasks, as validated on the MVTec AD benchmark using various LLM backends (Ji et al., 7 Aug 2025).

1. Manager Agent and Pipeline Orchestration

The central component of AutoIAD is the Manager Agent $A_{\rm mgr}$, which parses a user-specified TaskCard $T$, decomposes it into granular sub-tasks, and schedules four domain-specialized sub-agents in sequence: Data Preparation ($A_p$), Data Loader ($A_d$), Model Designer ($A_m$), and Trainer ($A_t$). The Manager supervises execution by reviewing deliverables and issuing corrective feedback.

The orchestration is governed by a scheduling function: $(A,\,F,\,S)\;\gets\;A_{\rm mgr}(W,T)$ where $W$ is the shared workspace (files, code, artifacts), $F$ is Manager-issued feedback, $S$ is the two-valued pipeline state, and $A$ is the next sub-agent to call. Execution proceeds iteratively: the Manager invokes each agent via a CALL routine, agents perform self-review, and the Manager may request refinement or a retry when errors or unsatisfactory results occur (Ji et al., 7 Aug 2025, Algorithm 1, Fig. 3 and §3.2). This “review and refine” mechanism is crucial for correcting LLM-induced hallucinations and maintaining output quality throughout the pipeline.
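The scheduling loop described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the paper's Algorithm 1: the names `Workspace`, `manager_step`, `run_pipeline`, and `make_agent` are hypothetical, the state labels `"running"` and `"done"` are placeholders for the two pipeline states, and the "review" is reduced to checking whether an artifact exists and does not start with `ERROR`.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Workspace:
    """Shared workspace W: artifacts produced so far, keyed by file name."""
    artifacts: Dict[str, str] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)  # order in which agents were called


# A sub-agent reads the workspace plus optional Manager feedback and writes artifacts.
SubAgent = Callable[[Workspace, Optional[str]], None]

# Fixed call order and the artifact the Manager expects after each stage.
STAGES = [("A_p", "dataset.csv"), ("A_d", "Dataloader.py"),
          ("A_m", "Model.py"), ("A_t", "checkpoint")]


def manager_step(w: Workspace, task_card: str) -> Tuple[Optional[str], Optional[str], str]:
    """Toy stand-in for (A, F, S) <- A_mgr(W, T).

    Returns the next agent to call, feedback for it, and the pipeline state.
    A real Manager would use the task card; this sketch ignores it.
    """
    for name, artifact in STAGES:
        content = w.artifacts.get(artifact)
        if content is None:                      # stage not run yet
            return name, None, "running"
        if content.startswith("ERROR"):          # review failed: ask for refinement
            return name, f"{artifact} failed review: {content}", "running"
    return None, None, "done"


def run_pipeline(agents: Dict[str, SubAgent], task_card: str, max_calls: int = 20) -> Workspace:
    w = Workspace()
    for _ in range(max_calls):                   # hard cap so a failing agent cannot loop forever
        agent_name, feedback, state = manager_step(w, task_card)
        if state == "done":
            break
        w.log.append(agent_name)
        agents[agent_name](w, feedback)
    return w


def make_agent(artifact: str, fail_first: bool = False) -> SubAgent:
    """Build a dummy agent; with fail_first it produces a broken artifact once."""
    calls = {"n": 0}

    def agent(w: Workspace, feedback: Optional[str]) -> None:
        calls["n"] += 1
        broken = fail_first and calls["n"] == 1
        w.artifacts[artifact] = "ERROR: import failed" if broken else "ok"

    return agent


agents = {
    "A_p": make_agent("dataset.csv"),
    "A_d": make_agent("Dataloader.py", fail_first=True),
    "A_m": make_agent("Model.py"),
    "A_t": make_agent("checkpoint"),
}
result = run_pipeline(agents, task_card="detect defects on bottle images")
print(result.log)  # ['A_p', 'A_d', 'A_d', 'A_m', 'A_t']
```

The repeated `A_d` entry in the log is the refinement step: the Manager rejects the first deliverable and recalls the same agent with feedback before moving on.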

2. Sub-Agent Structure and Function

Each sub-agent in AutoIAD focuses on a distinct stage of the IAD workflow, contributing modular deliverables to the shared workspace:

Agent	Input Artifacts	Output Artifacts
Data Prep ($A_p$)	Raw image folders	dataset.csv
Data Loader ($A_d$)	dataset.csv	Dataloader.py, test code
Model Designer ($A_m$)	Dataloader.py, TaskCard $T$	Model.py, design rationale
Trainer ($A_t$)	Model.py, Dataloader.py	Model checkpoint, log, AUROC

Data Preparation Agent ($A_p$) inspects raw dataset structures, derives train/val/test splits, extracts class labels, and generates a canonical dataset.csv.
Data Loader Agent ($A_d$) builds a PyTorch dataloader with batch handling and augmentation hooks, consulting the knowledge base for appropriate transforms.
Model Designer Agent ($A_m$) selects or synthesizes an anomaly detection model (e.g., PatchCore, FastFlow), configures hyperparameters, and outputs tested, documented code.
Trainer Agent ($A_t$) sets up the training script, manages checkpointing, and conducts hyper-parameter optimization. Training progress is evaluated by image-level AUROC, and the Manager may order retraining when validation metrics are poor (Ji et al., 7 Aug 2025, Fig. 2).
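To make the Data Preparation deliverable concrete, here is a minimal sketch of how a dataset.csv could be derived from an MVTec-AD-style folder layout (`<category>/train/good/*.png`, `<category>/test/<defect_type>/*.png`). The column names (`path`, `split`, `defect_type`, `label`) and the helpers `build_rows` and `write_dataset_csv` are assumptions made for illustration; the source does not specify the CSV schema, and the validation split is omitted here.

```python
import csv
import tempfile
from pathlib import Path
from typing import Dict, List

FIELDS = ["path", "split", "defect_type", "label"]  # hypothetical schema


def build_rows(category_dir: Path) -> List[Dict[str, str]]:
    """Scan train/ and test/ sub-folders and emit one row per PNG image."""
    rows: List[Dict[str, str]] = []
    for split in ("train", "test"):
        split_dir = category_dir / split
        if not split_dir.is_dir():
            continue
        for image_path in sorted(split_dir.glob("*/*.png")):
            defect_type = image_path.parent.name
            rows.append({
                "path": image_path.relative_to(category_dir).as_posix(),
                "split": split,
                "defect_type": defect_type,
                # Folder "good" holds defect-free images; anything else is anomalous.
                "label": "0" if defect_type == "good" else "1",
            })
    return rows


def write_dataset_csv(category_dir: Path, out_path: Path) -> int:
    """Write the rows to out_path and return how many were written."""
    rows = build_rows(category_dir)
    with out_path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


# Small demonstration on a throwaway directory tree with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "bottle"
    for rel in ("train/good/000.png", "test/good/000.png", "test/broken_large/000.png"):
        (root / rel).parent.mkdir(parents=True, exist_ok=True)
        (root / rel).touch()
    demo_rows = build_rows(root)
    n_written = write_dataset_csv(root, Path(tmp) / "dataset.csv")
```

Storing paths relative to the category folder keeps the CSV portable across machines, which matters when a later agent (here, the Data Loader) consumes it from the shared workspace.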

If a sub-agent’s output fails its own or Manager review (e.g., code errors, insufficient AUROC), the corresponding agent is iteratively recalled until the sub-task is satisfactorily completed.

3. Domain-Specific Knowledge Base

AutoIAD’s “Domain Knowledge Module” is a curated, queryable repository containing anomaly detection best practices. It encompasses:

Data augmentations: domain-relevant image transforms (resize, flip, noise, custom methods)
Model templates: reference implementations for autoencoders, patch-embedding models, normalizing flows
Hyper-parameter guidelines: suggested learning rates, regularization, coreset sampling ratios
Training scripts: standard loss and logging routines

Agents access the knowledge base by keyword lookup (e.g., “anomaly_model_templates”) at decision points during execution. This structured foundation grounds the pipeline in proven industrial IAD practices and effectively mitigates LLM hallucination, as established by ablation: removing the knowledge base reduces task success to 60% and test AUROC to 0%, indicating model outputs are functionally ineffective without it (§4.4, Table 2 (Ji et al., 7 Aug 2025)).
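A keyword lookup of this kind can be as simple as a dictionary query. The sketch below is illustrative only: apart from the key `anomaly_model_templates`, which the text mentions, every key name, the `KNOWLEDGE_BASE` structure, and the `lookup` function are assumptions, and the entries merely restate the categories listed above.

```python
from typing import Dict, List

# Hypothetical in-memory knowledge base; the real module's storage format is not described.
KNOWLEDGE_BASE: Dict[str, List[str]] = {
    "data_augmentations": ["resize", "flip", "noise"],
    "anomaly_model_templates": ["autoencoder", "patch-embedding", "normalizing flow"],
    "hyperparameter_guidelines": ["learning rate", "regularization", "coreset sampling ratio"],
    "training_scripts": ["loss routine", "logging routine"],
}


def lookup(keyword: str) -> List[str]:
    """Return the entries stored under a keyword, ignoring case and surrounding spaces."""
    key = keyword.strip().lower()
    if key not in KNOWLEDGE_BASE:
        # Fail loudly so the calling agent cannot silently proceed on a guess.
        raise KeyError(f"unknown knowledge-base key: {keyword!r}")
    return list(KNOWLEDGE_BASE[key])  # copy, so callers cannot mutate the store


templates = lookup("anomaly_model_templates")
print(templates)  # ['autoencoder', 'patch-embedding', 'normalizing flow']
```

Raising on an unknown key, rather than returning an empty list, reflects the grounding role the text assigns to the knowledge base: a missing entry becomes a visible error instead of an invitation to improvise.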

4. Loss Functions, Evaluation Metrics, and Optimization

AutoIAD employs standard loss functions contingent upon the model class:

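Reconstruction-based: $\mathcal{L}_{\rm rec} = \lVert x - \hat{x} \rVert_2^2$, the squared error between an input $x$ and its reconstruction $\hat{x}$
Normalizing-flow-based: $\mathcal{L}_{\rm NF} = -\log p_Z\big(f_\theta(x)\big) - \log\left|\det \frac{\partial f_\theta(x)}{\partial x}\right|$, the negative log-likelihood of $x$ under a flow $f_\theta$ with base density $p_Z$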

Model performance is evaluated in all cases using image-level AUROC: $\mathrm{AUROC} = \int_0^1 \mathrm{TPR}\; d(\mathrm{FPR})$, where TPR and FPR are the true-positive and false-positive rates obtained by thresholding the anomaly scores. The Trainer Agent logs AUROC at each epoch, which the Manager uses to determine convergence or whether retraining is necessary. Hyper-parameter optimization is performed via grid or random search, steered by ranges and heuristics in the knowledge base (Ji et al., 7 Aug 2025, §3.3–3.4).
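For readers who want to see the metric computed, the following dependency-free sketch sweeps a threshold over the anomaly scores, collects the resulting (FPR, TPR) points, and integrates them with the trapezoidal rule. The function names `roc_points` and `auroc` are illustrative; in practice a library routine such as scikit-learn's `roc_auc_score` would normally be used instead.

```python
from typing import List, Sequence, Tuple


def roc_points(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) points, lowering the threshold from the highest score down.

    labels: 1 for anomalous images, 0 for normal ones.
    scores: higher means more anomalous.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one normal and one anomalous sample")

    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(order):
        threshold = scores[order[i]]
        # Samples with identical scores cross the threshold together.
        while i < len(order) and scores[order[i]] == threshold:
            if labels[order[i]] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the trapezoidal rule."""
    pts = roc_points(labels, scores)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


example = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(example)  # 0.75
```

A detector that assigns every image the same score gets 0.5 from this routine, which is why AUROC values near 50% in the results tables indicate little usable anomaly signal.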

5. Benchmark Dataset and Evaluation Protocol

Benchmarking uses the MVTec AD dataset, covering 15 tasks that span object categories (e.g., bottle, metal_nut) and textures (e.g., carpet, tile). Protocol specifics:

Data regime: Training on defect-free samples only; test sets include both normal and defective samples, with pixel-wise masks withheld.
Pipeline requirements: Each task must complete all four sub-agent stages within fixed time and token limits; output must include a non-NaN AUROC.
Baselines: AutoIAD is compared with MLAgent-Bench and AutoML-Agent (AutoML approaches) and with openManus and openHands (generic agentic frameworks), all run on the same Gemini LLM backend.
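The pipeline requirements above amount to a simple pass/fail check per task, sketched below. The stage identifiers, the `TaskRun` record, and the `is_successful` function are hypothetical, and the limit values in the usage lines are arbitrary placeholders because the source does not state the actual time and token budgets. The sketch covers only the all-or-nothing criterion as listed; how per-task outcomes are aggregated into the reported success rates is not described here.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical identifiers for the four sub-agent stages.
REQUIRED_STAGES = ("data_prep", "data_loader", "model_designer", "trainer")


@dataclass
class TaskRun:
    completed_stages: Tuple[str, ...]
    elapsed_seconds: float
    tokens_used: int
    auroc: Optional[float]  # None when no AUROC was produced


def is_successful(run: TaskRun, time_limit_s: float, token_limit: int) -> bool:
    """Apply the three protocol conditions: all stages, within budget, valid AUROC."""
    all_stages_done = all(stage in run.completed_stages for stage in REQUIRED_STAGES)
    within_budget = run.elapsed_seconds <= time_limit_s and run.tokens_used <= token_limit
    valid_auroc = run.auroc is not None and not math.isnan(run.auroc)
    return all_stages_done and within_budget and valid_auroc


# Placeholder budgets, chosen only for this example.
TIME_LIMIT_S = 3600.0
TOKEN_LIMIT = 1_000_000

good_run = TaskRun(REQUIRED_STAGES, 1200.0, 250_000, 0.81)
nan_run = TaskRun(REQUIRED_STAGES, 1200.0, 250_000, float("nan"))
partial_run = TaskRun(("data_prep", "data_loader"), 900.0, 100_000, None)

print(is_successful(good_run, TIME_LIMIT_S, TOKEN_LIMIT))     # True
print(is_successful(nan_run, TIME_LIMIT_S, TOKEN_LIMIT))      # False
print(is_successful(partial_run, TIME_LIMIT_S, TOKEN_LIMIT))  # False
```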

In head-to-head comparison, AutoIAD achieves higher pipeline completion and better anomaly detection performance than the baselines (Ji et al., 7 Aug 2025, Table 1). A run is judged both on completing every pipeline stage and on the AUROC of the resulting detector.

6. Comparative Performance and Ablation Analysis

AutoIAD's results on the MVTec AD benchmark:

Framework	Success Rate (%)	Test AUROC (%)
MLAgent-Bench	0	-
AutoML-Agent	0	-
openManus	50	48.09
openHands	73.3	53.88
AutoIAD	88.3	63.69

LLM backbone evaluation demonstrates the importance of underlying model quality:

Gemini-2.5-Flash: 88.3% success, 63.69% AUROC
Qwen-Max: 77.8% success, 25.71% AUROC
Claude-3.7: 63.3% success (timeout), no AUROC
GPT-4o-Mini: 43.3% success, 25.00% AUROC
DeepSeek-v3: 37.8% success, 0.0% AUROC
Qwen3-235B: 50.0% success, 28.65% AUROC

Class-wise breakdown reveals most object and texture categories achieve 3–4 stage completion and, in select cases (e.g., carpet, tile, metal_nut), AUROC > 80%. Cases with “null” AUROC denote completed pipelines with no meaningful anomaly signal [(Ji et al., 7 Aug 2025), Table 5].

Ablation experiments further demonstrate the core contributions:

Without Manager Agent: Success drops to 83.3% and AUROC to 35.01%. Centralized review is pivotal for error correction.
Without Knowledge Base: Success is 60.0%, AUROC is 0.0%. Domain priors are essential for producing meaningful anomaly detection results (§4.4, Table 2).

7. Significance, Context, and Implications

AutoIAD recasts the conventional industrial anomaly detection workflow (traditionally a manual sequence of data cleaning, augmentation, model and hyper-parameter selection, and evaluation) as a fully automated, multi-agent pipeline. The Manager Agent functions as a meta-engineer, orchestrating, validating, and iterating each phase toward a working anomaly detector. The structured domain knowledge base grounds the pipeline in empirically validated practices, reducing the LLM-induced hallucination and misconfiguration observed in baseline approaches.

On the MVTec AD benchmark, AutoIAD establishes a new state of the art for automated industrial anomaly detection, achieving both the highest end-to-end pipeline completion (88.3%) and model performance (mean AUROC ≈ 64%) among tested frameworks (Ji et al., 7 Aug 2025). Ablation results underscore the indispensable roles of both centralized managerial supervision and structured domain priors in robust industrial machine learning automation.

This suggests that effective automation of real-world visual anomaly detection workflows is contingent on both agentic oversight and domain-grounded guidance, not solely LLM-based generative code synthesis.


References (1)

Ji et al. (7 Aug 2025). AutoIAD: Manager-Driven Multi-Agent Collaboration for Automated Industrial Anomaly Detection.
