MultiTypeFCDD: Explainable Anomaly Detection

Updated 21 November 2025
  • MultiTypeFCDD is a fully convolutional, explainable framework designed for multi-type anomaly detection using only image-level labels.
  • It employs a pre-trained Inception-ResNet-v2 encoder and a sequential convolutional head to generate class-specific anomaly heatmaps that precisely localize defects.
  • Its resource-efficient, unified design supports real-time deployment across diverse object categories without needing per-class model retraining.

MultiTypeFCDD is a fully convolutional framework for explainable multi-type anomaly detection that produces type-differentiating heatmaps using only image-level labels and a single unified model. It targets the detection, localization, and semantic differentiation of multiple defect types across diverse object categories, with an efficient architecture and intrinsic explainability. MultiTypeFCDD was introduced to address the limitations of conventional explainable anomaly detection methods, which either cannot differentiate anomaly types or require separate models for each object class, and the prohibitive computational requirements of recent vision-LLMs (George et al., 14 Nov 2025).

1. Model Design and Architecture

MultiTypeFCDD utilizes the first three stages of a pre-trained Inception-ResNet-v2 backbone (with all backbone layers frozen) as its encoder $\phi(\cdot)$. Given an input $X\in\mathbb{R}^{h\times w\times 3}$, the backbone produces a feature tensor of size $u\times v\times d$ (with $u=v=8$ for a $224\times224$ input and $d\approx1536$). A sequential convolutional head processes this representation:

  • Two blocks: $3\times3$ convolution layers with 512 filters, batch normalization, and ReLU.
  • A $1\times1$ convolution to $M$ output channels, one per anomaly type, yielding a $u\times v\times M$ tensor.
  • A channel-wise pseudo-Huber nonlinearity, defined by

$$A_k(X) = \sqrt{[\phi_k(X;W)]^2 + 1} - 1, \qquad A_k(X)\in\mathbb{R}^{u\times v},$$

for each anomaly type $k=1,\dots,M$.

This architecture allows MultiTypeFCDD to function as a unified, multi-type anomaly detector, eliminating the need for per-class model instantiation or retraining (George et al., 14 Nov 2025).
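
A minimal PyTorch sketch of the head described above, assuming the feature-tensor shapes quoted in the text; the padding choice, module layout, and names are illustrative, not the authors' exact implementation:

```python
import torch
import torch.nn as nn

class MultiTypeFCDDHead(nn.Module):
    """Sequential head: two 3x3 conv blocks (512 filters, BN, ReLU) and a 1x1 conv to M channels."""
    def __init__(self, in_channels: int = 1536, num_types: int = 8):
        super().__init__()
        self.blocks = nn.Sequential(
            nn.Conv2d(in_channels, 512, kernel_size=3, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, num_types, kernel_size=1),  # one output channel per anomaly type
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        out = self.blocks(features)                 # (B, M, u, v) raw responses phi_k(X; W)
        return torch.sqrt(out ** 2 + 1.0) - 1.0     # channel-wise pseudo-Huber, so A_k(X) >= 0

# Stand-in for the frozen Inception-ResNet-v2 features (B, d, u, v) with d = 1536, u = v = 8.
features = torch.randn(4, 1536, 8, 8)
heatmaps = MultiTypeFCDDHead()(features)            # (4, 8, 8, 8): one low-resolution heatmap per type
```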

2. Anomaly Scoring, Heatmaps, and Loss Function

Each channel $k$ is dedicated to a specific anomaly type and produces a low-resolution anomaly heatmap $A_k(X)$. This is bilinearly upsampled to match the input resolution, producing $A'_k(X)$. The global image-level anomaly score for type $k$ is computed as:

$$z_k(X) = \frac{1}{uv}\left\lVert A_k(X) \right\rVert_1,$$

where the $\ell_1$ norm is taken over spatial indices.

The model employs a multi-type loss using only image-level anomaly presence/absence labels $y_{ik}\in\{0,1\}$ for each sample $i$ and type $k$:

$$\mathcal{L}(W) = \frac{1}{MN} \sum_{i=1}^N \sum_{k=1}^M \left[(1-y_{ik})\,z_k(X_i) - y_{ik}\log\left(1-e^{-z_k(X_i)}\right)\right].$$

Here, the first term penalizes nonzero scores for anomaly types that are absent, while the second term pushes scores for present types to be high (George et al., 14 Nov 2025).
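
A hedged sketch of the scoring, upsampling, and loss computations implied by the two formulas above; tensor shapes and the numerical clamp are illustrative choices, not taken from the paper:

```python
import torch
import torch.nn.functional as F

def image_scores(heatmaps: torch.Tensor) -> torch.Tensor:
    """z_k(X) = (1/uv) * ||A_k(X)||_1; since A_k >= 0, this equals the spatial mean. heatmaps: (B, M, u, v)."""
    return heatmaps.flatten(start_dim=2).mean(dim=2)           # (B, M)

def upsample_heatmaps(heatmaps: torch.Tensor, size=(224, 224)) -> torch.Tensor:
    """A'_k(X): bilinear upsampling of the low-resolution heatmaps to the input resolution."""
    return F.interpolate(heatmaps, size=size, mode="bilinear", align_corners=False)

def multi_type_loss(scores: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """L(W) averaged over samples and types; scores: (N, M), labels y_ik in {0,1}: (N, M)."""
    # Present types (y=1): -log(1 - exp(-z)); note -expm1(-z) == 1 - exp(-z), clamped for stability near z = 0.
    present = -torch.log(torch.clamp(-torch.expm1(-scores), min=1e-12))
    absent = scores                                             # absent types (y=0) are pushed toward zero score
    return (labels * present + (1.0 - labels) * absent).mean()  # mean over N and M gives the 1/(MN) factor
```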

3. Training Procedure and Data Augmentation

Training of MultiTypeFCDD requires only image-level anomaly type labels and a “normal” (all-zero) class. Key aspects include:

  • Randomized augmentations (applied to 50% of images): rotation ($\pm15^\circ$), translation ($\pm20$ pixels), and brightness/contrast jitter.
  • Adam optimizer, learning rate $1\times10^{-4}$, batch size 32.
  • Balanced sampling per mini-batch: an equal number of samples per class, with coupon-collector-style resampling each epoch to mitigate class imbalance.

The training process comprises forward computation of type-specific anomaly heatmaps, aggregation into global image-level scores, evaluation of the loss above, and parameter updates; the paper's explicit pseudocode formalization supports reproducibility (George et al., 14 Nov 2025).
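
An illustrative training step consistent with the listed hyperparameters, reusing the helpers sketched in Section 2. The weighted sampler below is a stand-in for the coupon-collector-style balanced resampling described above, and `dataset`, `class_ids`, `backbone`, and `head` are assumed to exist:

```python
import torch
from torch.utils.data import DataLoader, WeightedRandomSampler

# Balanced sampling stand-in: draw every class with equal probability within each mini-batch.
class_ids_t = torch.tensor(class_ids)                      # one integer class id per training image
class_weights = 1.0 / torch.bincount(class_ids_t).float()
sampler = WeightedRandomSampler(class_weights[class_ids_t], num_samples=len(class_ids_t), replacement=True)
loader = DataLoader(dataset, batch_size=32, sampler=sampler)

optimizer = torch.optim.Adam(head.parameters(), lr=1e-4)   # only the head is trained; the backbone stays frozen

for images, labels in loader:                               # labels: (B, M) multi-type presence vectors
    with torch.no_grad():
        features = backbone(images)                          # frozen Inception-ResNet-v2 features
    heatmaps = head(features)                                # per-type pseudo-Huber heatmaps A_k(X)
    scores = image_scores(heatmaps)                          # image-level scores z_k(X)
    loss = multi_type_loss(scores, labels.float())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```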

4. Explainability and Interpretation

MultiTypeFCDD is intrinsically explainable: each output channel is trained to respond exclusively to a single anomaly type, and the corresponding upsampled channel heatmap Ak(X)A'_k(X) localizes that defect without a separate explanation module. This one-to-one mapping provides direct, class-specific localization.

Validation of the framework’s explainability was performed using pixel-level metrics AUPRO (Area Under the Per-Region Overlap) and P-AUROC, quantifying spatial alignment between predicted heatmaps and ground-truth defect masks. No separate explainer or post-hoc interpretation is introduced (George et al., 14 Nov 2025).
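
As a rough illustration of this pixel-level validation, P-AUROC can be computed by scoring every pixel of the upsampled heatmap against the ground-truth mask; AUPRO additionally requires per-connected-region overlap and is omitted here, and the array names are hypothetical:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def pixel_auroc(heatmaps_up: np.ndarray, masks: np.ndarray) -> float:
    """P-AUROC over all pixels. heatmaps_up: (N, H, W) continuous scores; masks: (N, H, W) binary ground truth."""
    return roc_auc_score(masks.reshape(-1).astype(int), heatmaps_up.reshape(-1))
```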

5. Empirical Evaluation and Benchmarking

Dataset

Evaluations utilized Real-IAD, a large-scale anomaly dataset comprising 151,050 images (99,721 normal, 51,329 anomalous) covering 30 objects and 8 defect types (AK: Pit, BX: Deformation, CH: Abrasion, HS: Scratch, PS: Damage, QS: Missing Parts, YW: Foreign Objects, ZW: Contamination), with variable training contamination rates $\alpha\in\{0.1, 0.2, 0.4\}$.

Metrics

  • Image-level: I-AUROC, Average Precision (AP), max F1 (F1m).
  • Pixel-level: P-AUROC, AUPRO.
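
A short sketch of how the image-level metrics above can be computed with scikit-learn, evaluated per anomaly type given scores and binary labels; the function and variable names are illustrative:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score, precision_recall_curve

def image_level_metrics(scores: np.ndarray, labels: np.ndarray) -> dict:
    """scores: (N,) image-level anomaly scores for one type; labels: (N,) binary ground truth."""
    precision, recall, _ = precision_recall_curve(labels, scores)
    f1 = 2 * precision * recall / np.clip(precision + recall, 1e-12, None)
    return {
        "I-AUROC": roc_auc_score(labels, scores),
        "AP": average_precision_score(labels, scores),
        "F1m": float(f1.max()),                      # max F1 over decision thresholds
    }
```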

Performance Overview

| Setting | I-AUROC (%) | AUPRO (%) | P-AUROC (%) | Params (M) |
|---|---|---|---|---|
| Multi-type (mean over types, $\alpha=0.1$) | 94.6 | 66.3 | 92.1 | ~5.5 |
| Multi-type (mean over types, $\alpha=0.4$) | 95.8 | 68.8 | 93.4 | ~5.5 |
| Multi-class*, $\alpha=0.1$ | 94.1 | 75.2 | – | ~5.5 |
| Multi-class*, $\alpha=0.4$ | 96.4 | 79.9 | – | ~5.5 |

*Multi-class refers to a single-channel setting used for comparison with prior works.

Model Size and Run-Time

| Model | Size (MB) | CPU inference (ms) | GPU inference (ms) |
|---|---|---|---|
| MultiTypeFCDD | 19.7 | 69.9 | 1.7 |
| MultiADS (VLM baseline) | 1,600 | 1,824 | 33.7 |

Compared baselines include PaDiM, CFlow, PatchCore, SoftPatch, SimpleNet, DeSTSeg, RD, and UniAD, with parameter counts ranging from 68 M to 427 M (UniAD: 7.6 M) (George et al., 14 Nov 2025).

6. Practical Implications and Limitations

MultiTypeFCDD enables a single, lightweight (~5.5 M parameters) deployable model for multi-type anomaly detection across multiple object categories, removing the need for per-class models. Key practical attributes include:

  • Suitability for real-time deployment on constrained hardware due to minimal computational and memory footprint (model size: 19.7 MB, CPU inference ~70 ms/image).
  • Direct, explanation-ready output for each anomaly type via per-channel heatmaps.
  • No dependence on expensive annotation: operates with only image-level labels, not pixel-level masks.
  • No explicit open-set anomaly detection mechanism for unseen defect types; a plausible implication is that extending the model for open-set and adaptive anomaly class handling constitutes an open research direction.
  • No online adaptation or automatic discovery of new anomaly types, which is identified as a natural extension (George et al., 14 Nov 2025).

7. Context and Significance Within Anomaly Detection

MultiTypeFCDD provides an alternative to both classical approaches reliant on training separate models for object/defect types, and to recent vision-LLMs, which demand significantly higher computational resources and inference time. Its unified, explainable, and resource-efficient design addresses the critical challenge of real-time, explainable anomaly differentiation in operational environments where both specificity and efficiency are paramount. Its competitive empirical performance and low footprint position it as a viable option for large-scale, practical anomaly detection deployments (George et al., 14 Nov 2025).
