MultiTypeFCDD: Explainable Anomaly Detection

Updated 21 November 2025
  • MultiTypeFCDD is a fully convolutional, explainable framework designed for multi-type anomaly detection using only image-level labels.
  • It employs a pre-trained Inception-ResNet-v2 encoder and a sequential convolutional head to generate class-specific anomaly heatmaps that precisely localize defects.
  • Its resource-efficient, unified design supports real-time deployment across diverse object categories without needing per-class model retraining.

MultiTypeFCDD is a fully convolutional framework for explainable multi-type anomaly detection that produces type-differentiating heatmaps using only image-level labels and a single unified model. It targets the detection, localization, and semantic differentiation of multiple defect types across diverse object categories, with an efficient architecture and intrinsic explainability. MultiTypeFCDD was introduced to address the limitations of conventional explainable anomaly detection methods, which either cannot differentiate anomaly types or require separate models for each object class, and the prohibitive computational requirements of recent vision-LLMs (George et al., 14 Nov 2025).

1. Model Design and Architecture

MultiTypeFCDD utilizes the first three stages of a pre-trained Inception-ResNet-v2 backbone (with all backbone layers frozen) as its encoder $\phi(\cdot)$. Given an input $X\in\mathbb{R}^{h\times w\times 3}$, the backbone produces a feature tensor of size $u\times v\times d$ (with $u=v=8$ for a $224\times224$ input and $d\approx1536$). A sequential convolutional head processes this representation:

  • Two blocks: $3\times3$ convolution layers with 512 filters, batch normalization, and ReLU.
  • A $1\times1$ convolution to $M$ output channels, one per anomaly type, yielding a $u\times v\times M$ tensor.
  • A channel-wise pseudo-Huber nonlinearity, defined by

$$A_k(X) = \sqrt{[\phi_k(X;W)]^2 + 1} - 1, \qquad A_k(X)\in\mathbb{R}^{u\times v},$$

for each anomaly type $k=1,\dots,M$.

This architecture allows MultiTypeFCDD to function as a unified, multi-type anomaly detector, eliminating the need for per-class model instantiation or retraining (George et al., 14 Nov 2025).
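
A minimal PyTorch sketch of the head described above, assuming the feature-tensor shapes quoted in the text; the padding choice, module layout, and names are illustrative, not the authors' exact implementation:

```python
import torch
import torch.nn as nn

class MultiTypeFCDDHead(nn.Module):
    """Sequential head: two 3x3 conv blocks (512 filters, BN, ReLU) and a 1x1 conv to M channels."""
    def __init__(self, in_channels: int = 1536, num_types: int = 8):
        super().__init__()
        self.blocks = nn.Sequential(
            nn.Conv2d(in_channels, 512, kernel_size=3, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, num_types, kernel_size=1),  # one output channel per anomaly type
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        out = self.blocks(features)                 # (B, M, u, v) raw responses phi_k(X; W)
        return torch.sqrt(out ** 2 + 1.0) - 1.0     # channel-wise pseudo-Huber, so A_k(X) >= 0

# Stand-in for the frozen Inception-ResNet-v2 features (B, d, u, v) with d = 1536, u = v = 8.
features = torch.randn(4, 1536, 8, 8)
heatmaps = MultiTypeFCDDHead()(features)            # (4, 8, 8, 8): one low-resolution heatmap per type
```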

2. Anomaly Scoring, Heatmaps, and Loss Function

Each channel $k$ is dedicated to a specific anomaly type and produces a low-resolution anomaly heatmap $A_k(X)$. This is bilinearly upsampled to match the input resolution, producing $A'_k(X)$. The global image-level anomaly score for type $k$ is computed as:

$$z_k(X) = \frac{1}{uv}\left\lVert A_k(X) \right\rVert_1,$$

where the $\ell_1$ norm is taken over spatial indices.

The model employs a multi-type loss using only image-level anomaly presence/absence labels $y_{ik}\in\{0,1\}$ for each sample $i$ and type $k$:

$$\mathcal{L}(W) = \frac{1}{MN} \sum_{i=1}^N \sum_{k=1}^M \left[(1-y_{ik})\,z_k(X_i) - y_{ik}\log\left(1-e^{-z_k(X_i)}\right)\right].$$

Here, the first term penalizes nonzero scores for anomaly types that are absent, while the second term pushes scores for present types to be high (George et al., 14 Nov 2025).
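
A hedged sketch of the scoring, upsampling, and loss computations implied by the two formulas above; tensor shapes and the numerical clamp are illustrative choices, not taken from the paper:

```python
import torch
import torch.nn.functional as F

def image_scores(heatmaps: torch.Tensor) -> torch.Tensor:
    """z_k(X) = (1/uv) * ||A_k(X)||_1; since A_k >= 0, this equals the spatial mean. heatmaps: (B, M, u, v)."""
    return heatmaps.flatten(start_dim=2).mean(dim=2)           # (B, M)

def upsample_heatmaps(heatmaps: torch.Tensor, size=(224, 224)) -> torch.Tensor:
    """A'_k(X): bilinear upsampling of the low-resolution heatmaps to the input resolution."""
    return F.interpolate(heatmaps, size=size, mode="bilinear", align_corners=False)

def multi_type_loss(scores: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """L(W) averaged over samples and types; scores: (N, M), labels y_ik in {0,1}: (N, M)."""
    # Present types (y=1): -log(1 - exp(-z)); note -expm1(-z) == 1 - exp(-z), clamped for stability near z = 0.
    present = -torch.log(torch.clamp(-torch.expm1(-scores), min=1e-12))
    absent = scores                                             # absent types (y=0) are pushed toward zero score
    return (labels * present + (1.0 - labels) * absent).mean()  # mean over N and M gives the 1/(MN) factor
```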

3. Training Procedure and Data Augmentation

Training of MultiTypeFCDD requires only image-level anomaly type labels and a “normal” (all-zero) class. Key aspects include:

  • Randomized augmentations (applied to 50% of images): rotation ($\pm15^\circ$), translation ($\pm20$ pixels), and brightness/contrast jitter.
  • Adam optimizer, learning rate $1\times10^{-4}$, batch size 32.
  • Balanced sampling per mini-batch: an equal number of samples per class, with coupon-collector-style resampling each epoch to mitigate class imbalance.

The training process comprises forward computation of type-specific anomaly heatmaps, aggregation into global image-level scores, evaluation of the loss above, and parameter updates; the paper's explicit pseudocode formalization supports reproducibility (George et al., 14 Nov 2025).
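
An illustrative training step consistent with the listed hyperparameters, reusing the helpers sketched in Section 2. The weighted sampler below is a stand-in for the coupon-collector-style balanced resampling described above, and `dataset`, `class_ids`, `backbone`, and `head` are assumed to exist:

```python
import torch
from torch.utils.data import DataLoader, WeightedRandomSampler

# Balanced sampling stand-in: draw every class with equal probability within each mini-batch.
class_ids_t = torch.tensor(class_ids)                      # one integer class id per training image
class_weights = 1.0 / torch.bincount(class_ids_t).float()
sampler = WeightedRandomSampler(class_weights[class_ids_t], num_samples=len(class_ids_t), replacement=True)
loader = DataLoader(dataset, batch_size=32, sampler=sampler)

optimizer = torch.optim.Adam(head.parameters(), lr=1e-4)   # only the head is trained; the backbone stays frozen

for images, labels in loader:                               # labels: (B, M) multi-type presence vectors
    with torch.no_grad():
        features = backbone(images)                          # frozen Inception-ResNet-v2 features
    heatmaps = head(features)                                # per-type pseudo-Huber heatmaps A_k(X)
    scores = image_scores(heatmaps)                          # image-level scores z_k(X)
    loss = multi_type_loss(scores, labels.float())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```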

4. Explainability and Interpretation

MultiTypeFCDD is intrinsically explainable: each output channel is trained to respond exclusively to a single anomaly type, and the corresponding upsampled channel heatmap Ak(X)A'_k(X) localizes that defect without a separate explanation module. This one-to-one mapping provides direct, class-specific localization.

Validation of the framework’s explainability was performed using pixel-level metrics AUPRO (Area Under the Per-Region Overlap) and P-AUROC, quantifying spatial alignment between predicted heatmaps and ground-truth defect masks. No separate explainer or post-hoc interpretation is introduced (George et al., 14 Nov 2025).
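
As a rough illustration of this pixel-level validation, P-AUROC can be computed by scoring every pixel of the upsampled heatmap against the ground-truth mask; AUPRO additionally requires per-connected-region overlap and is omitted here, and the array names are hypothetical:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def pixel_auroc(heatmaps_up: np.ndarray, masks: np.ndarray) -> float:
    """P-AUROC over all pixels. heatmaps_up: (N, H, W) continuous scores; masks: (N, H, W) binary ground truth."""
    return roc_auc_score(masks.reshape(-1).astype(int), heatmaps_up.reshape(-1))
```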

5. Empirical Evaluation and Benchmarking

Dataset

Evaluations utilized Real-IAD, a large-scale anomaly dataset comprising 151,050 images (99,721 normal, 51,329 anomalous) covering 30 objects and 8 defect types (AK: Pit, BX: Deformation, CH: Abrasion, HS: Scratch, PS: Damage, QS: Missing Parts, YW: Foreign Objects, ZW: Contamination), with variable training contamination rates $\alpha\in\{0.1, 0.2, 0.4\}$.

Metrics

  • Image-level: I-AUROC, Average Precision (AP), max F1 (F1m).
  • Pixel-level: P-AUROC, AUPRO.
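
A short sketch of how the image-level metrics above can be computed with scikit-learn, evaluated per anomaly type given scores and binary labels; the function and variable names are illustrative:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score, precision_recall_curve

def image_level_metrics(scores: np.ndarray, labels: np.ndarray) -> dict:
    """scores: (N,) image-level anomaly scores for one type; labels: (N,) binary ground truth."""
    precision, recall, _ = precision_recall_curve(labels, scores)
    f1 = 2 * precision * recall / np.clip(precision + recall, 1e-12, None)
    return {
        "I-AUROC": roc_auc_score(labels, scores),
        "AP": average_precision_score(labels, scores),
        "F1m": float(f1.max()),                      # max F1 over decision thresholds
    }
```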

Performance Overview

| Setting | I-AUROC (%) | AUPRO (%) | P-AUROC (%) | Params (M) |
|---|---|---|---|---|
| Multi-type (mean over types, $\alpha=0.1$) | 94.6 | 66.3 | 92.1 | ~5.5 |
| Multi-type (mean over types, $\alpha=0.4$) | 95.8 | 68.8 | 93.4 | ~5.5 |
| Multi-class*, $\alpha=0.1$ | 94.1 | 75.2 | – | ~5.5 |
| Multi-class*, $\alpha=0.4$ | 96.4 | 79.9 | – | ~5.5 |

*Multi-class refers to a single-channel setting used for comparison with prior works.

Model Size and Run-Time

| Model | Size (MB) | CPU inference (ms) | GPU inference (ms) |
|---|---|---|---|
| MultiTypeFCDD | 19.7 | 69.9 | 1.7 |
| MultiADS (VLM baseline) | 1,600 | 1,824 | 33.7 |

Compared baselines include PaDiM, CFlow, PatchCore, SoftPatch, SimpleNet, DeSTSeg, RD, and UniAD, with parameter counts ranging from 68 M to 427 M (UniAD: 7.6 M) (George et al., 14 Nov 2025).

6. Practical Implications and Limitations

MultiTypeFCDD enables a single, lightweight (~5.5 M parameters) deployable model for multi-type anomaly detection across multiple object categories, removing the need for per-class models. Key practical attributes include:

  • Suitability for real-time deployment on constrained hardware due to minimal computational and memory footprint (model size: 19.7 MB, CPU inference ~70 ms/image).
  • Direct, explanation-ready output for each anomaly type via per-channel heatmaps.
  • No dependence on expensive annotation: operates with only image-level labels, not pixel-level masks.
  • No explicit open-set anomaly detection mechanism for unseen defect types; a plausible implication is that extending the model for open-set and adaptive anomaly class handling constitutes an open research direction.
  • No online adaptation or automatic discovery of new anomaly types, which is identified as a natural extension (George et al., 14 Nov 2025).

7. Context and Significance Within Anomaly Detection

MultiTypeFCDD provides an alternative to both classical approaches reliant on training separate models for object/defect types, and to recent vision-LLMs, which demand significantly higher computational resources and inference time. Its unified, explainable, and resource-efficient design addresses the critical challenge of real-time, explainable anomaly differentiation in operational environments where both specificity and efficiency are paramount. Its competitive empirical performance and low footprint position it as a viable option for large-scale, practical anomaly detection deployments (George et al., 14 Nov 2025).
