
CCAD: Compressed Global Feature Anomaly Detection

Updated 1 January 2026
  • The paper introduces a novel two-stream architecture that combines global feature compression with latent diffusion reconstruction, achieving superior image and pixel-level AUCs.
  • The methodology integrates a fixed pretrained encoder and a U-Net-style diffusion model enhanced by cross-attention, enabling rapid convergence even under domain shifts.
  • Extensive experiments, including on the re-annotated DAGM 2007 dataset, demonstrate that CCAD delivers precise anomaly localization and robust performance across industrial benchmarks.

Compressed Global Feature Conditioned Anomaly Detection (CCAD) is an advanced paradigm unifying unsupervised representation-based and reconstruction-based approaches for industrial visual anomaly detection. CCAD exploits global dataset-level feature banks as explicit conditioning for a diffusion-based reconstruction model, facilitating robust feature extraction and efficient training, particularly under domain shift and limited anomalous supervision. The architecture incorporates a two-stage feature compression mechanism and integrates cross-attention-driven global conditioning into a U-Net-style latent diffusion network. Extensive experiments, including on a re-annotated DAGM 2007 dataset, establish superior convergence speed and state-of-the-art image/pixel-level AUCs across diverse benchmarks (Jin et al., 25 Dec 2025).

1. High-Level Framework

CCAD comprises two tightly coupled streams:

  • Global Feature Compression Stream: Extracts feature representations from all normal training images using a fixed pretrained encoder (e.g., ResNet), forming a large pool of dataset-level features. This pool undergoes compression, producing a compact, representative global feature bank.
  • Diffusion-Based Reconstruction Stream: Utilizes a latent diffusion model (U-Net backbone) to denoise a noisy latent vector back to a normal image distribution. The reconstruction is conditioned on local features from the input image and on the compressed global-feature bank.

During inference, anomalous images are reconstructed towards the learned normal distribution. Anomaly scores are derived by comparing features (e.g., cosine similarity) between the original and reconstructed images on both pixel and image levels.
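The scoring step above can be sketched in a few lines. The following is a minimal numpy illustration, not the paper's implementation: the feature shapes, the `1 - cosine similarity` convention, and the max-pooling aggregation to an image-level score are illustrative assumptions.

```python
import numpy as np

def anomaly_maps(feat_in, feat_rec, eps=1e-8):
    """Per-pixel anomaly score = 1 - cosine similarity between
    feature maps of the input and its reconstruction.

    feat_in, feat_rec: arrays of shape (H, W, d).
    Returns an (H, W) pixel-level map and a scalar image-level score.
    """
    num = (feat_in * feat_rec).sum(axis=-1)
    den = np.linalg.norm(feat_in, axis=-1) * np.linalg.norm(feat_rec, axis=-1) + eps
    pixel_map = 1.0 - num / den       # high where reconstruction deviates from input
    image_score = pixel_map.max()     # one common aggregation: max over pixels
    return pixel_map, image_score

# Identical features -> near-zero anomaly everywhere
f = np.random.rand(8, 8, 16)
pmap, score = anomaly_maps(f, f.copy())
```

In practice the features would come from the same fixed pretrained encoder used in the compression stream, applied to the original and reconstructed images.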

2. Adaptive Global Feature Compression

Given normal training images $\mathcal{X} = \{x_i\}_{i=1}^N$ of size $H \times W$, a fixed encoder $\mathcal{F}$ extracts $d$-dimensional features from patches:

$$v_n = \mathcal{F}(x_n), \quad \mathcal{D} = \{v_n \in \mathbb{R}^d\}_{n=1}^M, \quad M = N \cdot \lfloor H/m \rfloor \cdot \lfloor W/m \rfloor,$$

where $m$ is the patch downsampling factor. Since $\mathcal{D}$ can be excessively large, CCAD introduces a two-stage compression:

  • Coarse Feature Bank (CFB): A coreset-sampling operator $\mathcal{S}$ selects $\xi$ representative features ($\xi \leq 1000$) from $\mathcal{D}$:

$$\mathcal{B}_c = \{v_k\}_{k=1}^\xi = \mathcal{S}(\mathcal{D})$$

  • Fine Feature Bank (FFB): At each training step, a batch-level subset $\mathcal{D}_{bs} \subset \mathcal{D}$ of size $\zeta$ is sampled. The trainable Fine Compression Module (FCM) $\tau_\theta$ cross-attends between $\mathcal{D}_{bs}$ (queries) and $\mathcal{B}_c$ (keys/values), yielding a refined bank $\mathcal{B}_f$ via

$$Q = \mathcal{D}_{bs} W_Q, \quad K = \mathcal{B}_c W_K, \quad V = \mathcal{B}_c W_V, \quad \mathcal{B}_f = \mathrm{softmax}\!\left(\frac{QK^\top}{\sqrt{d_k}}\right) V W_B,$$

where $\{W_Q, W_K, W_V, W_B\}$ are learned matrices. This multi-level compression efficiently condenses dataset-level priors for scalable and adaptive conditioning.
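The two compression stages can be sketched as follows. This is a toy numpy sketch under stated assumptions: the greedy k-center procedure stands in for the paper's coreset operator $\mathcal{S}$ (the exact sampler is not specified here), and the FCM is reduced to a single cross-attention head with randomly initialized weights; all dimensions (`d`, `dk`, pool size, $\xi$, $\zeta$) are illustrative.

```python
import numpy as np

def kcenter_coreset(D, xi, seed=0):
    """Greedy k-center coreset: pick xi features that cover the pool D.
    Illustrative stand-in for the coreset-sampling operator S."""
    rng = np.random.default_rng(seed)
    idx = [int(rng.integers(len(D)))]
    dists = np.linalg.norm(D - D[idx[0]], axis=1)
    for _ in range(xi - 1):
        nxt = int(dists.argmax())                 # farthest point from current set
        idx.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(D - D[nxt], axis=1))
    return D[idx]

def fine_compress(D_bs, B_c, W_Q, W_K, W_V, W_B):
    """Single-head cross-attention: batch features (queries) attend
    over the coarse bank (keys/values), producing the fine bank B_f."""
    Q, K, V = D_bs @ W_Q, B_c @ W_K, B_c @ W_V
    S = (Q @ K.T) / np.sqrt(K.shape[1])
    A = np.exp(S - S.max(axis=1, keepdims=True))  # numerically stable softmax
    A /= A.sum(axis=1, keepdims=True)
    return (A @ V) @ W_B

d, dk = 32, 16
D = np.random.randn(500, d)          # dataset-level feature pool (M = 500 here)
B_c = kcenter_coreset(D, xi=50)      # coarse feature bank
D_bs = D[:8]                         # batch-level subset (zeta = 8)
Ws = [np.random.randn(d, dk) * 0.1 for _ in range(3)]
W_B = np.random.randn(dk, d) * 0.1
B_f = fine_compress(D_bs, B_c, *Ws, W_B)
```

In CCAD the attention weights are trained jointly with the diffusion model, so the fine bank adapts to what the denoiser needs, whereas the coarse bank is fixed after sampling.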

3. Diffusion Model Architecture and Conditioning

CCAD leverages a latent diffusion framework. An image $x_0 \in \mathbb{R}^{H \times W \times 3}$ is mapped through an encoder $\mathcal{E}$ to latent $\mathbf{z}_0 = \mathcal{E}(x_0)$. At diffusion timestep $t$:

$$\mathbf{z}_t = \sqrt{\bar\alpha_t}\, \mathbf{z}_0 + \sqrt{1-\bar\alpha_t}\, \epsilon, \quad \epsilon \sim \mathcal{N}(0, I)$$

The denoising network $\epsilon_\Theta(\mathbf{z}_t; c)$ predicts the added noise, where $c$ encodes the conditioning information.
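The closed-form forward noising step above is straightforward to implement. A minimal numpy sketch, assuming a standard linear beta schedule (the paper's actual schedule and number of timesteps are not specified here):

```python
import numpy as np

def forward_diffuse(z0, t, alpha_bar, rng):
    """Sample z_t = sqrt(abar_t) * z_0 + sqrt(1 - abar_t) * eps, eps ~ N(0, I)."""
    eps = rng.standard_normal(z0.shape)
    z_t = np.sqrt(alpha_bar[t]) * z0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return z_t, eps

T = 1000
betas = np.linspace(1e-4, 0.02, T)        # assumed linear schedule
alpha_bar = np.cumprod(1.0 - betas)       # cumulative product of alphas
rng = np.random.default_rng(0)
z0 = rng.standard_normal((4, 4, 8))       # toy latent from the encoder E
z_t, eps = forward_diffuse(z0, t=500, alpha_bar=alpha_bar, rng=rng)
```

The returned `eps` is exactly the regression target the denoising network is trained to predict from `z_t` and the condition $c$.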

Global-feature Conditioned Blocks (GCB) are inserted into the U-Net at multiple resolutions. Each GCB augments the ResBlock and self-attention with a cross-attention operation over either $\mathcal{B}_c$ or $\mathcal{B}_f$, depending on the variant. This design enables each U-Net stage to access holistic dataset statistics.

The three official variants are:

| Variant | Diffusion Space | Global Conditioning |
| --- | --- | --- |
| CCAD(V) | Pixel space | Coarse bank $\mathcal{B}_c$ only (no fine bank) |
| CCAD(C) | Latent space | Coarse bank $\mathcal{B}_c$ |
| CCAD(F) | Latent space | Trainable fine bank $\mathcal{B}_f$ |

4. Training Objectives and Loss Functions

CCAD extends the standard denoising diffusion probabilistic model (DDPM) loss:

$$\mathcal{L}_{\mathrm{DM}} = \mathbb{E}_{t, \mathbf{z}_0, \epsilon} \left\| \epsilon - \epsilon_\Theta^t(\mathbf{z}_t; c_f) \right\|_2^2$$

For CCAD(F), the condition $c_f$ comprises local features and $\mathcal{B}_f$:

$$\mathcal{L}_{\mathrm{CCAD(F)}} = \mathbb{E}_{t, x, \epsilon} \left\| \epsilon - \epsilon_\Theta^t(z_t; x; \tau_\theta(\mathcal{D}_{bs}, \mathcal{B}_c)) \right\|_2^2$$

For CCAD(C):

$$\mathcal{L}_{\mathrm{CCAD(C)}} = \mathbb{E}_{t, x, \epsilon} \left\| \epsilon - \epsilon_\Theta^t(z_t; x; \mathcal{B}_c) \right\|_2^2$$

No additional regularizers are applied beyond the $\ell_2$ noise-prediction term. Conditions can be augmented with ControlNet local features or global banks as required by the variant.
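All three objectives reduce to a mean-squared error between the sampled noise and the network's prediction. A toy numpy sketch of that Monte-Carlo estimate, with a zero placeholder standing in for the conditioned U-Net $\epsilon_\Theta$ (which is of course a trained network in the actual method):

```python
import numpy as np

def ddpm_loss(eps_true, eps_pred):
    """Monte-Carlo estimate of the l2 noise-prediction objective
    for one sampled (t, z_0, eps) triple."""
    return float(np.mean((eps_true - eps_pred) ** 2))

rng = np.random.default_rng(1)
eps = rng.standard_normal((4, 4, 8))   # noise actually added to z_0
eps_pred = np.zeros_like(eps)          # placeholder for eps_Theta(z_t; x, bank)
loss = ddpm_loss(eps, eps_pred)        # here this equals mean(eps**2)
```

Training simply minimizes this quantity over random timesteps and normal images; the variants differ only in what is passed to the denoiser as conditioning.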

5. Reorganized and Re-Annotated DAGM 2007 Dataset

The original DAGM 2007 dataset supplied 15,000 normal and 2,100 defective synthetic texture images, labeled only with coarse ellipses. The CCAD authors re-annotated four challenging classes (defect, scratch, blur, and spots) with precise, pixel-accurate masks. For each class, 300 normal images were used for training; all defective images plus 75 normal images were reserved for testing.

Experiments confirm that the re-annotated masks yield almost identical image-level AUC, but improved pixel-level AUC for all methods, indicating that the finer masks correspond more exactly with true anomaly contours. Consequently, the refined DAGM supports more reliable benchmarking of pixel-wise anomaly localization (Jin et al., 25 Dec 2025).

6. Experimental Benchmarks and Quantitative Results

CCAD was evaluated on MVTec-AD, VisA, MVTec-3D, MVTec-Loco, MTD, and the new DAGM splits. Metrics include AUROC (image/pixel), maximal F1, and average precision.

Key quantitative results:

| Method | Image-AUC (MVTec-AD) | Pixel-AUC (MVTec-AD) | Epochs to Reference AUC |
| --- | --- | --- | --- |
| CCAD(V) | 0.968 | 0.965 | 500–3000 |
| DDAD | 0.962 | 0.966 | |
| PatchCore | 0.858 | 0.948 | |
| CCAD(C) | 0.953–0.961 | 0.959–0.962 | 100 |
| CCAD(F) | 0.953–0.961 | 0.959–0.962 | 110 |
| DiAD | 0.950 | 0.954 | 200 |

Across VisA, MVTec-3D, and MVTec-Loco, CCAD consistently matches or outperforms prior SOTA. Ablation studies show that reducing the coarse bank size to $\xi = 10$ retains most of the AUC, indicating that the network's cross-attention mechanism robustly selects informative global features.

A plausible implication is that the inclusion of dataset-level statistics via compressed global banks not only provides stronger priors but also accelerates convergence, as seen in the reduction of epochs required to reach benchmark AUC values compared to existing methods.

7. Qualitative Analysis of Anomaly Localization

CCAD produces reconstructions in which anomalies present in the input are distinctly removed; anomaly heatmaps are then derived from the cosine similarity between features of the original and reconstructed images. Qualitative samples from MVTec-AD, VisA, MVTec-3D, and MVTec-Loco demonstrate that CCAD yields sharper anomaly localization than prior diffusion models such as DDAD and DiAD. On the re-annotated DAGM splits, the generated anomaly maps align more closely with the fine-grained ground-truth masks.

The ability to maintain precise and consistent localization further supports CCAD's advantage in industrial visual inspection and similar domains where both detection accuracy and fine localization are critical (Jin et al., 25 Dec 2025).
