
Continual Anomaly Detection (CAD)

Updated 14 November 2025
  • Continual Anomaly Detection (CAD) is a methodology for training models on sequential tasks to identify anomalies while preventing catastrophic forgetting.
  • Key strategies include regularization-based parameter anchoring and memory-based replay, achieving high performance (e.g., AUROC > 0.97) across diverse domains.
  • Applications span manufacturing, medical imaging, and video surveillance, where CAD methods enable robust anomaly detection under evolving data distributions.

Continual Anomaly Detection (CAD) is the discipline concerned with training models to detect anomalies across a dynamically evolving sequence of tasks, domains, or data distributions, with an explicit constraint: prior data (especially “normal” or majority-class samples) cannot be revisited after each task updates the model. The objective is to maintain high detection performance across all seen tasks while mitigating catastrophic forgetting, that is, the tendency for performance on older tasks to degrade as new knowledge is acquired. This field encompasses a spectrum of supervised, semi-supervised, and unsupervised settings, and arises in practical scenarios such as fault detection in manufacturing, medical imaging diagnostics, video surveillance, fraud detection in audit logs, and 3D industrial inspection.

1. Formal Definition and Typical Problem Scenarios

A typical CAD scenario presents a sequential stream of $K$ tasks $T_1,\dots,T_K$, each defined by a training set (often with only normal samples) and a test set with both normal and anomalous data. The model $f(\cdot\,;\theta)$ is expected, after observing task $T_k$, to accurately detect anomalies for the cumulative test data from all previously seen tasks. Crucially, at each stage $k$, training access is restricted to $T_k$ (or a compressed/augmented summary of earlier tasks), under strict constraints on computational and memory resources.

The formal loss in the supervised discrete manufacturing case is the empirical cross-entropy,

$$L_{\text{task}_k}(\theta) = \mathbb{E}_{(x,y)\sim D_k}\Big[-y\log p_\theta(y=1 \mid x) - (1-y)\log p_\theta(y=0 \mid x)\Big],$$

while in unsupervised or semi-supervised tabular AD and continual 3D settings, the reconstruction loss for autoencoder- or diffusion-based methods may take the form

$$\mathcal{L}_{\text{rec}}(x; \theta) = \|x - \hat x\|_2^2,$$

with $\hat x$ as the decoded or reconstructed input. Anomaly scoring typically derives from reconstruction errors, embedding distances, or likelihood ratios.
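
As a concrete illustration of reconstruction-based scoring, the following is a minimal sketch (in PyTorch) that scores each sample by $\|x - \hat x\|_2^2$ under an assumed trained autoencoder and flags samples above a quantile of normal-data scores; the model interface, the 0.99 quantile, and all variable names are illustrative assumptions rather than any cited paper's exact procedure.

```python
import torch

def anomaly_scores(model, x):
    """Score each sample by its squared reconstruction error ||x - x_hat||^2_2."""
    with torch.no_grad():
        x_hat = model(x)                       # reconstruction, same shape as x
        errs = ((x - x_hat) ** 2).flatten(1)   # per-sample squared residuals
        return errs.sum(dim=1)                 # L_rec(x) per sample

# Usage (illustrative): threshold at a high quantile of scores on normal data.
# scores_normal = anomaly_scores(ae, x_train_normal)
# tau = torch.quantile(scores_normal, 0.99)
# is_anomaly = anomaly_scores(ae, x_test) > tau
```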

2. Principal Methodological Strategies

CAD research has developed a range of approaches, driven by the core challenges of catastrophic forgetting, limited revisitability of past data, and distributional (or task) drift:

2.1 Regularization-based Parameter Anchoring

Regularization-based methods augment the native task loss with penalties discouraging deviation from parameter values $\theta^*_i$ optimal for earlier tasks. For instance, Elastic Weight Consolidation (EWC) enforces

$$L_{\text{total}}^{\mathrm{EWC}}(\theta) = L_{\text{task}_t}(\theta) + \lambda \sum_{i<t}\frac{1}{2}\sum_{j} F_{i,j}\,(\theta_j - \theta^*_{i,j})^2,$$

where $F_{i,j}$ is the $j$-th diagonal entry of the Fisher information matrix from task $i$ (Maschler et al., 2021). The Online EWC variant accumulates all prior Fishers into a single matrix, reducing storage overhead. Methods such as Synaptic Intelligence (SI) and Learning without Forgetting (LwF) offer alternative “importance” metrics or distillation-based regularizers.
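
The sketch below shows one common way such a penalty is computed in PyTorch, using a squared-gradient (empirical Fisher) estimate per past task; the helper names, the batch-averaged Fisher estimate, and the anchor bookkeeping are illustrative assumptions, not the exact implementation of the cited work.

```python
import torch

def diagonal_fisher(model, loss_fn, data_loader):
    """Estimate the diagonal Fisher information via averaged squared gradients."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for x, y in data_loader:
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
    return {n: f / len(data_loader) for n, f in fisher.items()}

def ewc_penalty(model, anchors, lam):
    """0.5 * lambda * sum_i sum_j F_{i,j} * (theta_j - theta*_{i,j})^2."""
    penalty = 0.0
    for fisher, theta_star in anchors:      # one (F_i, theta*_i) pair per past task
        for n, p in model.named_parameters():
            penalty = penalty + (fisher[n] * (p - theta_star[n]) ** 2).sum()
    return 0.5 * lam * penalty

# total_loss = task_loss + ewc_penalty(model, anchors, lam)
```

Online EWC would replace the per-task list of anchors with a single running Fisher and parameter snapshot, which is what reduces the storage overhead noted above.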

2.2 Memory- and Replay-based Schemes

To address the fundamental limitation of not revisiting full past data, memory-based approaches store compressed summaries (coresets) of past data or features for continual rehearsal. CADIC, for instance, maintains a single, unified coreset of normal-sample embeddings and incrementally augments it to optimize feature diversity and coverage, sidestepping the need for multiple task-specific memories (Yang et al., 10 Nov 2025). PatchCoreCL preserves a bounded set of per-task embedding banks and, upon arrival of new tasks, contracts all memories via coreset selection to respect a global budget (Barusco et al., 25 Aug 2025).
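
A greedy k-center (farthest-point-first) selection is one standard way to build such a coverage-oriented memory; the sketch below is a generic illustration in that spirit rather than the exact CADIC or PatchCoreCL procedure, and the starting index and budget handling are assumptions.

```python
import numpy as np

def greedy_coreset(embeddings: np.ndarray, budget: int) -> np.ndarray:
    """Select up to `budget` embeddings that maximize feature-space coverage."""
    n = embeddings.shape[0]
    selected = [0]                                    # arbitrary starting point
    min_dist = np.linalg.norm(embeddings - embeddings[0], axis=1)
    while len(selected) < min(budget, n):
        idx = int(np.argmax(min_dist))                # farthest point from current coreset
        selected.append(idx)
        d = np.linalg.norm(embeddings - embeddings[idx], axis=1)
        min_dist = np.minimum(min_dist, d)            # distance to nearest selected point
    return embeddings[selected]

# When a new task arrives, its normal embeddings can be appended to the stored
# memory and the selection re-run under the same global budget, so a single
# unified coreset suffices instead of per-task banks.
```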

Generative replay injects historical distribution support via synthetic examples. Generative VAEs or diffusion models are harnessed as replay engines in ReplayCAD (diffusion with semantic/spatial compression) (Hu et al., 10 May 2025) or semi-supervised VAE frameworks (Belham et al., 1 Dec 2024). Outlier rejection, via techniques such as Extreme Value Theory (EVT) thresholding on latent distances, can be applied to curtail distributional drift in replayed or synthetic samples.
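
For the EVT-based rejection step, a peaks-over-threshold fit of a generalized Pareto distribution to the tail of latent distances is the standard recipe; the sketch below assumes distances from real normal data to some reference (e.g., the nearest stored embedding), and the tail fraction and tail probability are illustrative choices.

```python
import numpy as np
from scipy.stats import genpareto

def evt_threshold(latent_dists: np.ndarray, tail_frac=0.10, tail_prob=1e-3):
    """Distance cutoff beyond which only ~tail_prob of genuine normal data falls."""
    u = np.quantile(latent_dists, 1.0 - tail_frac)     # peaks-over-threshold cut
    excesses = latent_dists[latent_dists > u] - u
    c, loc, scale = genpareto.fit(excesses, floc=0.0)  # fit GPD to tail excesses
    return u + genpareto.ppf(1.0 - tail_prob / tail_frac, c, loc=loc, scale=scale)

# Replayed or synthetic samples whose latent distance exceeds this threshold
# would be discarded to curb distributional drift in the rehearsal set.
```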

2.3 Prompting, Adaptation, and Task-Identification

Contrastively-learned prompt modules (e.g., UCAD) enable adaptation of a frozen feature extractor by learning lightweight, task-specific prompts and storing per-task keys and representative normal embeddings. At inference, the best-matching task context is selected by comparing the input to stored keys, thus ensuring task-dependency without explicit labels (Liu et al., 2 Jan 2024).
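
In such a pipeline, task identification at inference reduces to a nearest-key lookup; the sketch below assumes hypothetical stored structures (`task_keys`, `task_prompts`, `task_memories`) and selects the task context by cosine similarity, capturing the spirit of the UCAD key-matching step rather than its exact implementation.

```python
import torch
import torch.nn.functional as F

def select_task(query_emb: torch.Tensor, task_keys: torch.Tensor) -> int:
    """Return the index of the stored task key most similar to the query embedding."""
    sims = F.cosine_similarity(query_emb.unsqueeze(0), task_keys, dim=1)
    return int(sims.argmax())

# Usage (illustrative):
# k = select_task(frozen_backbone(x).mean(dim=0), task_keys)   # pick task context
# score = anomaly_score(x, prompt=task_prompts[k], memory=task_memories[k])
```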

2.4 Meta-learning

Meta-learning formulations, as exemplified by ARCADe (Frikha et al., 2020), cast CAD as a bi-level optimization where learning rates and initializations are meta-optimized for rapid adaptation to new tasks while retaining robust performance on previously encountered tasks.

2.5 Applications to Modalities beyond 2D Imaging

C3D-AD introduces kernel attention and dynamic representation rehearsal for class-incremental 3D point cloud anomaly detection, integrating local tokenization, an adaptive advisor module, and representation rehearsal losses to strengthen robustness under sequential task additions (Lu et al., 2 Aug 2025).

3. Representative Architectures and Model Choices

  • Time-series: Two-layer LSTM models with softmax output for industrial pressure-sensor data (input dimension 3000) (Maschler et al., 2021).
  • Images: Frozen Vision Transformers (e.g., ViT-Base-Patch8-224 at layer 9) and ResNets as feature extractors, with coreset or memory-based anomaly scoring (Yang et al., 10 Nov 2025, Liu et al., 2 Jan 2024).
  • Diffusion models: Latent diffusion with semantic- and spatial-conditional embeddings for high-fidelity generative replay, and anomaly-masked conditioning to control faithfulness hallucination (Hu et al., 10 May 2025, Li et al., 27 Feb 2025).
  • Tabular/structured: Deep autoencoders for categorical+numeric audit data (Hemati et al., 2021).
  • 3D point clouds: Token-based kernel attention, learnable advisors, and parameter perturbation rehearsal modules (Lu et al., 2 Aug 2025).

4. Evaluation Protocols, Metrics, and Datasets

Common datasets include:

  • MVTec AD (15 industrial object categories), VisA (12 categories), BMAD (six medical imaging modalities), Real3D-AD, Anomaly-ShapeNet, and MulSen-AD for point clouds.
  • Evaluation commonly uses AUROC and AUPR at image- and pixel-levels, retained accuracy (mean task accuracy post-final update), and task-forgetting measures (e.g., FM or average drop in F1/AUROC across tasks).
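
A hedged sketch of the corresponding bookkeeping is given below: it records per-task AUROC after every update, then reports retained performance (mean final AUROC) and a forgetting measure defined as the best earlier AUROC minus the final AUROC, averaged over tasks; exact conventions differ slightly across the cited papers.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def evaluate_stream(score_fns, test_sets):
    """R[k][i] = AUROC on task i's test set using the model trained through task k."""
    K = len(test_sets)
    R = np.full((K, K), np.nan)
    for k, score_fn in enumerate(score_fns):          # one scoring function per update
        for i, (labels, x) in enumerate(test_sets[: k + 1]):
            R[k, i] = roc_auc_score(labels, score_fn(x))
    retained = np.nanmean(R[-1])                      # mean AUROC after the final task
    forgetting = np.nanmean(
        [np.nanmax(R[:-1, i]) - R[-1, i] for i in range(K - 1)]
    ) if K > 1 else 0.0
    return R, retained, forgetting
```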

Empirical findings consistently reveal that simple sequential fine-tuning results in severe forgetting (e.g., worst-task accuracy drops from a best of 0.93 to 0.52 on a heterogeneous supervised manufacturing benchmark (Maschler et al., 2021); 52.3% forgetting in pixel-F1 across medical imaging tasks (Barusco et al., 25 Aug 2025)). Regularization-based (EWC, SI, LwF) and coreset-based (PatchCoreCL, CADIC) methods reduce this degradation, with memory-efficient schemes (e.g., the unified incremental coreset in CADIC) maintaining high AUROC (>0.97) and <2% forgetting even as task counts grow (Yang et al., 10 Nov 2025, Barusco et al., 25 Aug 2025).

Generative and prompt-based adaptations (ReplayCAD, UCAD) further diminish forgetting for segmentation and localization tasks, with ReplayCAD demonstrating >11.5 percentage point pixel-AP gains over previous SOTA (Hu et al., 10 May 2025) and prompt-based UCAD yielding lower FM than all rehearsal-based comparators (Liu et al., 2 Jan 2024).

5. Trade-offs, Limitations, and Open Research Questions

All contemporary CAD solutions engage explicit trade-offs:

  • Regularization strength (λ) must balance plasticity (learning new tasks) and stability (preservation of old). Excessive regularization impedes acquisition of novel concepts, whereas insufficient constraints allow forgetting (Maschler et al., 2021).
  • Memory-based rehearsal is limited by storage budgets and may require careful coreset construction or compression, e.g., via super-resolution to maintain sample quality per byte (Pezze et al., 2022).
  • Generative replay hinges on the fidelity and diversity of synthetically generated data; pure semantic conditioning may fail to match the manifold's spatial variance, mandating spatial augmentation (Hu et al., 10 May 2025).
  • Per-task key/prompts (UCAD) introduce storage linear in task count, while unified coresets (CADIC) allocate capacity according to feature-space diversity but can underrepresent earlier, less diverse tasks in long sequences (Yang et al., 10 Nov 2025).
  • Diffusion-based CAD solutions incur computational and latency overhead, and their reliance on high-quality pretrained backbones or segmentation masks (e.g., from SAM) introduces external dependencies and possible failure modes on out-of-distribution surfaces (Hu et al., 10 May 2025, Li et al., 27 Feb 2025).
  • Most methods assume a clear task boundary or batch structure. Handling online non-stationarity and truly open-world task boundaries remains a substantial open issue.

Open research directions include: dynamically adaptive regularization and coreset sizing via meta-learning or Bayesian optimization (Maschler et al., 2021), hybrid rehearsal/regularization schemes (Barusco et al., 25 Aug 2025), representation augmentation for extremely low-shot tasks, and extensions to open-set or multi-modal tasks, particularly in domains with scarce labeled anomalies or complex spatial structure. In addition, ongoing work seeks to reduce replay and coreset memory consumption without sacrificing task coverage, distill memory-based methods into lighter-weight student models, and formalize causal and explanation-aware anomaly detection for high-stakes deployment scenarios.

6. Domain-specific CAD Instantiations and Impact

CAD has found deployment and benchmarking in several applied fields:

  • Discrete manufacturing: Regularization-based and memory-replay LSTM models enhance robustness to evolving product lines and machinery setups (Maschler et al., 2021, Yang et al., 10 Nov 2025).
  • Medical imaging: PatchCoreCL demonstrates that nonparametric rehearsal can offer nearly joint-training-level anomaly localization accuracy (pixel-F1 relative gap ~1%) over months of cumulative domain shift (Barusco et al., 25 Aug 2025).
  • Video surveillance: Frame-prediction plus expectation-maximization filtering (single-pass, no replay) achieves superior AUC and streaming adaptation in crowd anomaly detection (Khan et al., 2020).
  • Finance/audit: Autoencoder + continual-learning variants curb false positives and false negatives in dynamic accounting environments, with experience replay most effective for distributional drift (Hemati et al., 2021).
  • 3D inspection: C3D-AD unifies continual anomaly detection for point-cloud modalities, outperforming prior continual and one-shot methods by 3–30% AUROC depending on scenario (Lu et al., 2 Aug 2025).

A plausible implication is that, as CAD methods are adapted and validated for previously under-explored settings such as medical diagnosis and 3D inspection, demand for both low-forgetting, resource-efficient adaptation and rapid, explainable inference will shape future CAD benchmarks and methodological developments.
