PatchCoreCL Continual Learning for Anomaly Detection
- The paper introduces PatchCoreCL as a continual-learning extension of PatchCore, achieving near-optimal anomaly detection in dynamic medical imaging with less than 1% average forgetting.
- PatchCoreCL employs a per-task memory bank strategy along with K-Center coreset subsampling to manage a fixed memory budget and effectively update past task exemplars.
- Experimental results on the BMAD benchmark show that PatchCoreCL reaches Joint-Train-level pixel F1 scores with low forgetting, supporting its practical applicability in evolving domains.
PatchCoreCL is a continual-learning adaptation of the PatchCore visual anomaly detection (VAD) framework, designed to address evolving input data distributions in highly dynamic domains such as medical imaging. It maintains high detection and localization accuracy across sequentially arriving tasks under a strict memory budget, with less than 1% average forgetting, whereas naïve fine-tuning suffers heavy forgetting (Barusco et al., 25 Aug 2025).
1. Background: PatchCore Model for Anomaly Detection
PatchCore is a two-stage, exemplar-based method for anomaly detection. First, a frozen convolutional network (typically WideResNet50) serves as a feature extractor. An input image is divided into patches, each embedded as $f_i \in \mathbb{R}^d$. On a dataset of normal images, PatchCore applies K-Center coreset subsampling to create a memory bank $\mathcal{M}$ of representative embeddings, with $|\mathcal{M}| \ll N$, where $N$ is the total number of patches.
Anomaly scoring proceeds as follows:
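- Each test patch embedding $f_i$ is assigned an anomaly score by its minimum distance to $\mathcal{M}$:
$$s_i = \min_{m \in \mathcal{M}} \lVert f_i - m \rVert_2$$
- Image-level score:
$$S(x) = \max_i s_i$$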
- Pixel-level heatmap is constructed by upsampling per-patch scores.
PatchCore does not retrain the feature extractor after coreset construction and operates under the static data assumption.
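The two stages above can be illustrated with a minimal sketch. It assumes patch embeddings are already available as plain tuples of floats (the frozen backbone is not modeled), uses greedy farthest-point selection as the K-Center coreset heuristic, and takes the image score to be the plain maximum patch score. The function names `k_center_greedy`, `patch_scores`, and `image_score` are illustrative, not taken from the paper, and a practical version would vectorize the distance computations.

```python
import math
import random

def k_center_greedy(points, k, seed=0):
    """Greedy K-Center selection: return indices of k well-spread points."""
    n = len(points)
    k = min(k, n)
    if k == 0:
        return []
    first = random.Random(seed).randrange(n)  # arbitrary starting center
    selected = [first]
    # min_dist[i] = distance from point i to its nearest selected center.
    min_dist = [math.dist(p, points[first]) for p in points]
    while len(selected) < k:
        nxt = max(range(n), key=min_dist.__getitem__)  # farthest point so far
        selected.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return selected

def patch_scores(patches, bank):
    """Per-patch anomaly score: distance to the nearest memory-bank entry."""
    return [min(math.dist(f, m) for m in bank) for f in patches]

def image_score(patches, bank):
    """Image-level score: the largest per-patch score."""
    return max(patch_scores(patches, bank))

# Toy data (assumption): 2-D "embeddings" of normal patches near the origin.
rng = random.Random(1)
normal = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
bank = [normal[i] for i in k_center_greedy(normal, 20)]

normal_image = normal[:5]
anomalous_image = normal[:4] + [(25.0, 25.0)]  # one far-away patch
print(image_score(normal_image, bank) < image_score(anomalous_image, bank))
```

A single patch far from every stored exemplar is enough to raise the image-level score, which is why the maximum over patches is used rather than the mean.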
2. Continual Learning in PatchCoreCL
PatchCoreCL extends PatchCore to the continual-learning (CL) paradigm, where a sequence of anomaly detection tasks—each potentially corresponding to a different imaging modality or domain—arrives over time.
Task Stream and Data Protocol
PatchCoreCL was evaluated on the BMAD benchmark, which comprises six sequential medical anomaly detection tasks:
- Brain_AD (brain MRI)
- Liver_AD (CT tumors)
- Retina_RESC_AD (retinal OCT)
- Chest_AD (CXR)
- Histopathology_AD (microscopy)
- Retina_OCT2017_AD (additional retinal OCT)
At each stage, only normal images of the given task are available for training; at test time, the system must detect and localize anomalies for all encountered tasks, without access to anomalous samples during training.
3. Fixed-Budget Memory Bank Management
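Unlike conventional rehearsal or naive-augmentation strategies, PatchCoreCL uses a per-task memory bank strategy under a total memory constraint $B$. For $T$ seen tasks, each per-task memory bank $\mathcal{M}_t$ receives $\lfloor B/T \rfloor$ exemplars.
Bank update procedure:
- For every bank $\mathcal{M}_j$ with $j < T$ (previous tasks), downsample to $\lfloor B/T \rfloor$ entries using K-Center coreset.
- For the current task $T$, build $\mathcal{M}_T$ from its normal-patch embeddings $F_T$, also coreset-reduced to $\lfloor B/T \rfloor$.
- Formally, with $\mathrm{Coreset}(\cdot, k)$ the selection operation:
$$\mathcal{M}_j \leftarrow \mathrm{Coreset}\left(\mathcal{M}_j, \lfloor B/T \rfloor\right), \quad j = 1, \dots, T-1$$
$$\mathcal{M}_T \leftarrow \mathrm{Coreset}\left(F_T, \lfloor B/T \rfloor\right)$$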
No retraining or backbone adaptation is performed; continual adaptation is achieved purely by manipulating the set of representative exemplars.
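As a rough sketch of this update, assuming an equal split of the budget across seen tasks and greedy farthest-point selection as the coreset operator, the per-task banks could be maintained as below. `coreset`, `update_banks`, and the toy Gaussian features standing in for backbone embeddings are hypothetical names and data chosen for illustration.

```python
import math
import random

def coreset(points, k):
    """Greedy K-Center selection returning k of the given points."""
    k = min(k, len(points))
    if k == 0:
        return []
    chosen = [0]  # start from the first point (arbitrary choice)
    min_dist = [math.dist(p, points[0]) for p in points]
    while len(chosen) < k:
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        chosen.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return [points[i] for i in chosen]

def update_banks(banks, new_task_embeddings, budget):
    """Add a bank for the new task; shrink every bank to budget // num_tasks."""
    num_tasks = len(banks) + 1
    per_task = budget // num_tasks
    shrunk = [coreset(bank, per_task) for bank in banks]   # previous tasks
    shrunk.append(coreset(new_task_embeddings, per_task))  # current task
    return shrunk

rng = random.Random(0)
banks = []
for task in range(3):
    # Hypothetical stand-in for the frozen backbone's normal-patch embeddings.
    feats = [(rng.gauss(task * 10, 1), rng.gauss(0, 1)) for _ in range(100)]
    banks = update_banks(banks, feats, budget=60)
    print([len(b) for b in banks])
# prints [60], then [30, 30], then [20, 20, 20]
```

Note that previous banks are re-subsampled from their already reduced contents, so no raw data from earlier tasks needs to be kept. With a budget of 30,000 embeddings and six tasks, the same rule gives 5,000 exemplars per bank.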
4. Inference and Task-Identification Mechanism
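For a test image $x$:
- Extract patch embeddings $\{f_i\}$.
- For each bank $\mathcal{M}_t$, compute
$$S_t(x) = \max_i \min_{m \in \mathcal{M}_t} \lVert f_i - m \rVert_2$$
- Select the bank $t^* = \arg\min_t S_t(x)$. This selects the most relevant domain.
- The minimal anomaly score $S_{t^*}(x)$ determines if $x$ is normal or anomalous; per-patch distances w.r.t. $\mathcal{M}_{t^*}$ yield a pixel-level heatmap.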
This architecture inherently achieves joint task recognition and anomaly detection without explicit domain classifiers (Barusco et al., 25 Aug 2025).
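The inference steps above can be sketched as follows, assuming patch embeddings are given as tuples and that the normal/anomalous decision is a simple comparison against a threshold. The `threshold` parameter and the `predict` function are assumptions made for illustration; the text does not specify how the decision threshold is chosen.

```python
import math

def image_score(patches, bank):
    """Max over patches of the distance to the nearest bank entry."""
    return max(min(math.dist(f, m) for m in bank) for f in patches)

def predict(patches, banks, threshold):
    """Return (selected bank index, image score, is_anomalous, patch scores)."""
    scores = [image_score(patches, bank) for bank in banks]
    best = min(range(len(banks)), key=scores.__getitem__)  # implicit task id
    # Per-patch distances to the selected bank; upsampling these would give
    # the pixel-level heatmap.
    patch_map = [min(math.dist(f, m) for m in banks[best]) for f in patches]
    return best, scores[best], scores[best] > threshold, patch_map

# Toy banks (assumption): two tasks whose normal patches live in distinct regions.
banks = [
    [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)],        # task 0 normals
    [(10.0, 10.0), (11.0, 10.0), (10.0, 11.0)],  # task 1 normals
]
normal_img = [(10.2, 10.1), (10.9, 10.0)]
anomalous_img = [(10.2, 10.1), (14.0, 14.0)]

print(predict(normal_img, banks, threshold=1.0)[:3])
print(predict(anomalous_img, banks, threshold=1.0)[:3])
```

In this toy case both images are routed to the second bank, and only the image containing the distant patch exceeds the threshold. A heavily anomalous image could in principle be routed to the wrong bank, since the selection relies on the same score used for detection.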
5. Theoretical Metrics and Continual-Learning Assessment
The principal metrics for CL assessment in PatchCoreCL are:
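- Average forgetting ($\bar{F}$): quantifies the loss in task performance attributable to learning new tasks:
$$\bar{F} = \frac{1}{T-1} \sum_{j=1}^{T-1} \max_{k \in \{j, \dots, T-1\}} \left( a_{j,k} - a_{j,T} \right)$$
where $a_{j,k}$ is the performance on task $j$ after learning task $k$, $a_{j,T}$ after all tasks.
- Relative gap ($\Delta_{\mathrm{rel}}$) to a "Joint-Train" oracle:
$$\Delta_{\mathrm{rel}} = \frac{a_{\mathrm{Joint}} - a_{\mathrm{CL}}}{a_{\mathrm{Joint}}}$$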
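Both metrics are simple to compute from recorded scores. The sketch below assumes the common convention in which forgetting for a task is its best earlier score minus its final score, averaged over all tasks except the last, and that the relative gap is the fractional shortfall to the Joint-Train score. The accuracy matrix `acc` contains made-up numbers for illustration only; the relative-gap call uses the pixel F1 values reported for Joint-Train (0.63) and PatchCoreCL-10k (0.62).

```python
def average_forgetting(acc):
    """acc[k][j]: performance on task j measured after learning task k (j <= k)."""
    num_tasks = len(acc)
    if num_tasks < 2:
        return 0.0  # nothing can be forgotten with a single task
    drops = []
    for j in range(num_tasks - 1):
        # Best score task j ever had before the final task was learned.
        best_past = max(acc[k][j] for k in range(j, num_tasks - 1))
        drops.append(best_past - acc[num_tasks - 1][j])
    return sum(drops) / len(drops)

def relative_gap(joint_score, cl_score):
    """Fractional shortfall of the continual learner to the Joint-Train score."""
    return (joint_score - cl_score) / joint_score

# Hypothetical lower-triangular score matrix for three tasks.
acc = [
    [0.70],
    [0.68, 0.60],
    [0.69, 0.58, 0.65],
]
print(round(average_forgetting(acc), 4))    # 0.015
print(round(relative_gap(0.63, 0.62), 4))   # 0.0159
```

Because the reported F1 values are rounded to two decimals, the relative gap computed from them is only indicative.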
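The critical procedures, with total budget $B$ and $T$ seen tasks, can be summarized as follows:
- Bank update (on arrival of task $T$): shrink each stored bank to $\lfloor B/T \rfloor$ entries with K-Center coreset selection, then build the new task's bank from its normal-patch embeddings at the same size.
- Inference: extract patch embeddings, score the image against every bank, select the bank with the lowest image-level score, and use that bank's score and per-patch distances for the decision and the heatmap.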
6. Experimental Results: BMAD Benchmark
The evaluation protocol included six task splits, with up to 2,000 normal training samples per task, 224×224 input resolution, and both image- and pixel-level validation.
Compared methods:
- Multi-Model: Separate PatchCore for each task (upper-bound accuracy, linear memory cost).
- Joint-Train: Single PatchCore trained on all data so far (upper-bound for accuracy, modest memory).
- Fine-Tuning: Naïve sequential extension (no memory reduction, high catastrophic forgetting).
- PatchCoreCL-10k/30k: PatchCoreCL with memory budgets of 10,000 and 30,000 patch embeddings.
Quantitative Findings
| Method | Image AUROC | Pixel F1 | Avg. Forgetting | Extra Memory (MB) |
|---|---|---|---|---|
| Multi-Model | n.d. | 0.64 | n.d. | >1000 |
| Joint-Train | ~0.82 | 0.63 | n.d. | moderate |
| Fine-Tuning | n.d. | 0.25 | ~52% | low |
| PatchCoreCL-30k | n.d. | 0.63 | 0.73% | <200 |
| PatchCoreCL-10k | n.d. | 0.62 | 0.80% | <200 |
PatchCoreCL-30k achieves pixel-level F1 equal to Joint-Train (0.63), so the relative gap is approximately zero at the table's precision, with average forgetting of 0.73%; PatchCoreCL-10k attains comparable performance at one-third the memory budget (Barusco et al., 25 Aug 2025).
7. Qualitative Analysis, Limitations, and Extensions
PatchCoreCL produces anomaly heatmaps that visually match those from static Joint-Train, successfully localizing subtle pathologies such as white-matter lesions and hypodense nodules. Its effectiveness is attributable to:
- Coreset selection, preserving diverse normal prototypes per task;
- A frozen backbone, preventing representational drift;
- Implicit task identification via per-bank minimum anomaly scoring.
Limitations emerge as the number of tasks $T$ increases: under a fixed budget $B$, per-task banks shrink to $\lfloor B/T \rfloor$ exemplars, so fine-grained or low-density normal patterns from early tasks may become underrepresented, which can degrade detection on those tasks. The use of a frozen backbone means no adaptation to domain shift, potentially reducing performance for out-of-distribution tasks.
Potential extensions include dynamic memory budget allocation (prioritizing clinically important tasks), lightweight backbone regularization, hybrid replay approaches if data privacy permits, and task-aware normalization to further improve robustness to domain shift (Barusco et al., 25 Aug 2025).
PatchCoreCL demonstrates that continual, fixed-budget replay of patch-level embeddings, without retraining the feature extractor, can achieve near-optimal continual anomaly detection in the medical domain with negligible catastrophic forgetting and significant memory efficiency.