Enhanced-PatchCore: Robust Anomaly Detection

Updated 12 May 2026

The paper introduces Enhanced-PatchCore, which improves anomaly detection by implementing leave-one-out scoring and advanced beta-prime thresholding without requiring anomalous validation data.
Enhanced-PatchCore retains the core PatchCore pipeline while enhancing memory bank construction and coreset reduction to efficiently capture representative patch features.
Enhanced-PatchCore demonstrates robust performance in both few-shot and many-shot regimes, yielding substantial gains in industrial visual anomaly detection benchmarks.

Enhanced-PatchCore is a few- and many-shot visual anomaly detection (VAD) method that extends the original PatchCore paradigm with robust thresholding and memory bank construction strategies. These remove the need for anomalous validation data and keep performance stable even with very limited training data. By combining leave-one-out scoring, robust threshold estimation, and principled evaluation protocols, the method achieves substantial gains in challenging industrial settings and few-shot regimes, and is intended as a turnkey anomaly detection framework for critical real-world applications (Arodi et al., 2024, Santos et al., 2023).

1. Foundations: PatchCore and Its Enhancements

PatchCore addresses one-class VAD with pre-trained embeddings (e.g., from ResNet backbones). Patchwise feature vectors from each "nominal" training image are collected into a large memory bank, which is then subsampled via coreset selection (e.g., geometric farthest-point or greedy k-center) to yield a compact, representative set of patch features. At inference, a test image's anomaly score is the maximum, over its patch features, of the Euclidean distance to the nearest memory bank patch. Images whose patches lie far from this nominal manifold are flagged as anomalous (Roth et al., 2021, Santos et al., 2023).
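The scoring rule described above can be sketched as follows. This is a minimal illustration, assuming patch features have already been extracted into NumPy arrays; the function name `image_anomaly_score` and the toy two-dimensional vectors are hypothetical and not taken from any PatchCore implementation.

```python
import numpy as np

def image_anomaly_score(patches: np.ndarray, memory_bank: np.ndarray) -> float:
    """Max over image patches of the distance to the nearest memory-bank patch.

    patches:     (n_patches, d) features of one test image (assumed precomputed)
    memory_bank: (n_memory, d) nominal patch features
    """
    # Pairwise Euclidean distances, shape (n_patches, n_memory).
    diffs = patches[:, None, :] - memory_bank[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    # Distance from each patch to its nearest nominal patch.
    nearest = dists.min(axis=1)
    # The image-level score is driven by the most unusual patch.
    return float(nearest.max())

# Toy data (illustrative only): three nominal patches in a 2-D feature space.
memory_bank = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
nominal_image = np.array([[0.1, 0.0], [0.9, 0.1]])
defect_image = np.array([[0.1, 0.0], [3.0, 4.0]])

print(image_anomaly_score(nominal_image, memory_bank))
print(image_anomaly_score(defect_image, memory_bank))
```

A single outlying patch is enough to raise the image-level score, which is why the maximum (rather than the mean) over patches is used.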

However, in low-data or real-world scenarios, thresholding requires anomalous validation images—often unavailable or prohibitively costly. PatchCore's original protocol is thus brittle in few-shot settings, with threshold estimates and anomaly scores suffering from high variance when $N_\text{train}\ll100$ (Arodi et al., 2024).

2. Enhanced-PatchCore: Algorithmic Description

Enhanced-PatchCore retains PatchCore's feature extraction and coreset subsampling pipeline but introduces key modifications:

Leave-One-Out Scoring: For each nominal training image $X_i$, anomaly scores are computed by excluding its own patches from the memory bank:

$\hat{S}(X_i) := \max_{e\in\mathcal{P}(X_i)} \min_{m\in\hat{\mathcal{M}}\setminus\mathcal{P}(X_i)} \|e - m\|_2$

This strategy avoids trivial self-matching, yielding a genuine estimate of the "anomalousness" of nominal data in isolation (Arodi et al., 2024).
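A minimal sketch of leave-one-out scoring is given below. It assumes each training image is represented by an array of precomputed patch features and, for brevity, skips coreset subsampling, so the full concatenated bank stands in for $\hat{\mathcal{M}}$. The function name `leave_one_out_scores` and the toy data are hypothetical.

```python
import numpy as np

def leave_one_out_scores(patches_per_image):
    """Score each nominal image against a bank that excludes its own patches.

    patches_per_image: list of (n_i, d) arrays, one per training image.
    Requires at least two images so the reduced bank is never empty.
    """
    bank = np.concatenate(patches_per_image, axis=0)
    # owner[j] records which image contributed bank row j.
    owner = np.concatenate(
        [np.full(len(p), i) for i, p in enumerate(patches_per_image)]
    )
    scores = []
    for i, patches in enumerate(patches_per_image):
        others = bank[owner != i]  # memory bank without image i's patches
        dists = np.linalg.norm(patches[:, None, :] - others[None, :, :], axis=2)
        scores.append(float(dists.min(axis=1).max()))
    return np.array(scores)

# Toy example (illustrative): three single-patch "images".
images = [np.array([[0.0, 0.0]]), np.array([[1.0, 0.0]]), np.array([[0.0, 3.0]])]
print(leave_one_out_scores(images))
```

Without the exclusion step every training image would match its own patches at distance zero, so the resulting scores would carry no information for threshold estimation.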

Threshold Estimation: The decision threshold $T$ is derived solely from the empirical distribution $S_\text{train} = \{\hat{S}(X_i)\}$ via one of four approaches:
1. Max: $T_{\text{max}} = \max S_\text{train}$
2. Boxplot Whisker: $T_{\text{whisker}} = Q_3 + 1.5\cdot IQR$
3. Empirical Percentile: $T_{\text{emp}}(\alpha)$ (e.g., $\alpha=95$)
4. Parametric Beta-Prime Fit: $T_{\text{bp}}(\alpha)$, obtained by fitting a beta-prime distribution to $S_\text{train}$ and inverting its CDF

This process allows Enhanced-PatchCore to be tuned for high recall or precision, with the beta-prime method showing particular stability with $N_\text{train}$ as small as 10–25 (Arodi et al., 2024).
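The four threshold rules can be sketched as below. This is an illustration under stated assumptions rather than the authors' implementation: the beta-prime fit uses SciPy's generic maximum-likelihood `scipy.stats.betaprime.fit` with the location fixed at zero and a free scale, which is one possible choice the source does not specify. The function name `estimate_thresholds` and the synthetic scores are hypothetical.

```python
import numpy as np
from scipy import stats

def estimate_thresholds(s_train, alpha=95.0):
    """Return the four candidate thresholds from leave-one-out training scores."""
    s = np.asarray(s_train, dtype=float)
    q1, q3 = np.percentile(s, [25, 75])
    # Assumption: MLE fit with loc fixed at 0 (scores are non-negative distances).
    a, b, loc, scale = stats.betaprime.fit(s, floc=0)
    return {
        "max": float(s.max()),
        "whisker": float(q3 + 1.5 * (q3 - q1)),
        "empirical": float(np.percentile(s, alpha)),
        # Invert the fitted CDF at level alpha (given here in percent).
        "beta_prime": float(
            stats.betaprime.ppf(alpha / 100.0, a, b, loc=loc, scale=scale)
        ),
    }

# Synthetic leave-one-out scores (illustrative only).
demo_scores = stats.betaprime.rvs(4.0, 8.0, size=40, random_state=0)
for name, value in estimate_thresholds(demo_scores).items():
    print(f"{name}: {value:.3f}")
```

The max rule depends on a single extreme score, whereas the percentile and parametric rules use the whole score distribution, which is consistent with the stability differences reported above.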

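Inference: A test image $X_\text{test}$ is scored as

$S(X_\text{test}) := \max_{e\in\mathcal{P}(X_\text{test})} \min_{m\in\hat{\mathcal{M}}} \|e - m\|_2$

and labeled anomalous if $S(X_\text{test}) > T$.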

3. Few-Shot and Many-Shot Regimes

Enhanced-PatchCore, by eliminating the need for real anomalies during validation and calibration, enables practical operation in both the few-shot ($N_\text{train}$ up to about 25) and many-shot ($N_\text{train}$ up to 100) regimes. Key features include:

Robustness: The leave-one-out mechanism and threshold strategies (especially beta-prime and empirical-95) improve stability across evaluation folds, outperforming simple maximum-based thresholding, which shows high variance and outlier sensitivity.
Performance Plateau: Adding nominal examples beyond ~50 can introduce outliers to the memory bank without further gains, suggesting diminishing returns for very large $N_\text{train}$ (Arodi et al., 2024).

4. Evaluation Protocols and Metrics

A rigorous k-fold cross-validation scheme is adopted for empirical study. For each cable in the CableInspect-AD dataset, folds are defined by "defect identifiers," each fold using one anomalous frame and 100 subsequent nominal frames for training, with buffer zones; the remainder forms the test set. Metrics include:

Threshold-independent: AUROC, AUPR
Threshold-dependent: Precision, recall, FPR, FNR, F1-score
Segmentation: AUPRO (Area Under the Per-Region Overlap curve), pixelwise PRO score (Arodi et al., 2024)

Model	F1 (↑)	FPR (↓)	AUPR (↑)	AUROC (↑)
LLaVA-1.5-7B (0-shot)	0.59 ± 0.07	0.32 ± 0.19	0.75 ± 0.05	0.68 ± 0.04
CogVLM-17B (0-shot)	0.77 ± 0.02	0.34 ± 0.21	0.83 ± 0.03	0.79 ± 0.04
Enhanced-PatchCore (100-shot, beta-prime-95)	0.75 ± 0.03	0.55 ± 0.19	0.84 ± 0.06	0.78 ± 0.05

Pixel-level AUPRO for Enhanced-PatchCore: 0.53 ± 0.08; for WinCLIP: 0.27 ± 0.06 (Arodi et al., 2024).
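For reference, the threshold-dependent metrics listed above follow directly from the confusion counts at a fixed threshold. The sketch below uses the standard definitions; the function name `threshold_metrics` and the toy scores are hypothetical, and label 1 is assumed to denote an anomalous image.

```python
def threshold_metrics(scores, labels, threshold):
    """Precision, recall, FPR, FNR and F1 for the rule 'anomalous if score > threshold'."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_anomalous = score > threshold
        if predicted_anomalous and label == 1:
            tp += 1
        elif predicted_anomalous and label == 0:
            fp += 1
        elif not predicted_anomalous and label == 0:
            tn += 1
        else:
            fn += 1
    # Guard against empty denominators by reporting 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    fnr = fn / (fn + tp) if fn + tp else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "fpr": fpr, "fnr": fnr, "f1": f1}

# Toy example (illustrative): two nominal and two anomalous images.
print(threshold_metrics([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1], threshold=0.3))
```

Lowering the threshold raises recall at the cost of a higher FPR, which is the trade-off that the choice of thresholding rule controls.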

5. Optimization Strategies and Practical Considerations

Enhanced-PatchCore inherits and extends hyperparameter tuning and augmentation strategies found effective in few-shot AD:

Backbone Selection: Anti-aliased WideResNet50 (AA-WRN50) provides superior translation equivariance, outperforming conventional high-capacity backbones in few-shot regimes.
Image Augmentation: Incorporates affine, blur, brightness-contrast, sharpen, and flip operations, increasing patch diversity and improving AUROC, especially when carefully composed (e.g., omitting flip for transformation-sensitive textures) (Santos et al., 2023).
Coreset Size Selection: Reducing the memory bank via greedy sampling after augmentation captures nearly all performance gains with an ~80% reduction in storage (Santos et al., 2023).

A plausible implication is that augmentations and coreset reduction are synergistic, improving sample efficiency and scalability for industrial settings.
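Greedy coreset reduction of the kind referred to under Coreset Size Selection can be sketched with a farthest-point (greedy k-center) loop. This is a simplified illustration that works on raw features with a random starting point; published PatchCore implementations may differ in detail (for example, by projecting features to a lower dimension first). The function name `greedy_k_center` and the toy data are hypothetical.

```python
import numpy as np

def greedy_k_center(features: np.ndarray, k: int, seed: int = 0) -> np.ndarray:
    """Select up to k row indices so every feature lies close to a selected one."""
    rng = np.random.default_rng(seed)
    n = len(features)
    # Start from a random point (an assumption; other initialisations are possible).
    selected = [int(rng.integers(n))]
    # min_dist[j] = distance from feature j to its nearest selected feature.
    min_dist = np.linalg.norm(features - features[selected[0]], axis=1)
    while len(selected) < min(k, n):
        nxt = int(np.argmax(min_dist))  # farthest point from the current coreset
        selected.append(nxt)
        new_dist = np.linalg.norm(features - features[nxt], axis=1)
        min_dist = np.minimum(min_dist, new_dist)
    return np.array(selected)

# Toy memory bank (illustrative): three tight clusters of two patches each.
bank = np.array(
    [[0.0, 0.0], [0.1, 0.0], [10.0, 0.0], [10.1, 0.0], [0.0, 10.0], [0.0, 10.1]]
)
coreset_idx = greedy_k_center(bank, k=3)
print(bank[coreset_idx])
```

In this toy example, near-duplicate patches are skipped and one representative per cluster is kept, which is the intuition behind retaining most of the detection performance with a much smaller bank.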

6. Ablation Insights and Variants

Empirical analysis indicates the following:

Thresholding Method Impact: Maximum-based thresholding yields the highest sensitivity to outliers and variance. Beta-prime-95 and empirical-95 are preferred for robustness, especially when sample sizes are constrained.
Background Removal: Cropping out irrelevant image content reduces variance and slightly boosts AUROC/AUPR for Enhanced-PatchCore, though it may worsen zero-shot detection with vision-language models (VLMs) by removing global context.
Few-Shot–Many-Shot Trade-off: Performance gains plateau beyond roughly 50 nominal examples as outliers infiltrate the nominal manifold.
Failure Modes: Categories with substantial in-class rotations challenge current backbones, motivating exploration of equivariant CNNs or transformers (Santos et al., 2023).

7. Future Directions and Research Connections

Enhanced-PatchCore is extensible; potential directions include:

Equivariant Architectures: Rotational or E(2)-equivariant CNNs and vision transformers may address the robustness gap for heavily transformed objects (Santos et al., 2023).
Adaptive Feature Selection: Dynamic optimization of layer selection and learned augmentations could enhance discriminative power across varying tasks.
Language Guidance: Integrating language-based features (on the analogy of WinCLIP) may enable further gains, particularly for few-shot detection (Santos et al., 2023).
Adaptive and Density-aware Thresholding: Per-class or per-image calibration and Mahalanobis distance–based normalization could increase generalization in heterogeneous domains (Roth et al., 2021).

Enhanced-PatchCore thus represents a concrete advance in practical, robust, and sample-efficient visual anomaly detection, directly addressing the limitations of classical PatchCore and supporting new benchmarks such as CableInspect-AD (Arodi et al., 2024, Santos et al., 2023).