
CoDeGraph: Zero-Shot Consistent Anomaly Filtering

Updated 21 December 2025
  • CoDeGraph algorithm is a graph-based method for consistent anomaly filtering in zero-shot anomaly detection, leveraging patch token analysis and distance scaling.
  • It employs statistical insights like similarity scaling and neighbor-burnout to effectively identify and suppress repeated anomalies, achieving high AUROC and segmentation metrics.
  • The pipeline integrates local suspicious link detection, graph construction, and community detection to selectively filter repeated defects while preserving normal data.

The term "CoDeGraph algorithm" refers to multiple distinct frameworks across modern research. The principal usages documented in the literature are: (1) a graph-based filtering method for consistent anomalies in zero-shot anomaly detection (notably industrial and medical imaging); (2) a code generation–execution framework for graph reasoning with LLMs (CodeGraph); (3) a static code definition analysis system for call graph generation in software engineering; and (4) an algorithm for linear-delay cograph generation. This article focuses on the most prominent and recent context: CoDeGraph for consistent anomaly filtering in zero-shot anomaly detection, with attention also given to the other definitions.

1. Problem Formulation: Consistent Anomalies in Zero-Shot Anomaly Detection

Zero-shot anomaly classification (AC) and segmentation (AS) aim to identify defective samples or regions in visual data without access to labeled outliers or training images from the target domain. Patch-based anomaly scoring methods leveraging Vision Transformer (ViT) backbones have achieved strong results but rely on the assumption that anomalies are rare and dissimilar across a test batch. The presence of "consistent anomalies"—recurrent, nearly identical defects across multiple samples—systematically biases nearest neighbor–based scoring, leading to high false negative rates on these repeated abnormal regions.

Let $\mathcal{B} = \{C_1, \ldots, C_B\}$ denote a batch of $B$ test images or volumes, each $C_i$ containing $N$ patch tokens $z_i^1, \ldots, z_i^N \in \mathbb{R}^D$ extracted by a frozen ViT backbone $f$. The objective is to compute a per-patch anomaly score $a(z)$ and a per-image score $A(C_i) = \max_h a(z_i^h)$, then decide $G(C_i) = 1$ (anomalous) if $A(C_i) \geq \tau$. This must be achieved without labeled anomalies, fine-tuning, or supervised adjustment to the test-set statistics (Le-Gia et al., 12 Oct 2025; Le-Gia, 2 Dec 2025).
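As a concrete, deliberately simplified sketch of this scoring scheme: the per-patch score below is the mean of the $K$ smallest patch-to-image distances to the other batch members, and the per-image score is the patch maximum. The $K$-averaging choice is an assumption standing in for the mutual-scoring (MSM) baseline that CoDeGraph refines, not the paper's exact formula:

```python
import numpy as np

def patch_to_image_dist(z, patches_j):
    """d(z, C_j) = min_k ||z - z_j^k||: distance from patch z to image C_j."""
    return np.min(np.linalg.norm(patches_j - z, axis=1))

def anomaly_scores(batch, K=3):
    """Per-patch score a(z): mean of the K smallest patch-to-image distances
    to the *other* images in the batch (a simple MSM-style baseline).
    Per-image score A(C_i) = max_h a(z_i^h)."""
    B = len(batch)
    image_scores = []
    for i, patches_i in enumerate(batch):
        per_patch = []
        for z in patches_i:
            d = sorted(patch_to_image_dist(z, batch[j])
                       for j in range(B) if j != i)
            per_patch.append(np.mean(d[:K]))
        image_scores.append(max(per_patch))
    return image_scores
```

A rare, dissimilar defect has no close match in any other image, so all $K$ retained distances are large and the image stands out; the consistent-anomaly failure described next breaks exactly this property.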

The consistent-anomaly failure mode occurs when the Rarity assumption fails, i.e., when some anomalies are so frequent that they attain many close matches in the test batch, causing their anomaly scores to become indistinguishable from those of true normal regions.

2. Theoretical Insights: Similarity Scaling and Neighbor-Burnout

Two statistical-geometric phenomena underpin the CoDeGraph algorithm's approach to consistent anomaly suppression:

  • Similarity scaling: For any reference patch $z$, the vector of minimal distances to each other batch element, $D_\mathcal{B}(z) = [d(z)_{(1)}, \ldots, d(z)_{(B-1)}]$ with $d(z, C_j) = \min_k \|z - z_j^k\|$, shows a characteristic scaling behavior. For normal patches, the log-growth rate $\tau_i(z) = \ln[d(z)_{(i+1)} / d(z)_{(i)}]$ follows a power law ($\langle\tau_i\rangle \sim i^{-\alpha}$), reflecting the gradual similarity increase typical of diverse, non-repeating content ((Le-Gia, 2 Dec 2025), Eq. (3.14)).
  • Neighbor-burnout: For a patch $z_a$ recurring as an anomaly in $H \ll B$ samples, there is a sharp discontinuity at the $(H+1)$-th nearest neighbor: $d(z_a)_{(i)} < \epsilon$ for $i \leq H$, but $d(z_a)_{(H+1)} \gg \epsilon$. The corresponding growth rate $\tau_H(z_a)$ exhibits a spike far exceeding the scaling law, directly exposing the exhaustion of repeated-neighbor matches (Le-Gia et al., 12 Oct 2025; Le-Gia, 2 Dec 2025).

Extreme Value Theory (EVT) underpins these behaviors: for normal patches, patch-to-image distances follow a Fréchet law, and scaling statistics such as $\tau^{(i)}(x)$ are Exp$(\alpha i)$-distributed; consistent anomalies break this pattern by exhibiting saturated close matches followed by an abrupt increase (Le-Gia et al., 12 Oct 2025).
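A minimal sketch of the neighbor-burnout diagnostic: compute the log-growth rates $\tau_i$ over a sorted min-distance list and look for the first spike. The fixed spike threshold here is illustrative, not a value from the paper:

```python
import numpy as np

def growth_rates(sorted_dists):
    """tau_i = ln(d_(i+1) / d_(i)) over the sorted min-distance list D_B(z)."""
    d = np.asarray(sorted_dists, dtype=float)
    return np.log(d[1:] / d[:-1])

def burnout_index(sorted_dists, spike=2.0):
    """Return H, the number of close matches before the first growth-rate
    spike (neighbor-burnout), or -1 if the profile shows only the gradual
    decay expected of normal patches. The spike threshold is illustrative."""
    tau = growth_rates(sorted_dists)
    idx = np.where(tau > spike)[0]
    return int(idx[0]) + 1 if idx.size else -1
```

For a patch repeated in $H = 3$ images, the first three distances are tiny and the fourth jumps, so the spike sits at $\tau_3$; a normal patch's distances grow smoothly and no index is flagged.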

3. Algorithmic Pipeline: CoDeGraph Construction and Iterative Filtering

CoDeGraph proceeds through three principal computational stages: (1) local detection of suspicious patch–to–collection similarity links; (2) graph construction and community detection among the collections; (3) dependency-based refinement to filter high-confidence consistent anomaly regions without discarding whole images.

For each patch token $z$ and its sorted distance list $d(z)_{(i)}$ ($i = 1, \ldots, B-1$), CoDeGraph defines the endurance ratio $\zeta(z)_{(i)} = d(z)_{(i)} / d(z)_{(\omega)}$ for a fixed reference index $\omega > K$ (typically $\omega \approx 0.3B$), with $K$ the MSM averaging parameter. Small endurance ratios indicate patches whose early neighbors are deceptive repeats (low distance) but whose neighborhood exhausts sharply.

Suspicious patch–collection links are selected where $\zeta(z, C_j) \leq \lambda$, with $\lambda$ adaptively increased to meet a specified node coverage $\tau_{\text{cov}}$ (e.g., 0.9) over the batch (Le-Gia, 2 Dec 2025).

Optionally, a weighted endurance ratio

$$\zeta'(z, C_j) = \zeta(z, C_j) \cdot \left[d(z, C_j)\right]^{-\alpha}$$

mitigates the influence of high-variance categories (Le-Gia et al., 12 Oct 2025).
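The endurance-ratio computation and the coverage-driven choice of $\lambda$ can be sketched as follows; the candidate grid for $\lambda$ and the per-patch notion of coverage are simplifying assumptions:

```python
import numpy as np

def endurance_ratios(sorted_dists, omega):
    """zeta(z)_(i) = d(z)_(i) / d(z)_(omega) for a fixed reference index omega."""
    d = np.asarray(sorted_dists, dtype=float)
    return d / d[omega]

def select_lambda(all_ratios, tau_cov=0.9, grid=None):
    """Raise lambda along a candidate grid (illustrative values) until at
    least tau_cov of the patches have at least one suspicious link,
    i.e. some endurance ratio <= lambda."""
    if grid is None:
        grid = np.linspace(0.05, 1.0, 20)
    n = len(all_ratios)
    for lam in grid:
        covered = sum(1 for r in all_ratios if np.min(r) <= lam)
        if covered / n >= tau_cov:
            return lam
    return grid[-1]
```

The adaptive sweep keeps the rule parameter-light: rather than hand-tuning $\lambda$ per category, the coverage target fixes it from the batch itself.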

Stage 2: Anomaly Similarity Graph

Collections (images or volumes) are modeled as nodes. An undirected edge $(i, j)$ is inserted if any patch in $C_i$ is linked to $C_j$ (or vice versa) in the set of suspicious links, with edge weight

$$w_{ij} = \big| \{(z, C_j) \in S_\ell \mid z \in C_i \} \big| + \big| \{(z, C_i) \in S_\ell \mid z \in C_j \} \big|$$

[(Le-Gia, 2 Dec 2025), Eq.(3.16)].

This graph encodes the structure of potentially repeated (consistent) anomalies, which manifest as dense communities.
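Assuming suspicious links are stored as (patch id, target collection) pairs and each patch's owning collection is known, the edge-weight rule reduces to a bidirectional count:

```python
from collections import defaultdict

def build_anomaly_graph(suspicious_links, patch_owner):
    """Build the collection-level anomaly similarity graph.
    suspicious_links: iterable of (patch_id, target_collection) pairs, S_l.
    patch_owner: maps patch_id -> collection containing that patch.
    Edge weight w_ij counts suspicious links in both directions between
    C_i and C_j (Eq. 3.16); self-links are ignored."""
    weights = defaultdict(int)
    for patch, cj in suspicious_links:
        ci = patch_owner[patch]
        if ci == cj:
            continue
        key = (min(ci, cj), max(ci, cj))  # canonical undirected edge
        weights[key] += 1
    return dict(weights)
```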

Stage 3: Community Detection and Structured Refinement

The Leiden algorithm, optimizing the Constant Potts Model (CPM),

$$Q = \sum_{i,j} (A_{ij} - \gamma)\, \delta(\sigma_i, \sigma_j)$$

is applied, with $\gamma$ set to a robust estimate (the 25th percentile) of the edge-weight distribution. Outlier communities $M$ are flagged where the internal edge-density $\rho(M) > Q_3 + \kappa \cdot \mathrm{IQR}$ ($\kappa = 1.5$–$4.5$) ((Le-Gia, 2 Dec 2025), Eq. (3.19)).
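The density-based outlier test can be sketched independently of community detection itself (which would be run first, e.g. via the leidenalg package). Here communities are given as lists of node ids, and `community_density` uses pairwise normalisation, one plausible reading of internal edge-density:

```python
import numpy as np

def community_density(weights, members):
    """Internal edge-density of community M: total intra-community edge
    weight normalised by the number of node pairs in M."""
    members = set(members)
    w = sum(wt for (i, j), wt in weights.items()
            if i in members and j in members)
    n = len(members)
    return 2.0 * w / (n * (n - 1)) if n > 1 else 0.0

def flag_outlier_communities(weights, communities, kappa=1.5):
    """Flag communities whose density exceeds Q3 + kappa * IQR of the
    density distribution across all detected communities."""
    dens = np.array([community_density(weights, m) for m in communities])
    q1, q3 = np.percentile(dens, [25, 75])
    cutoff = q3 + kappa * (q3 - q1)
    return [m for m, d in zip(communities, dens) if d > cutoff]
```

The IQR rule makes the cutoff scale-free: dense cliques of mutually repeating defects stand out regardless of the batch's absolute edge-weight magnitudes.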

Within each outlier, only those patches that show strong dependency on intra-community matches (quantified by the dependency ratio)

$$r_M(z) = \frac{a_{\mathcal{B} \setminus M}(z)}{a_{\mathcal{B}}(z)}$$

and exceed a threshold $\theta_M$ (e.g., the 99th percentile) are excluded from further comparison. This targeted exclusion preserves normal data within affected images while suppressing only the deceptive repeated anomalies.
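A sketch of the dependency-ratio filter, assuming the anomaly scores with and without community $M$ in the reference base have already been computed for each patch:

```python
import numpy as np

def dependency_ratio(score_without_M, score_full):
    """r_M(z) = a_{B\\M}(z) / a_B(z): how strongly a patch's low score
    depends on matches inside the outlier community M. A large ratio
    means the patch becomes anomalous once M is removed."""
    return score_without_M / score_full

def exclude_dependent_patches(scores_without_M, scores_full, percentile=99):
    """Indices of patches whose dependency ratio exceeds the percentile
    threshold theta_M (percentile value follows the text; the rest is a
    sketch of the exclusion step)."""
    r = np.asarray(scores_without_M) / np.asarray(scores_full)
    theta = np.percentile(r, percentile)
    return np.where(r > theta)[0]
```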

Pseudocode Outline

  • Extract patch tokens for each collection.
  • For each patch, compute sorted patch–to–collection distances.
  • Compute endurance ratios and select suspicious links (coverage-based).
  • Assemble the image (or volume) similarity graph.
  • Apply Leiden+CPM, compute densities, identify outlier communities.
  • For each outlier, compute patch dependency ratios, exclude those surpassing threshold.
  • Recompute anomaly scores over the refined reference base.

Complexity is dominated by the all-pairs patch-to-collection search ($O(B^2 N^2)$ naively), with significant savings via chunking and CLS token–based screening (Le-Gia et al., 12 Oct 2025).

4. 3D Medical Imaging, Pseudo-Mask Generation, and Vision–Language Integration

CoDeGraph can be extended to volumetric data (e.g., 3D MRI) via axis-wise patch extraction and pooling:

  • For volume $V \in \mathbb{R}^{H \times H \times H}$ and patch size $p$,
    1. For each orientation, slice into images and extract 2D patch tokens.
    2. Pool features to form 3D patch embeddings.
    3. Concatenate all three axis projections at each voxel location to produce the final patch representation $\tilde{z}_{(x,y,z)}$.

Anomaly scores are then computed as in the 2D case, enabling genuinely zero-shot 3D anomaly segmentation without sample-level training [(Le-Gia, 2 Dec 2025), Eq.(6.1)].
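The axis-wise extraction can be sketched with a stand-in slice encoder (`encode_slice`, a hypothetical placeholder for the frozen ViT; pooling to patch resolution is omitted for brevity):

```python
import numpy as np

def triaxial_patch_features(volume, encode_slice):
    """Sketch of axis-wise 3D feature extraction: slice the volume along
    each of the three axes, embed every slice with a 2D encoder
    (encode_slice maps an (H, H) slice to (H, H, D) features), and
    concatenate the three axis views at each voxel to form z~_(x,y,z)."""
    feats = []
    for axis in range(3):
        sliced = np.moveaxis(volume, axis, 0)              # stack of (H, H) slices
        emb = np.stack([encode_slice(s) for s in sliced])  # (H, H, H, D)
        feats.append(np.moveaxis(emb, 0, axis))            # restore voxel order
    return np.concatenate(feats, axis=-1)                  # (H, H, H, 3D)
```

Restoring the slicing axis before concatenation is what keeps the three views aligned per voxel, so each $\tilde{z}_{(x,y,z)}$ mixes axial, coronal, and sagittal context for the same location.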

Patch-level anomaly maps can serve as high-quality pseudo-masks for prompt-driven vision–language models. CoDeGraph output is thresholded (e.g., by a GMM fit to the lower 95% of scores, with $t_j = \mu_j + \sigma_j \Phi^{-1}(0.99)$) to generate binary masks used for supervising vision–language prompt models (AnomalyCLIP, APRIL-GAN), enabling their adaptation without explicit ground-truth segmentation labels (Le-Gia, 2 Dec 2025).
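A sketch of the thresholding rule, simplified to a single Gaussian fit on the lower 95% of scores rather than the full GMM described above ($\Phi^{-1}(0.99)$ is hardcoded to avoid a SciPy dependency):

```python
import numpy as np

Z99 = 2.3263  # Phi^{-1}(0.99), hardcoded standard-normal quantile

def pseudo_mask_threshold(scores, keep_quantile=0.95):
    """t = mu + sigma * Phi^{-1}(0.99), with mu and sigma estimated from
    the lower 95% of patch scores (single-Gaussian simplification of the
    GMM fit in the text)."""
    s = np.asarray(scores, dtype=float)
    lower = s[s <= np.quantile(s, keep_quantile)]
    return lower.mean() + lower.std() * Z99

def pseudo_mask(score_map, threshold):
    """Binary pseudo-mask used to supervise prompt-based models."""
    return (np.asarray(score_map) >= threshold).astype(np.uint8)
```

Fitting only the lower tail keeps the estimate of "normal" score statistics uncontaminated by the anomalous patches the mask is meant to capture.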

5. Experimental Performance and Robustness

Comprehensive experiments demonstrate the efficacy of CoDeGraph on industrial and medical anomaly detection. For consistent anomaly classes in MVTec AD ("metal_nut", "cable", "pill"), the method achieves up to 98.5% AUROC and strong segmentation F1 (73.8%) and AP (77.2%) using CLIP-ViT-L/14-336, a +14.9% and +18.8 AP absolute gain over previous mutual similarity ranking methods (Le-Gia et al., 12 Oct 2025).

Substantial improvements are also observed on custom synthetic datasets (MVTec-SynCA, ConsistAD) and with alternative ViT backbones (DINOv2-L/14). Ablation studies confirm hyperparameter robustness and the necessity of both community detection and targeted patch filtering. The graph-based filtering allows consistent values of the similarity window parameter $K$ across all categories, simplifying deployment.

6. Comparison With Other Usages of "CoDeGraph"

The term also appears in distinct algorithmic contexts:

  • Graph Reasoning with LLMs (CodeGraph): A prompt-and-interpret execution pipeline where graph problems are solved by LLM-generated Python code, which is then executed to provide correct answers on edge existence, node degree/count, cycle detection, and related tasks (Cai et al., 25 Aug 2024). This approach demonstrably outperforms purely textual reasoning, especially on arithmetic and control-flow-intensive tasks. It relies on systematic, few-shot exemplars and task-specific code generation.
  • Code Definition Analysis for Call Graph Generation: Static analysis approach for recovering the inter-procedural call graph in enterprise multi-project C#.NET codebases, using signature hashing and recursive traversal (Veenendaal et al., 2016). Achieves 78.26% accuracy versus manual search, with substantial time savings.
  • Enumerative Cograph Generation: A cotree-based algorithm for enumerating all unlabeled cographs with $n$ vertices, with $O(n)$ delay per output and $O(nM_n)$ total time, where $M_n$ is the number of cographs (Jones et al., 2016).

7. Strengths, Limitations, and Future Directions

CoDeGraph for zero-shot anomaly detection provides theoretically motivated and practically robust suppression of systematic, batch-dependent failures in nearest neighbor–based methods. Its strengths include:

  • Data-driven, inductive detection of repeated anomalies without manual labeling.
  • Preservation of normal samples by fine-grained patch-level filtering.
  • Applicability to both 2D and 3D modalities and integration with text–image models.

Limitations include:

  • Computational expense of all-pairs distance search, partially mitigated by feature screening.
  • Susceptibility to large-scale or highly structured consistent anomaly distributions that may evade density-based community isolation.
  • Dependence on quality of underlying features; catastrophic embedding collapse cannot be rectified.

Future research aims at distributed implementation, theoretical analysis of mask/graph stability, and improved invariance to underlying patch representation collapse (Le-Gia, 2 Dec 2025).

