CoDeGraph: Zero-Shot Consistent Anomaly Filtering
- The CoDeGraph algorithm is a graph-based method for filtering consistent anomalies in zero-shot anomaly detection, leveraging patch-token analysis and distance scaling.
- It employs statistical insights like similarity scaling and neighbor-burnout to effectively identify and suppress repeated anomalies, achieving high AUROC and segmentation metrics.
- The pipeline integrates local suspicious link detection, graph construction, and community detection to selectively filter repeated defects while preserving normal data.
The term "CoDeGraph algorithm" refers to multiple distinct frameworks across modern research. The principal usages documented in the literature are: (1) a graph-based filtering method for consistent anomalies in zero-shot anomaly detection (notably industrial and medical imaging); (2) a code generation–execution framework for graph reasoning with LLMs (CodeGraph); (3) a static code definition analysis system for call graph generation in software engineering; and (4) an algorithm for linear-delay cograph generation. This article focuses on the most prominent and recent context: CoDeGraph for consistent anomaly filtering in zero-shot anomaly detection, with attention also given to the other definitions.
1. Problem Formulation: Consistent Anomalies in Zero-Shot Anomaly Detection
Zero-shot anomaly classification (AC) and segmentation (AS) aim to identify defective samples or regions in visual data without access to labeled outliers or training images from the target domain. Patch-based anomaly scoring methods leveraging Vision Transformer (ViT) backbones have achieved strong results but rely on the assumption that anomalies are rare and dissimilar across a test batch. The presence of "consistent anomalies"—recurrent, nearly identical defects across multiple samples—systematically biases nearest neighbor–based scoring, leading to high false negative rates on these repeated abnormal regions.
Let $\mathcal{X} = \{X_1, \dots, X_N\}$ denote a batch of test images or volumes, each containing patch tokens extracted by a frozen ViT backbone $\phi$. The objective is to compute a per-patch anomaly score $s(p)$ and a per-image score $S(X_i)$, then decide that $X_i$ is anomalous if $S(X_i)$ exceeds a threshold $\tau$. This must be achieved without labeled anomalies, fine-tuning, or supervised adjustment to the test set statistic (Le-Gia et al., 12 Oct 2025, Le-Gia, 2 Dec 2025).
The consistent-anomaly failure mode occurs when the Rarity assumption fails, i.e., when some anomalies are so frequent that they attain many close matches in the test batch, causing their anomaly scores to become indistinguishable from those of true normal regions.
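The failure mode can be reproduced in a few lines. The sketch below is illustrative (toy 2-D vectors stand in for ViT patch tokens, and all names are ours): each patch is scored by its nearest cross-collection match, so a one-off defect scores high while a defect repeated across collections scores deceptively low.

```python
import numpy as np

def patch_scores(batch):
    """Per-patch anomaly score: distance to the nearest patch token
    belonging to any OTHER collection in the test batch."""
    scores = []
    for i, patches in enumerate(batch):
        others = np.concatenate([batch[j] for j in range(len(batch)) if j != i])
        # Pairwise Euclidean distances: (n_patches_i, n_other_patches).
        d = np.linalg.norm(patches[:, None, :] - others[None, :, :], axis=-1)
        scores.append(d.min(axis=1))
    return scores

rng = np.random.default_rng(0)
batch = [rng.normal(0.0, 0.05, size=(8, 2)) for _ in range(6)]
batch[0][0] = [3.0, 3.0]          # rare anomaly: appears once
for i in (1, 2, 3):               # consistent anomaly: identical repeats
    batch[i][0] = [-3.0, -3.0]
s = patch_scores(batch)
print(bool(s[0][0] > 2.0))  # -> True: the rare defect scores high
print(bool(s[1][0] < 0.1))  # -> True: the repeated defect scores near zero
```

The repeated defect finds a near-exact match in another collection, so nearest-neighbor scoring cannot distinguish it from normal content.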
2. Theoretical Insights: Similarity Scaling and Neighbor-Burnout
Two statistical-geometric phenomena underpin the CoDeGraph algorithm's approach to consistent anomaly suppression:
- Similarity scaling: For any reference patch $p$, the sorted vector $d_{(1)} \le d_{(2)} \le \dots \le d_{(N-1)}$ of minimal distances to each other batch element shows a characteristic scaling behavior. For normal patches, the log-growth rate $r_k = \log d_{(k+1)} - \log d_{(k)}$ follows a power-law decay in the neighbor index $k$, reflecting gradual similarity increases typical of diverse, non-repeating content ((Le-Gia, 2 Dec 2025), Eq. (3.14)).
- Neighbor-burnout: For a patch recurring as an anomaly in $m$ samples, there is a sharp discontinuity at the $m$-th nearest neighbor: $d_{(k)}$ stays small for $k < m$ (matches among the repeats), but $d_{(m)}$ jumps abruptly once the repeated matches are exhausted. The corresponding growth rate exhibits a spike significantly exceeding the scaling law, directly exposing the exhaustion of repeated-neighbor matches (Le-Gia et al., 12 Oct 2025, Le-Gia, 2 Dec 2025).
Extreme Value Theory (EVT) underpins these behaviors: for a normal population of matches, patch-to-image distances follow a Fréchet law, and normalized scaling statistics such as $k \, r_k$ are approximately exponentially distributed; consistent anomalies break this pattern by exhibiting saturated close matches followed by an abrupt increase (Le-Gia et al., 12 Oct 2025).
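A minimal numpy sketch of the burnout signature, assuming the growth rate $r_k = \log d_{(k+1)} - \log d_{(k)}$ and a simple spike-versus-median statistic (the exact test in the reference may differ):

```python
import numpy as np

def growth_spike(sorted_d, m_max=10):
    """Log-growth rates r_k of the sorted distance list; a large spike
    at small k signals neighbor-burnout. Returns the spike index and
    its strength relative to the median growth rate."""
    r = np.diff(np.log(sorted_d))
    k = int(np.argmax(r[:m_max]))
    return k, r[k] / (np.median(r) + 1e-12)

# Normal patch: distances grow smoothly with the neighbor index.
normal_d = 0.5 + 0.05 * np.arange(1, 50) ** 0.5
# Consistent anomaly repeated 4 times: 3 near-zero matches, then a jump.
burnout_d = np.sort(np.concatenate([[0.010, 0.012, 0.011],
                                    0.8 + 0.05 * np.arange(1, 47) ** 0.5]))
k_n, s_n = growth_spike(normal_d)
k_b, s_b = growth_spike(burnout_d)
print(k_b == 2 and s_b > s_n)  # -> True: spike right after the repeats end
```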
3. Algorithmic Pipeline: CoDeGraph Construction and Iterative Filtering
CoDeGraph proceeds through three principal computational stages: (1) local detection of suspicious patch–to–collection similarity links; (2) graph construction and community detection among the collections; (3) dependency-based refinement to filter high-confidence consistent anomaly regions without discarding whole images.
Stage 1: Suspicious Link Identification
For each patch token $p$ and its sorted patch-to-collection distance list $d_{(1)} \le d_{(2)} \le \dots$, CoDeGraph defines the endurance ratio, which compares an early-neighborhood average of the smallest distances (governed by the MSM averaging parameter $K$) against a later reference distance $d_{(k_{\mathrm{ref}})}$ at a fixed reference index $k_{\mathrm{ref}}$. Small endurance ratios indicate patches whose early neighbors are deceptive repeats (low distance) but whose neighborhood exhausts sharply.
Suspicious patch–collection links are selected where the endurance ratio falls below a threshold, with the threshold adaptively increased to meet a specified node coverage (e.g., 0.9) over the batch (Le-Gia, 2 Dec 2025).
Optionally, a weighted variant of the endurance ratio mitigates the influence of high-variance categories (Le-Gia et al., 12 Oct 2025).
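A sketch of the endurance-ratio computation, assuming the reconstructed form "mean of the first $K$ distances over a later reference distance $d_{(k_{\mathrm{ref}})}$"; the exact normalization in the reference may differ:

```python
import numpy as np

def endurance_ratio(sorted_d, K=3, k_ref=12):
    """Endurance ratio (reconstructed form): average of the K smallest
    distances relative to a later reference distance d_(k_ref)."""
    return float(sorted_d[:K].mean() / sorted_d[k_ref - 1])

smooth = np.linspace(0.5, 1.5, 40)                            # normal growth
burnout = np.concatenate([[0.01] * 3, np.linspace(0.9, 1.5, 37)])
print(endurance_ratio(smooth) > 0.5)     # -> True: healthy ratio
print(endurance_ratio(burnout) < 0.05)   # -> True: suspiciously low
```

A burnout patch has tiny early distances followed by a jump, so its ratio collapses; a smoothly growing distance list keeps the ratio moderate.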
Stage 2: Anomaly Similarity Graph
Collections (images or volumes) are modeled as nodes. An undirected edge between collections $X_i$ and $X_j$ is inserted if any patch in $X_i$ is suspiciously linked to $X_j$ (or vice versa), with an edge weight that aggregates the strength of the suspicious links between the two collections [(Le-Gia, 2 Dec 2025), Eq. (3.16)].
This graph encodes the structure of potentially repeated (consistent) anomalies, which manifest as dense communities.
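Stage 2 can be sketched as follows, under the (assumed) convention that an edge weight simply counts the suspicious links between two collections:

```python
from collections import Counter

def build_anomaly_graph(suspicious_links):
    """Collections become nodes; an undirected edge (i, j) accumulates
    the number of suspicious patch-to-collection links between
    collections i and j (edge-weight convention is illustrative)."""
    weights = Counter()
    for patch_collection, target_collection in suspicious_links:
        if patch_collection != target_collection:
            edge = tuple(sorted((patch_collection, target_collection)))
            weights[edge] += 1
    return dict(weights)

# Toy links: collections 0, 1, 2 share a repeated defect; 3 is clean.
links = [(0, 1), (0, 1), (1, 0), (1, 2), (2, 0), (2, 1), (3, 0)]
g = build_anomaly_graph(links)
print(g[(0, 1)])  # -> 3
```

Collections sharing a repeated defect accumulate heavy mutual edges, which is what the community-detection stage then exploits.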
Stage 3: Community Detection and Structured Refinement
The Leiden algorithm, optimizing the Constant Potts Model (CPM), is applied, with the CPM resolution parameter set to a robust estimate (the 25th percentile) of the edge-weight distribution. Outlier communities are flagged where the internal edge-density exceeds a fixed threshold (up to $4.5$) [(Le-Gia, 2 Dec 2025), Eq. (3.19)].
Within each outlier community, only those patches showing strong dependency on intra-community matches, as quantified by a dependency ratio exceeding a high threshold (e.g., the 99th percentile), are excluded from further comparison. This targeted exclusion preserves normal data within affected images while suppressing only the deceptive repeated anomalies.
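The outlier-flagging step of Stage 3 can be sketched as below, assuming communities have already been found (e.g., by Leiden) and using an illustrative fixed density threshold:

```python
def internal_density(community, weights):
    """Weighted internal edge-density of a community: total internal
    edge weight over the number of possible internal node pairs."""
    nodes = set(community)
    pairs = len(nodes) * (len(nodes) - 1) / 2
    w = sum(wt for (a, b), wt in weights.items() if a in nodes and b in nodes)
    return w / pairs if pairs else 0.0

def flag_outlier_communities(communities, weights, tau=4.5):
    """Flag communities whose internal density exceeds the threshold
    (tau is illustrative; the paper ties it to the CPM resolution)."""
    return [c for c in communities if internal_density(c, weights) > tau]

weights = {(0, 1): 7, (0, 2): 6, (1, 2): 8, (3, 4): 1}
flagged = flag_outlier_communities([[0, 1, 2], [3, 4]], weights)
print(flagged)  # -> [[0, 1, 2]]
```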
Pseudocode Outline
- Extract patch tokens for each collection.
- For each patch, compute sorted patch–to–collection distances.
- Compute endurance ratios and select suspicious links (coverage-based).
- Assemble the image (or volume) similarity graph.
- Apply Leiden+CPM, compute densities, identify outlier communities.
- For each outlier, compute patch dependency ratios, exclude those surpassing threshold.
- Recompute anomaly scores over the refined reference base.
Complexity is dominated by the all-pairs patch-to-collection search, which is quadratic in the total number of patch tokens when done naively; significant savings come from chunked computation and CLS token–based screening (Le-Gia et al., 12 Oct 2025).
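The chunking idea can be sketched as follows: the full query-by-reference distance matrix is never materialized, yet the result matches the naive computation exactly.

```python
import numpy as np

def chunked_min_dists(queries, refs, chunk=256):
    """Nearest-reference distance for every query token, computed in
    chunks so only a (chunk, n_refs) distance block lives in memory."""
    out = np.empty(len(queries))
    for start in range(0, len(queries), chunk):
        q = queries[start:start + chunk]
        d = np.linalg.norm(q[:, None, :] - refs[None, :, :], axis=-1)
        out[start:start + chunk] = d.min(axis=1)
    return out

rng = np.random.default_rng(1)
q, r = rng.normal(size=(1000, 8)), rng.normal(size=(500, 8))
full = np.linalg.norm(q[:, None, :] - r[None, :, :], axis=-1).min(axis=1)
print(bool(np.allclose(chunked_min_dists(q, r), full)))  # -> True
```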
4. 3D Medical Imaging, Pseudo-Mask Generation, and Vision–Language Integration
CoDeGraph can be extended to volumetric data (e.g., 3D MRI) via axis-wise patch extraction and pooling:
- For a volume $V \in \mathbb{R}^{H \times W \times D}$ and a fixed patch size:
- For each of the three orientations, slice $V$ into 2D images and extract 2D patch tokens.
- Pool slice features along the slicing axis to form 3D patch embeddings.
- Concatenate all three axis projections at each voxel location to produce the final patch representation.
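A simplified numpy sketch of the axis-wise construction; the per-slice feature extractor is mocked by a toy `extract_2d`, and patch pooling is omitted for brevity:

```python
import numpy as np

def axiswise_volume_features(volume, extract_2d):
    """Slice an (H, W, D) volume along each of the three axes, run a 2D
    feature extractor per slice, then concatenate the three axis-wise
    feature volumes per voxel. `extract_2d` maps an (A, B) slice to
    (A, B, C) features and stands in for a ViT patch-token extractor."""
    feats = []
    for axis in range(3):
        v = np.moveaxis(volume, axis, 0)            # slices along `axis`
        f = np.stack([extract_2d(s) for s in v])    # (S, A, B, C)
        feats.append(np.moveaxis(f, 0, axis))       # back to voxel grid
    return np.concatenate(feats, axis=-1)           # (H, W, D, 3C)

toy = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)
emb = axiswise_volume_features(toy, lambda s: s[..., None])  # C = 1
print(emb.shape)  # -> (2, 3, 4, 3)
```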
Anomaly scores are then computed as in the 2D case, enabling genuinely zero-shot 3D anomaly segmentation without sample-level training [(Le-Gia, 2 Dec 2025), Eq.(6.1)].
Patch-level anomaly maps can serve as high-quality pseudo-masks for prompt-driven vision–language models. CoDeGraph output is thresholded (e.g., by a GMM fit to the lower 95% of scores) to generate binary masks used for supervising vision–language prompt models (AnomalyCLIP, APRIL-GAN), enabling their adaptation without explicit ground-truth segmentation labels (Le-Gia, 2 Dec 2025).
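A simplified sketch of pseudo-mask generation: in place of the paper's GMM fit, a single Gaussian is fit to the lower 95% of scores and the mask keeps scores beyond mean $+\,k\sigma$ (all parameters here are illustrative):

```python
import numpy as np

def pseudo_mask(score_map, q=95.0, k=4.0):
    """Binary pseudo-mask: fit a single Gaussian to the lower q% of the
    anomaly scores (a simplified stand-in for the paper's GMM fit) and
    keep scores beyond mean + k * std."""
    scores = score_map.ravel()
    body = scores[scores <= np.percentile(scores, q)]
    thresh = body.mean() + k * body.std()
    return score_map > thresh

rng = np.random.default_rng(2)
amap = rng.normal(0.2, 0.02, size=(16, 16))   # mostly normal scores
amap[4:6, 4:6] = 1.0                          # injected defect region
mask = pseudo_mask(amap)
print(bool(mask[4:6, 4:6].all()))  # -> True: the defect region survives
```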
5. Experimental Performance and Robustness
Comprehensive experiments demonstrate the efficacy of CoDeGraph on industrial and medical anomaly detection. For consistent-anomaly classes in MVTec AD ("metal_nut", "cable", "pill"), the method achieves up to 98.5% AUROC with strong segmentation F1 (73.8%) and AP (77.2%) using CLIP-ViT-L/14-336, a substantial absolute gain in F1 and AP over previous mutual-similarity-ranking methods (Le-Gia et al., 12 Oct 2025).
Substantial improvements are also observed on custom synthetic datasets (MVTec-SynCA, ConsistAD) and with alternative ViT backbones (DINOv2-L/14). Ablation studies confirm hyperparameter robustness and the necessity of both community detection and targeted patch filtering. The graph-based filtering allows consistent values of the similarity window parameter across all categories, simplifying deployment.
6. Comparison With Other Usages of "CoDeGraph"
The term also appears in distinct algorithmic contexts:
- Graph Reasoning with LLMs (CodeGraph): A prompt-and-interpret execution pipeline where graph problems are solved by LLM-generated Python code, which is then executed to provide correct answers on edge existence, node degree/count, cycle detection, and related tasks (Cai et al., 25 Aug 2024). This approach demonstrably outperforms purely textual reasoning, especially on arithmetic and control-flow-intensive tasks. It relies on systematic, few-shot exemplars and task-specific code generation.
- Code Definition Analysis for Call Graph Generation: A static-analysis approach for recovering the inter-procedural call graph in enterprise multi-project C#.NET codebases, using signature hashing and recursive traversal (Veenendaal et al., 2016). It achieves 78.26% accuracy versus manual search, with substantial time savings.
- Enumerative Cograph Generation: A cotree-based algorithm for enumerating all unlabeled cographs on $n$ vertices with linear delay between consecutive outputs, so the total running time grows linearly with the number of cographs generated (Jones et al., 2016).
7. Strengths, Limitations, and Future Directions
CoDeGraph for zero-shot anomaly detection provides theoretically motivated and practically robust suppression of systematic, batch-dependent failures in nearest neighbor–based methods. Its strengths include:
- Data-driven, inductive detection of repeated anomalies without manual labeling.
- Preservation of normal samples by fine-grained patch-level filtering.
- Applicability to both 2D and 3D modalities and integration with text–image models.
Limitations include:
- Computational expense of all-pairs distance search, partially mitigated by feature screening.
- Susceptibility to large-scale or highly structured consistent anomaly distributions that may evade density-based community isolation.
- Dependence on quality of underlying features; catastrophic embedding collapse cannot be rectified.
Future research aims at distributed implementation, theoretical analysis of mask/graph stability, and improved invariance to underlying patch representation collapse (Le-Gia, 2 Dec 2025).
References
- "On the Problem of Consistent Anomalies in Zero-Shot Industrial Anomaly Detection" (Le-Gia et al., 12 Oct 2025)
- "On the Problem of Consistent Anomalies in Zero-Shot Anomaly Detection" (Le-Gia, 2 Dec 2025)
- "CodeGraph: Enhancing Graph Reasoning of LLMs with Code" (Cai et al., 25 Aug 2024)
- "Code Definition Analysis for Call Graph Generation" (Veenendaal et al., 2016)
- "Cograph generation with linear delay" (Jones et al., 2016)