Multi-Criteria Defect Detection
- Multi-Criteria Defect Detection is a method that integrates structural, logical, and appearance cues to identify and localize diverse defects in engineered systems.
- It employs integrated pipelines combining RGB-D sensing, semantic scene decomposition, and multimodal fusion to achieve robust defect detection, including zero-shot scenarios.
- Benchmarking shows improved metrics such as IoU and F1 scores, with multimodal fusion reducing false positives by up to 30% compared to single-criterion approaches.
Multi-criteria defect detection refers to the principled identification, localization, and quantification of diverse defect types within manufactured products or engineered systems, exploiting multiple, orthogonal criteria (structural, logical, appearance-based, and more) in unified frameworks. These approaches systematically extend beyond uni-criterion detection (e.g., solely geometric or texture-based) by integrating heterogeneous cues, enabling more robust, generalizable, and interpretable defect inspection under both supervised and zero-shot paradigms across complex real-world environments.
1. Formal Taxonomies and Annotation Strategies
Central to multi-criteria defect detection is the rigorous formalization of defect taxonomies with hierarchical annotation standards. A representative system organizes defects as follows (Araya-Martinez et al., 28 Nov 2025):
- Pose annotation: 6D pose (translation t ∈ ℝ³, rotation R ∈ SO(3)) per the BOP standard.
- Structural defects: pixel-wise masks (e.g., cracks, deformations, warping, dents).
- Logical defects: polygons annotated for existence (missing part), position (misalignment), type (incorrect subtype), surface properties (color/material mismatches).
The top-level taxonomy dichotomizes defects into:
| Super-category | Sub-categories |
|---|---|
| Structural | Deformation, Cracks, Dents, Warping, Impact marks, ... |
| Logical | Existence, Position, Type (qty/size/match), Color/Material |
This extensible annotation scheme, compatible with COCO/BOP conventions, allows unified benchmarking and composite evaluation across heterogeneous defect morphologies and causal origins.
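For concreteness, a COCO-style record for a logical "existence" defect under this taxonomy might look like the sketch below. The `defect` sub-dictionary and its field names are hypothetical extensions of the standard COCO keys, not a schema defined by the cited work; the pose fields follow the BOP naming convention.

```python
# Hypothetical COCO-style annotation for a "Logical / Existence" defect.
# Standard COCO keys (image_id, category_id, bbox, segmentation) are kept;
# the "defect" sub-dictionary is an illustrative extension, not a fixed schema.
annotation = {
    "image_id": 17,
    "category_id": 3,                      # e.g. index of "Logical/Existence"
    "bbox": [104.0, 58.0, 42.0, 42.0],     # [x, y, width, height]
    "segmentation": [[104, 58, 146, 58, 146, 100, 104, 100]],  # polygon
    "defect": {
        "super_category": "Logical",
        "sub_category": "Existence",       # missing part
        "pose": {                          # 6D pose, BOP-style fields
            "cam_R_m2c": [1, 0, 0, 0, 1, 0, 0, 0, 1],  # row-major 3x3 rotation
            "cam_t_m2c": [12.5, -3.1, 540.0],          # translation (mm)
        },
    },
}
print(annotation["defect"]["sub_category"])  # Existence
```

Because only the `defect` block is non-standard, such records remain loadable by ordinary COCO tooling, which is what makes the scheme compatible with unified benchmarking.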
2. Integrated Methodological Pipelines
Multi-criteria detection systems synthesize vision, geometry, and logic domains, employing the following core components (Araya-Martinez et al., 28 Nov 2025, Dey et al., 2022, Rachuri et al., 23 Dec 2024):
- Sensing and Preprocessing: Inputs may include RGB-D images, depth maps, time-series NDE signals (IE, USW), or multi-modal sensor arrays.
- Semantic Scene Decomposition: Object detection and CAD-based pose estimation (e.g., YOLOv8 + FoundationPose → ICP), enabling semantic digital twinning for reference generation (Araya-Martinez et al., 28 Nov 2025).
- Defect Criteria Extraction:
  - Depth/geometric deviations: per-pixel ΔD(u, v) = |D_obs(u, v) − D_ref(u, v)| between the observed depth map and the digital-twin reference.
  - Color/appearance deviations: in CIELAB space, ΔE(u, v) = ‖Lab_obs(u, v) − Lab_ref(u, v)‖₂ (Euclidean color distance).
- Structural anomalies: Detected via mask-prediction or morphological post-processing.
- Logical anomalies: Detected by topological mismatch, existence/absence, or misassembly.
- Scoring and Thresholding: Formation of defect masks by thresholding the depth (ΔD) and color (ΔE) deviation maps, and computation of standard metrics (intersection-over-union, mean IoU).
- Multimodal Fusion: In complex SHM domains, alpha-shape geospatial fusion of multivariate anomaly point clouds integrates NDE modalities with contour-aligned image features (Rachuri et al., 23 Dec 2024).
This unified logic-structure-appearance pipeline enables the detection of both known and previously unseen defect types, including highly variable logical and geometric faults.
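The depth and color criteria above reduce to simple per-pixel comparisons against a reference rendering. A minimal NumPy sketch with illustrative thresholds and a toy 2×2 input (a real pipeline would operate on registered RGB-D frames rendered from the digital twin):

```python
import numpy as np

def deviation_masks(depth_obs, depth_ref, lab_obs, lab_ref,
                    tau_d=2.0, tau_e=5.0):
    """Per-pixel defect masks from depth and CIELAB deviations.

    depth_*: (H, W) depth maps; lab_*: (H, W, 3) CIELAB images.
    tau_d / tau_e are illustrative thresholds, not values from the cited work.
    """
    delta_d = np.abs(depth_obs - depth_ref)               # geometric deviation
    delta_e = np.linalg.norm(lab_obs - lab_ref, axis=-1)  # Euclidean ΔE
    return delta_d > tau_d, delta_e > tau_e

def iou(pred, gt):
    """Intersection-over-union of two binary masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 0.0

# Toy 2x2 scene: one dented pixel (depth) and one discolored pixel (color).
d_ref = np.zeros((2, 2)); d_obs = d_ref.copy(); d_obs[0, 0] = 5.0
lab_ref = np.zeros((2, 2, 3)); lab_obs = lab_ref.copy(); lab_obs[1, 1] = [0, 20, 0]
m_d, m_e = deviation_masks(d_obs, d_ref, lab_obs, lab_ref)
defect_mask = np.logical_or(m_d, m_e)   # fuse the two criteria
gt = np.array([[True, False], [False, True]])
print(iou(defect_mask, gt))  # 1.0
```

Fusing the masks with a logical OR reflects the multi-criteria principle: a pixel is flagged if any criterion deviates, which is why orthogonal cues catch defect types that a single criterion misses.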
3. Zero-Shot and Low-Data Generalization
A major advance in multi-criteria defect detection is robust zero-shot generalization (Araya-Martinez et al., 28 Nov 2025, Sadikaj et al., 9 Apr 2025). These frameworks:
- Require no defect-specific training; only object detection and (optionally) pose modules are learned.
- Construct on-the-fly “zero-defect” references using scene graph + digital twins.
- Detect arbitrary defect modalities as scene- or object-level deviations from idealized CAD-based expectations, including new cracks, surface anomalies, or logical inconsistencies (e.g., missing inserts), minimizing retraining costs.
- Empirically achieve up to 63.3% IoU against ground-truth masks under semi-controlled industrial conditions using simple per-pixel distance metrics (Araya-Martinez et al., 28 Nov 2025).
Zero-shot prompt-based architectures (e.g., MultiADS (Sadikaj et al., 9 Apr 2025)) use cross-modal alignment of rich defect-centric text prompts to CLIP-based patch features, further extending multi-type anomaly segmentation and multi-label detection without explicit training on defect exemplars.
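The patch-to-prompt scoring behind such architectures can be sketched with mock embeddings. In a real system, `patch_feats` would come from a CLIP image encoder and `prompt_feats` from its text encoder (one embedding per defect-type prompt); the vectors below are stand-ins, and the 0.5 threshold is illustrative.

```python
import numpy as np

def patch_prompt_scores(patch_feats, prompt_feats):
    """Cosine similarity between L2-normalized patch and prompt embeddings.

    patch_feats: (N_patches, D); prompt_feats: (N_prompts, D), e.g. one
    text embedding per defect type ("crack", "scratch", "missing part", ...).
    Returns an (N_patches, N_prompts) similarity matrix.
    """
    p = patch_feats / np.linalg.norm(patch_feats, axis=-1, keepdims=True)
    t = prompt_feats / np.linalg.norm(prompt_feats, axis=-1, keepdims=True)
    return p @ t.T

# Mock: 3 image patches and 2 defect-type prompts in a 4-D embedding space.
patches = np.array([[1.0, 0.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0, 0.0],
                    [0.7, 0.7, 0.0, 0.0]])   # resembles both defect types
prompts = np.array([[1.0, 0.0, 0.0, 0.0],    # e.g. "a photo of a crack"
                    [0.0, 1.0, 0.0, 0.0]])   # e.g. "a photo of a missing part"
sims = patch_prompt_scores(patches, prompts)
per_type_mask = sims > 0.5   # per-type anomaly assignment, no defect training
print(per_type_mask.astype(int))
```

Thresholding each prompt's similarity column independently yields one mask per defect type, which is what enables multi-type segmentation and multi-label detection without defect exemplars.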
4. Benchmarking, Metrics, and Quantitative Results
Evaluation protocols for multi-criteria detection emphasize both instance-level and per-criterion breakouts:
- Structural and logical defect detection: Mean IoU up to 63.3% for existence anomalies and 62.9% for color anomalies in RGB-D digital twin comparisons under semi-controlled conditions (Araya-Martinez et al., 28 Nov 2025).
- Multimodal SHM: F1 rises from 0.71–0.75 (single-modality) to 0.83 with multimodal fusion and contour-based cross-verification, reducing false positives by 30% (Rachuri et al., 23 Dec 2024).
- Instance-based deep architectures: Mask R-CNN yields mAP ≈ 0.936 (mAP@0.5) across fine defect categories; YOLOv5-based weld inspection reaches mAP@0.5 = 98.7% across eight defect types (Dey et al., 2022, Yang et al., 2021).
- Cross-domain and zero-shot: MultiADS attains pixel-level AUROC ≥95% and competitive macro-F1 across five industrial datasets, outperforming previous zero-/few-shot baselines in multi-type segmentation (Sadikaj et al., 9 Apr 2025).
Evaluation protocols typically include IoU, precision/recall, mAP, pixel-level AUROC/AUPRO, and scene-wide anomaly F1, spanning all represented defect criteria.
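Several of these metrics are straightforward to compute directly. A minimal sketch of pixel-level AUROC (rank formulation, which assumes no tied scores) and F1 with NumPy, on a toy four-pixel example:

```python
import numpy as np

def pixel_auroc(scores, labels):
    """Pixel-level AUROC: probability that a random defect pixel scores
    higher than a random normal pixel (rank formulation, no tied scores)."""
    scores, labels = np.ravel(scores), np.ravel(labels).astype(bool)
    ranks = scores.argsort().argsort() + 1   # 1-based ranks of each score
    n_pos, n_neg = labels.sum(), (~labels).sum()
    return (ranks[labels].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def f1(pred, gt):
    """F1 score of a binary prediction mask against ground truth."""
    pred, gt = np.ravel(pred).astype(bool), np.ravel(gt).astype(bool)
    tp = np.logical_and(pred, gt).sum()
    prec = tp / pred.sum() if pred.sum() else 0.0
    rec = tp / gt.sum() if gt.sum() else 0.0
    return 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0

scores = np.array([0.9, 0.8, 0.3, 0.1])  # higher = more anomalous
labels = np.array([1, 1, 0, 0])          # ground-truth defect pixels
print(pixel_auroc(scores, labels))  # 1.0 (perfect ranking)
print(f1(scores > 0.5, labels))     # 1.0
```

AUROC evaluates the raw anomaly scores independently of any threshold, while IoU and F1 evaluate a specific thresholded mask; multi-criteria benchmarks report both for that reason.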
5. Representative Architectures and Methodological Innovations
Recent frameworks integrate hierarchical annotation and multi-criterion detection with:
- Differentiable, per-defect mask and feature alignment (Mask R-CNN, SCM-MRCNN with channel/spatial attention (Yu et al., 6 Feb 2024)).
- Digital twin-driven scene simulation: On-the-fly CAD rendering for reference generation against which real-world deviations are scored (Araya-Martinez et al., 28 Nov 2025).
- Multi-modal fusion and cross-verification: Alpha-shape geospatial fusion and contour-based validation (NDE + vision) to reduce ambiguity (Rachuri et al., 23 Dec 2024).
- Zero-shot, multi-type segmentation: Patch-to-prompt cosine similarity over CLIP embeddings, with per-type mask extraction and prompt-based extensibility (Sadikaj et al., 9 Apr 2025).
- Hybrid statistical-ML fusions: Exploiting Fisher-separation and statistical feature selection atop deep or classical detectors for noise-robustness (Menéndez, 11 Dec 2024).
- RL-based multi-criteria exploration: Tunable multi-objective reinforcement learning reward design for Trojan and rare fault discovery in complex circuits (Sarihi et al., 2023).
These diverse architectures address both factory-side high-throughput inspection and field-side asset health monitoring/maintenance, efficiently bridging varied modalities and defect taxonomies.
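The Fisher-separation criterion used in the hybrid statistical-ML fusions above scores each candidate feature by its between-class mean separation relative to within-class variance. A toy sketch with two hypothetical features (the data and threshold-free selection rule are illustrative, not from the cited work):

```python
import numpy as np

def fisher_score(feature, labels):
    """Fisher separation of one feature between normal (0) and defect (1):
    squared mean difference divided by the sum of class variances."""
    x0, x1 = feature[labels == 0], feature[labels == 1]
    denom = x0.var() + x1.var()
    return (x0.mean() - x1.mean()) ** 2 / denom if denom else np.inf

# Two candidate features over 6 samples; labels mark defect samples.
labels = np.array([0, 0, 0, 1, 1, 1])
f_good = np.array([0.10, 0.20, 0.15, 0.90, 1.00, 0.95])  # well separated
f_noisy = np.array([0.40, 0.60, 0.50, 0.45, 0.55, 0.50]) # overlapping
scores = [fisher_score(f, labels) for f in (f_good, f_noisy)]
best = int(np.argmax(scores))  # keep the most discriminative feature
print(best)  # 0
```

Ranking features this way before feeding a downstream detector is what gives these hybrid pipelines their robustness to noisy or uninformative channels.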
6. Limitations, Open Challenges, and Future Directions
Despite substantial gains, current multi-criteria defect detection systems face several technical challenges (Araya-Martinez et al., 28 Nov 2025):
- Sensing limitations: Depth sensor noise, adverse lighting, non-ideal surfaces degrade geometric comparison fidelity.
- Alignment robustness: ICP-based pose refinement requires good initial estimates; clutter and occlusion compromise matching.
- Appearance metric limitations: Basic Euclidean color metrics in LAB may yield false positives; more sophisticated, perceptually calibrated metrics are needed.
- Annotation extensibility: Hierarchical taxonomies must expand to capture new logical/functional criteria; e.g., complex assembly constraints or dynamic connectivity.
- Dynamic and temporal domains: Moving/deformable parts require temporal fusion and real-time scene tracking.
- Active view planning: Integration with robot-guided sensor positioning can address occlusion-induced coverage gaps.
Future work is oriented toward deep learned comparison functions (e.g., semantic consistency), temporal and active inspection strategies, domain transfer by CAD-model loading, and expansion to further structured, logical, or physical defect classes across manufacturing and asset health applications (Araya-Martinez et al., 28 Nov 2025).
By formalizing and unifying criteria across geometric, logical, appearance-based, and semantic domains, and embedding these into scalable, high-throughput pipelines, multi-criteria defect detection approaches establish a common substrate for robust, interpretable, and extensible visual quality inspection in complex industrial and infrastructure settings (Araya-Martinez et al., 28 Nov 2025, Dey et al., 2022, Sadikaj et al., 9 Apr 2025, Rachuri et al., 23 Dec 2024).