Sample-Centric Multi-Task Learning Framework
- The paper introduces a dual-branch framework that combines segmentation and global classification to enhance defect detection in industrial inspection.
- It employs a shared encoder to integrate pixel-level and sample-level supervision, thereby improving the recall of subtle or sparse defects.
- The approach defines decision-linked metrics, such as Sample_mIoU, to closely align evaluation with the operational pass/fail requirements in quality control.
A sample-centric multi-task learning framework is an approach in which training and evaluation are structured around the sample as the primary unit, rather than aggregating supervision and optimization solely at the pixel or instance level. This methodology is particularly relevant for industrial surface defect inspection, where each sample (e.g., a manufactured part) typically requires a holistic quality control (QC) decision—whether it passes or fails—and, if defective, precise localization of defects. The framework unifies sample-level defect classification and pixel-level segmentation through a shared-encoder architecture, leveraging sample-level supervision to guide feature learning and decision-making. The approach also introduces dedicated evaluation metrics that tightly link localization quality to the correctness of per-sample decisions, thus aligning model optimization and assessment with real-world operational requirements (Dong et al., 15 Oct 2025).
1. Motivation and Problem Setting
In industrial inspection, the target outcomes are often sample-level pass/fail decisions and reliable spatial localization of defects. Conventional pixel-centric semantic segmentation frameworks optimize for aggregate metrics such as mean Intersection over Union (mIoU), which in imbalanced datasets are dominated by background pixels. This pixel-level focus does not adequately address the core requirement of high-recall detection of small, sparse, or low-contrast defects on a per-sample basis. Empirical observations show that such models can achieve strong mIoU yet fail to sustain acceptable recall and decision stability for rare or subtle defects, which makes QC on the production line unreliable. The mismatch between the optimization objective (pixel overlap) and the application-level goal (a binary pass/fail decision per sample, supported by localization) is thus the principal motivation for a sample-centric framework.
2. Framework Architecture: Shared Encoder with Dual Branches
The proposed framework employs a shared encoder that feeds two branches:
- Segmentation Branch: Outputs a per-pixel binary mask indicating the spatial extent of defects. Supervised with pixel-level binary cross-entropy, it preserves shape and boundary information fundamental for precise defect localization.
- Sample Classification Branch ("CaP" Plugin): Computes global representations using aggregation functions (e.g., global average pooling) and outputs a probability of defect presence for the entire sample. This head is supervised by a standard binary cross-entropy loss, directly reflecting the ultimate QC pass/fail decision.
Both branches draw on the same low- and high-level encoder features, so each task shapes the shared representation. Segmentation learning encourages spatial sensitivity, while sample-level classification pushes the network to attend to minor or low-contrast defects that pixel imbalance would otherwise let it overlook. The overall loss is a weighted sum of the segmentation and sample classification losses, $\mathcal{L} = \lambda_{\text{seg}}\mathcal{L}_{\text{seg}} + \lambda_{\text{cls}}\mathcal{L}_{\text{cls}}$, so that training balances localization against global defect detection.
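A minimal sketch of the dual-branch objective follows, written in plain Python so that it runs without a deep-learning framework. It assumes the shared encoder has already produced a C x H x W feature map and the segmentation branch an H x W grid of logits. The CaP-style head is modeled as global average pooling followed by a single linear unit. The function names, the loss weights `lambda_seg` and `lambda_cls`, and the rule that a sample is labeled defective if its mask contains any defect pixel are illustrative assumptions, not details taken from the paper.

```python
import math

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def bce(p: float, y: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one probability p and target y in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))

def segmentation_loss(seg_logits, mask):
    """Segmentation branch: mean pixel-wise BCE over an H x W grid."""
    losses = [bce(sigmoid(z), y)
              for z_row, y_row in zip(seg_logits, mask)
              for z, y in zip(z_row, y_row)]
    return sum(losses) / len(losses)

def global_average_pool(features):
    """Collapse a C x H x W feature map into a length-C vector."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in features]

def sample_loss(features, weights, bias, y_sample):
    """Hypothetical CaP-style head: GAP, one linear unit, then sample-level BCE."""
    pooled = global_average_pool(features)
    logit = sum(w * f for w, f in zip(weights, pooled)) + bias
    return bce(sigmoid(logit), y_sample)

def total_loss(seg_logits, mask, features, weights, bias,
               lambda_seg=1.0, lambda_cls=1.0):
    """Weighted sum of both branch losses (the weights are assumptions)."""
    # Assumption: a sample is defective iff its mask contains any defect pixel.
    y_sample = 1.0 if any(v > 0 for row in mask for v in row) else 0.0
    l_seg = segmentation_loss(seg_logits, mask)
    l_cls = sample_loss(features, weights, bias, y_sample)
    return lambda_seg * l_seg + lambda_cls * l_cls

# Usage: one defect pixel, all logits at zero, a single feature channel.
mask = [[0, 0], [0, 1]]
seg_logits = [[0.0, 0.0], [0.0, 0.0]]
features = [[[1.0, 1.0], [1.0, 1.0]]]
print(total_loss(seg_logits, mask, features, weights=[0.0], bias=0.0))
```

With every logit at zero, each BCE term equals ln 2, so the default-weighted total is 2 ln 2, about 1.386. In a real model, both losses would backpropagate into the same encoder parameters, which is where the two branches interact.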
3. Sample-Level Supervision and Gradient Modulation
Sample-level supervision is implemented through a binary target (defect vs. no defect) for each sample, applied at the output of the classification branch. Its effects are threefold:
- Feature Distribution Modulation: The sample classification loss reshapes the encoder’s representation space, promoting sensitivity to features associated with rare or sparse defect signals. Through backpropagation, this increases the likelihood that activations from small or low-contrast defect regions significantly influence overall network updates.
- Gradient-Level Boosting for Recall: Every defective sample, however small its defect, carries a positive sample-level label. A missed defect therefore produces a full per-sample classification error instead of an error averaged over mostly background pixels, and the classification head delivers strong gradients for exactly the samples most at risk of being missed by pixel-wise losses. This counteracts the dilution of rare positives in pixel-imbalanced settings and improves sensitivity to subtle defects.
- Complementing Segmentation Optimization: Pixel-level segmentation keeps mask predictions geometrically faithful, while the sample-level branch reduces misses. Together they avoid a frequent failure mode in which small defects contribute negligibly to the pixel-wise loss and are consequently omitted.
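The dilution argument can be made concrete with a toy calculation that is not taken from the paper. It compares gradient magnitudes at the output logits only (not inside the encoder) for a 256 x 256 image containing a 20-pixel defect that is missed entirely. The calculation assumes every pixel receives the same logit of -4. Under mean pixel-wise BCE, the derivative with respect to each pixel logit is (sigmoid(z) - y) / N. For the sample-level BCE head, the derivative with respect to the single sample logit is sigmoid(z) - y. All function names and numbers here are hypothetical.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def pixel_gradients(num_pixels: int, num_defect: int, logit: float = -4.0):
    """Summed |dL/dz| at the pixel logits under *mean* pixel-wise BCE.

    Assumes every pixel (defect and background) gets the same low logit,
    i.e. the defect is completely missed. For BCE on sigmoid(z),
    dL/dz = sigmoid(z) - y, and averaging over N pixels scales each term by 1/N.
    """
    p = sigmoid(logit)
    defect = num_defect * abs(p - 1.0) / num_pixels
    background = (num_pixels - num_defect) * abs(p - 0.0) / num_pixels
    return defect, background

def sample_gradient(logit: float = -4.0, y: float = 1.0) -> float:
    """|dL/dz| of the sample-level BCE head for a missed defective sample."""
    return abs(sigmoid(logit) - y)

defect_g, background_g = pixel_gradients(num_pixels=256 * 256, num_defect=20)
print(f"defect pixels: {defect_g:.5f}, background: {background_g:.5f}, "
      f"sample head: {sample_gradient():.5f}")
```

Under these assumptions, the defect pixels contribute roughly 3e-4 of total gradient, less than the roughly 0.018 from background pixels that are already correctly classified, while the sample head contributes about 0.98. The exact values depend on the assumed sizes and logits. The point is the ratio.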
4. Evaluation Metrics: Decision-Linked Analysis
Classical pixel-centric metrics such as mIoU are poor proxies for per-sample decision reliability in high-imbalance regimes; for example, a model can achieve strong mIoU by being correct on easy background regions and large defects while missing small, critical surface defects.
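The toy example below illustrates this. It uses made-up masks, not data from the paper. Three flattened 100-pixel samples are scored with conventional mIoU pooled over all pixels. One large defect (30 pixels) is segmented perfectly, one small defect (2 pixels) is missed entirely, and the third sample is defect-free. The helper names are hypothetical.

```python
def class_iou(preds, targets, cls):
    """IoU of one class, pooled over every pixel of every sample."""
    inter = union = 0
    for pred, target in zip(preds, targets):
        for p, t in zip(pred, target):
            inter += int(p == cls and t == cls)
            union += int(p == cls or t == cls)
    return inter / union

def pixel_miou(preds, targets):
    """Conventional mIoU: mean of background (0) and defect (1) IoU."""
    return (class_iou(preds, targets, 0) + class_iou(preds, targets, 1)) / 2

def make_mask(n_pixels, n_defect):
    """Hypothetical flattened binary mask with n_defect defect pixels first."""
    return [1] * n_defect + [0] * (n_pixels - n_defect)

targets = [make_mask(100, 30), make_mask(100, 2), make_mask(100, 0)]
preds = [make_mask(100, 30), make_mask(100, 0), make_mask(100, 0)]  # sample 2 missed

# Defective samples whose prediction contains no defect pixel at all.
missed = sum(1 for p, t in zip(preds, targets) if any(t) and not any(p))
print(round(pixel_miou(preds, targets), 3), missed)
```

Pooled mIoU is about 0.965 (defect IoU 30/32, background IoU 268/270), yet one of the two defective samples would pass QC undetected.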
To address this, two families of decision-linked metrics are defined:
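- Sample_mIoU: For each "relevant" sample $i$ (one with a real or a predicted defect), IoU is computed at the sample level between its predicted defect region $P_i$ and its ground-truth defect region $G_i$: $\mathrm{IoU}_i = \frac{|P_i \cap G_i|}{|P_i \cup G_i|}$. The average is then taken only over the set $\mathcal{S}$ of defect-relevant samples, $\text{Sample\_mIoU} = \frac{1}{|\mathcal{S}|}\sum_{i \in \mathcal{S}} \mathrm{IoU}_i$, excluding true negatives to avoid inflation. This focuses evaluation directly on the completeness and correctness of localization behind each QC decision.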
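- Seg_Accuracy and Seg_Recall: The segmentation mask is converted into a binary sample-level decision by applying a statistic $f$ (for example, sum, mean, or max) to the predicted mask probabilities $M_i$. Defining $\hat{y}_i = \mathbb{1}[f(M_i) > \tau]$ for a threshold $\tau$, and counting $TP$, $TN$, $FP$, $FN$ over samples, Seg_Accuracy combines correct positive and negative predictions, $\text{Seg\_Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$, while Seg_Recall isolates the fraction of truly defective samples correctly identified, $\text{Seg\_Recall} = \frac{TP}{TP + FN}$.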
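The following is a minimal sketch of the three metrics on toy data, not the paper's implementation. It makes several assumptions: masks are flattened lists, predicted probabilities are binarized per pixel at 0.5, the statistic $f$ is `max`, and $\tau = 0.5$. Returning `None` when no sample is relevant is also an assumed convention. All function names are hypothetical.

```python
def sample_miou(preds, targets):
    """Mean per-sample IoU over samples with a real or predicted defect.

    True negatives (empty prediction and empty target) are skipped.
    Returns None if no sample is relevant (an assumed convention).
    """
    ious = []
    for pred, target in zip(preds, targets):
        inter = sum(1 for p, t in zip(pred, target) if p and t)
        union = sum(1 for p, t in zip(pred, target) if p or t)
        if union > 0:  # relevant sample: real or predicted defect
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else None

def sample_decisions(prob_masks, statistic=max, tau=0.5):
    """Pass/fail call per sample: 1 (defective) if statistic(M_i) > tau."""
    return [1 if statistic(m) > tau else 0 for m in prob_masks]

def seg_accuracy_recall(decisions, labels):
    """Seg_Accuracy and Seg_Recall from sample-level decisions and labels."""
    tp = sum(1 for d, y in zip(decisions, labels) if d == 1 and y == 1)
    tn = sum(1 for d, y in zip(decisions, labels) if d == 0 and y == 0)
    fn = sum(1 for d, y in zip(decisions, labels) if d == 0 and y == 1)
    accuracy = (tp + tn) / len(labels)
    recall = tp / (tp + fn) if tp + fn else None
    return accuracy, recall

targets = [[1] * 30 + [0] * 70, [1] * 2 + [0] * 98, [0] * 100]
prob_masks = [[0.9] * 30 + [0.05] * 70,  # large defect, found
              [0.3] * 2 + [0.05] * 98,   # small, low-contrast defect, missed
              [0.05] * 100]              # defect-free sample
preds = [[1 if p > 0.5 else 0 for p in m] for m in prob_masks]
labels = [1 if any(t) else 0 for t in targets]

print(sample_miou(preds, targets))
print(seg_accuracy_recall(sample_decisions(prob_masks), labels))
```

In this example, Sample_mIoU is 0.5: the found defect has IoU 1.0, the missed one has IoU 0.0, and the defect-free sample is excluded. Seg_Accuracy is 2/3 and Seg_Recall is 0.5. A threshold tuned for `max` does not carry over to `sum` or `mean`, whose values scale differently with defect and image size, so $\tau$ has to be chosen per statistic.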
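These metrics are tied directly to sample-level outcomes. Sample_mIoU and Seg_Recall are unaffected by true negatives, so easy defect-free samples cannot inflate them. Together, the metrics give feedback on decision stability that is linked directly to both defect detection and localization completeness.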
5. Empirical Findings and Performance
Experiments on industrial surface defect datasets (e.g., KolektorSDD2, Crack) indicate that:
- Sample_mIoU and Seg_Recall are markedly improved compared to conventional pixel-centric methods, reflecting that more defective samples are detected with complete mask coverage.
- Even segmentation architectures not originally designed for sample-level optimization benefit from the addition of the CaP branch, suggesting that the sample-centric multi-task learning (MTL) principle generalizes across architectures.
- Although classic mIoU may remain high in all settings, only the sample-centric metrics reveal the reduction in “hard misses” (completely undetected defects) that is critical for practical QC.
- Ablation studies confirm that dual-loss optimization does not degrade spatial mask fidelity, and in fact, enhances the model’s ability to localize under-represented or difficult defects.
6. Implications for Industrial Quality Control and Deployment
The sample-centric MTL paradigm has several direct implications:
- Alignment with Production Requirements: The framework’s supervision and evaluation are explicitly matched to the granularity at which decisions are made in production (i.e., part-level pass/fail), facilitating risk management for false negatives in safety- or reliability-critical settings.
- Generalizability to Other Sample-Centric Tasks: The dual-branch, metric-aligned design should be adaptable to other inspection scenarios where sample-level decisions must be supported by localization (e.g., medical imagery, remote sensing, or other forms of item-level quality assurance).
- Potential for Improved Trust and Robustness: By providing comprehensive sample-level evaluation, the method supplies actionable insights about the completeness and stability of localization decisions—features that are central to trustworthy deployment in high-stakes manufacturing or safety applications.
7. Summary of Key Techniques
| Technique | Role | Mathematical Formulation (where applicable) |
|---|---|---|
| Shared Encoder | Representation for both segmentation and classification tasks | -- |
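| Segmentation Branch | Fine-grained pixel-level mask prediction | $\mathcal{L}_{\text{seg}} = -\frac{1}{N}\sum_{j=1}^{N}\left[y_j \log p_j + (1 - y_j)\log(1 - p_j)\right]$ |
| Sample-Level Branch (CaP) | Binary pass/fail output for global decision; aids recall | $\mathcal{L}_{\text{cls}} = -\left[y \log \hat{p} + (1 - y)\log(1 - \hat{p})\right]$ |
| Sample_mIoU | Decision-linked localization metric, averaged per sample | $\mathrm{IoU}_i = \lvert P_i \cap G_i \rvert / \lvert P_i \cup G_i \rvert$, averaged over relevant samples |
| Seg_Accuracy, Seg_Recall | Pass/fail correctness and recall at sample level | $(TP + TN)/(TP + TN + FP + FN)$; $TP/(TP + FN)$ |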
The technical framework achieves a robust coupling between localization and sample-level detection, advancing reliability in high-imbalance, sparse-defect environments of industrial QC. By aligning loss functions, feature learning, and evaluation with the underlying operational requirements, it significantly enhances the practicality and effectiveness of automated defect inspection systems (Dong et al., 15 Oct 2025).