WP-CrackNet: Weakly Supervised Crack Detection
- The paper introduces WP-CrackNet, a collaborative adversarial learning framework for weakly supervised road crack detection that minimizes reliance on pixel-level annotations.
- It integrates a classifier, reconstructor, and detector enhanced by path-aware attention and center-enhanced CAM consistency to achieve competitive segmentation performance.
- The framework attains high IoU and precise boundary localization on benchmark datasets, demonstrating scalability and cost efficiency for road monitoring.
WP-CrackNet is a collaborative adversarial learning framework for end-to-end weakly supervised road crack detection. Its explicit goal is to minimize reliance on pixel-level annotations by producing pixel-wise crack segmentation from image-level class labels alone. Developed to address the prohibitive cost and limited scalability of dense manual labeling, WP-CrackNet integrates adversarial training, multi-branch attention, and center-enhanced consistency to achieve results competitive with fully supervised methods. It is validated on multiple benchmark datasets, with code and data available at https://mias.group/WP-CrackNet/ (Ma et al., 20 Oct 2025).
1. Collaborative Framework Architecture
WP-CrackNet’s architecture comprises three tightly coupled components:
- Classifier: Trained with image-level labels, this component computes class activation maps (CAMs), which localize discriminative regions corresponding to cracks.
- Reconstructor: Structured as a UNet-style encoder-decoder, it measures feature inferability by splitting fused encoder features into “crack” and “road” components using the classifier’s CAM. The reconstructor then attempts to reconstruct the input image (or its corresponding parts) from these separated features.
- Detector: This branch uses a fused feature representation constructed from intermediate features of both the classifier and reconstructor. It incorporates a path-aware attention module (PAAM) to merge high-level semantics with low-level details and outputs dense pixel-wise predictions. Training of the detector is supervised by pseudo labels derived from post-processed crack CAMs, refined with dense Conditional Random Fields (denseCRF).
This collaborative framework enables mutual feedback between the components, where information produced by each module dynamically influences the others, forming a closed loop that iteratively enhances the coverage and quality of crack localization.
2. Alternating Adversarial Learning and Pseudo Label Generation
WP-CrackNet introduces an alternating adversarial training process between the classifier and the reconstructor:
- When the classifier’s parameters are fixed, the reconstructor is updated to enhance reconstruction performance on the separated crack and road features.
- When the reconstructor is fixed, the classifier’s objective becomes to produce CAMs that maximize the reconstruction error on road (non-crack) regions, effectively encouraging the CAMs to expand their coverage to the full extent of crack areas.
Formally, the feature map $Z$ is split using the CAM $M_C$ into a crack component $Z_{\text{crack}} = Z \odot M_C$ and a road component $Z_{\text{road}} = Z \odot (1 - M_C)$, where $\odot$ denotes element-wise multiplication. The reconstruction and classification objectives are alternately minimized, with the overall loss combining binary cross-entropy for the classifier and a negative reconstruction matching loss to promote adversarial behavior.
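The split and the two alternating objectives can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: it assumes the road component uses the complement `1 - M_C`, uses a mean absolute error as the reconstruction metric, and introduces a hypothetical weight `lam` for the negative reconstruction term. The function names are illustrative only.

```python
import numpy as np


def split_features(Z, M_C):
    """Split features Z of shape (C, H, W) with a CAM M_C of shape (H, W) in [0, 1]."""
    Z_crack = Z * M_C[None, :, :]          # crack part: Z * M_C
    Z_road = Z * (1.0 - M_C[None, :, :])   # road part (assumed): Z * (1 - M_C)
    return Z_crack, Z_road


def l1_error(x, x_hat):
    """Mean absolute reconstruction error (an assumed choice of metric)."""
    return float(np.mean(np.abs(x - x_hat)))


def bce(p, y, eps=1e-7):
    """Binary cross-entropy for one image-level prediction p and label y."""
    p = float(np.clip(p, eps, 1.0 - eps))
    return float(-(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


def reconstructor_loss(err_crack, err_road):
    """Reconstructor step (classifier frozen): minimize both reconstruction errors."""
    return err_crack + err_road


def classifier_loss(p, y, err_road, lam=0.1):
    """Classifier step (reconstructor frozen): BCE minus a weighted road error.

    Minimizing this value maximizes the road reconstruction error.
    The weight lam is hypothetical.
    """
    return bce(p, y) - lam * err_road


rng = np.random.default_rng(0)
Z = rng.normal(size=(4, 8, 8))     # toy fused encoder features
M_C = rng.uniform(size=(8, 8))     # toy CAM with values in [0, 1]
Z_crack, Z_road = split_features(Z, M_C)
```

Under this complement-based split the two components sum back to `Z`, so the CAM alone decides how much of each feature is routed to the crack or the road reconstruction.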
Pseudo labels for the detector branch are generated by post-processing the classifier’s CAMs using denseCRF, propagating boundary information and enforcing spatial consistency, thereby providing supervision without ground-truth segmentation masks.
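As a rough illustration of the first stage of pseudo-label generation, the sketch below normalizes a CAM and thresholds it into a binary mask. The threshold value and the function name are assumptions, and the denseCRF refinement described above is deliberately left out; in the actual pipeline that refinement is what sharpens boundaries.

```python
import numpy as np


def cam_to_pseudo_label(cam, image_label, threshold=0.5):
    """Turn a CAM of shape (H, W) into a coarse binary pseudo label.

    Images labeled as crack-free (image_label == 0) get an all-background mask.
    The threshold is a hypothetical value; denseCRF refinement is not applied here.
    """
    if image_label == 0:
        return np.zeros(cam.shape, dtype=np.uint8)
    lo, hi = cam.min(), cam.max()
    norm = (cam - lo) / (hi - lo + 1e-8)   # min-max normalize to [0, 1]
    return (norm >= threshold).astype(np.uint8)


cam = np.array([[0.0, 1.0],
                [3.0, 4.0]])
mask = cam_to_pseudo_label(cam, image_label=1)
```

Using the image-level label to zero out masks for crack-free images is one simple way to keep false positives out of the detector's supervision signal.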
3. Path-Aware Attention Module (PAAM)
The PAAM is deployed within the detector to fuse the semantically rich but spatially coarse information from the classifier with the low-level, spatially precise details highlighted by the reconstructor. PAAM comprises two attention mechanisms:
- Spatial Attention: Applies directional convolutions (with orientation θ ∈ {0°, 45°, 90°, 135°}) to emphasize crack patterns along meaningful orientations. The outputs are aggregated into a spatial attention map via a sigmoid activation, which is applied to the features to produce spatially enhanced features.
- Channel Attention: Aggregates global information through average pooling, followed by a two-layer fully connected subnetwork (with ReLU and sigmoid activations) to generate channel-wise weights. This mechanism adaptively recalibrates feature channels critical for crack discrimination.
Concatenation followed by attentive fusion ensures that the final segmentation draws on both context and local structure.
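The two attention mechanisms can be sketched in NumPy as below. This is an illustrative approximation under several assumptions: the directional convolutions are modeled as fixed 3x3 line kernels applied to the channel-mean map (the real ones may be learned and shaped differently), the attention map reweights the features by multiplication, and the fully connected weights `W1` and `W2` are random placeholders with a hypothetical reduction ratio.

```python
import numpy as np

# Fixed 3x3 line kernels, one per orientation (illustrative; real ones may be learned).
KERNELS = {
    0: np.array([[0, 0, 0], [1, 1, 1], [0, 0, 0]]) / 3.0,
    45: np.fliplr(np.eye(3)) / 3.0,
    90: np.array([[0, 1, 0], [0, 1, 0], [0, 1, 0]]) / 3.0,
    135: np.eye(3) / 3.0,
}


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def conv2d_same(x, k):
    """Zero-padded 2D cross-correlation that keeps the input size."""
    kh, kw = k.shape
    xp = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros(x.shape, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += k[i, j] * xp[i:i + x.shape[0], j:j + x.shape[1]]
    return out


def spatial_attention(F):
    """Sum directional responses of the channel-mean map, squash, and reweight F."""
    m = F.mean(axis=0)
    response = sum(conv2d_same(m, k) for k in KERNELS.values())
    A = sigmoid(response)
    return F * A[None, :, :], A


def channel_attention(F, W1, W2):
    """Global average pool -> FC + ReLU -> FC + sigmoid -> per-channel weights."""
    s = F.mean(axis=(1, 2))
    h = np.maximum(W1 @ s, 0.0)
    w = sigmoid(W2 @ h)
    return F * w[:, None, None], w


rng = np.random.default_rng(0)
F = rng.normal(size=(8, 16, 16))   # toy fused features (C, H, W)
W1 = rng.normal(size=(2, 8))       # hypothetical reduction from 8 to 2 channels
W2 = rng.normal(size=(8, 2))
F_spatial, A = spatial_attention(F)
F_channel, w = channel_attention(F, W1, W2)
```

A horizontal line in the input responds most strongly to the 0° kernel and more weakly to the 90° kernel, which is the orientation selectivity the directional design is meant to exploit.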
4. Center-Enhanced CAM Consistency (CECCM)
To address spatial drift in the CAMs and to approximate the centrality of ground-truth crack regions, WP-CrackNet introduces the center-enhanced CAM consistency module (CECCM):
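- Center Gaussian Weighting: The CAM’s spatial centroid is computed as the activation-weighted mean of the pixel coordinates, $(x_c, y_c) = \left( \frac{\sum_{x,y} x \, M(x,y)}{\sum_{x,y} M(x,y)}, \frac{\sum_{x,y} y \, M(x,y)}{\sum_{x,y} M(x,y)} \right)$, where $M$ denotes the CAM.
A Gaussian mask centered at $(x_c, y_c)$ then highlights central regions.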
- Consistency Constraint: A center-enhanced consistency loss enforces alignment between the classifier’s and reconstructor’s center-weighted CAMs, typically employing an L1 penalty normalized to image area.
This mechanism directs the network’s focus to regions most likely representing the crack core, improving the reliability of the auto-generated pseudo labels and enhancing segmentation mask precision.
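The center weighting and the consistency term can be sketched as follows. This is a hedged illustration rather than the published formulation: the centroid is taken as the activation-weighted mean of pixel coordinates, the Gaussian is isotropic with a hypothetical bandwidth `sigma`, and the L1 penalty is averaged over all pixels to normalize by image area.

```python
import numpy as np


def cam_centroid(cam):
    """Activation-weighted centroid (x_c, y_c) of a CAM of shape (H, W)."""
    H, W = cam.shape
    ys, xs = np.mgrid[0:H, 0:W]
    total = cam.sum() + 1e-8
    return (xs * cam).sum() / total, (ys * cam).sum() / total


def gaussian_mask(shape, center, sigma):
    """Isotropic Gaussian mask peaking at center = (x_c, y_c); sigma is assumed."""
    H, W = shape
    ys, xs = np.mgrid[0:H, 0:W]
    xc, yc = center
    return np.exp(-((xs - xc) ** 2 + (ys - yc) ** 2) / (2.0 * sigma ** 2))


def center_consistency_loss(cam_cls, cam_rec, sigma=4.0):
    """L1 distance between center-weighted CAMs, averaged over the image area."""
    g_cls = gaussian_mask(cam_cls.shape, cam_centroid(cam_cls), sigma)
    g_rec = gaussian_mask(cam_rec.shape, cam_centroid(cam_rec), sigma)
    return float(np.abs(g_cls * cam_cls - g_rec * cam_rec).mean())


cam_a = np.zeros((5, 7))
cam_a[2, 3] = 1.0                      # single activation at row 2, column 3
cam_b = np.zeros((5, 7))
cam_b[1, 5] = 1.0                      # same activation, shifted
xc, yc = cam_centroid(cam_a)
loss_same = center_consistency_loss(cam_a, cam_a)
loss_shifted = center_consistency_loss(cam_a, cam_b)
```

Identical CAMs give zero loss, while a shifted activation is penalized, which is the behavior a drift-suppressing consistency term needs.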
5. Performance Evaluation and Empirical Results
WP-CrackNet is benchmarked against both weakly supervised and fully supervised crack detection frameworks on multiple datasets, including custom image-level annotated datasets (derived from Crack500, DeepCrack, and CFD). Experimental findings include:
- WP-CrackNet achieves intersection-over-union (IoU), F1-score, precision, and recall metrics comparable to state-of-the-art fully supervised methods, despite relying exclusively on image-level supervision.
- Integration of the CECCM and PAAM modules leads to substantial improvements in IoU and detailed boundary localization (as demonstrated in ablation studies).
- The framework significantly outperforms previous weakly supervised methods and demonstrates clear advantages over purely unsupervised anomaly detection approaches such as UP-CrackNet, especially in the delineation of fine crack boundaries.
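For reference, the four reported metrics can be computed from binary masks as in the sketch below. These are the standard pixel-wise definitions; the exact evaluation protocol used in the paper (for example, any boundary tolerance or per-image averaging) is not specified here, so treat this as a generic illustration.

```python
import numpy as np


def segmentation_metrics(pred, gt):
    """Pixel-wise precision, recall, F1-score, and IoU for binary masks."""
    pred = np.asarray(pred).astype(bool)
    gt = np.asarray(gt).astype(bool)
    tp = int(np.sum(pred & gt))     # crack pixels predicted as crack
    fp = int(np.sum(pred & ~gt))    # background pixels predicted as crack
    fn = int(np.sum(~pred & gt))    # crack pixels that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "iou": iou}


pred = np.array([[1, 1], [0, 0]])
gt = np.array([[1, 0], [1, 0]])
metrics = segmentation_metrics(pred, gt)
```

In this toy case there is one true positive, one false positive, and one false negative, so precision, recall, and F1 are each 0.5 and IoU is 1/3.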
A summary of the comparative results is presented below:
| Method | Supervision Type | Relative IoU | Notable Modules |
|---|---|---|---|
| WP-CrackNet | Image-level (weak) | High (close to supervised) | CECCM, PAAM |
| Fully supervised methods | Pixel-level (full) | Higher (but comparable) | - |
| Best prior weak | Image-level or weaker | Lower | CAM/post-processing |
| UP-CrackNet | No crack labels | Lower | GAN, error map |
This level of performance with minimal supervision highlights the method’s scalability and economic viability for automated large-scale road monitoring.
6. Dataset Construction, Source Code, and Research Impact
The framework’s utility is reinforced by the release of three image-level annotated datasets, constructed by pseudo-labeling and filtering crack images from widely used segmentation benchmarks, yielding domain-relevant data suited for weakly supervised learning paradigms.
The complete WP-CrackNet source code package and associated datasets are available at https://mias.group/WP-CrackNet/, supporting transparent evaluation and further research by the academic community.
WP-CrackNet’s introduction of adversarial classifier–reconstructor collaboration, explicit path-aware attention, and center-focused CAM regularization marks a substantial methodological advancement for weakly supervised semantic segmentation in the context of infrastructure distress analysis. Its integration mechanisms and public resource release are poised to accelerate progress in scalable, annotation-efficient computer vision for intelligent road inspection and asset management (Ma et al., 20 Oct 2025).