
Segment Any Crack (SAC) Model

Updated 4 January 2026
  • The paper demonstrates that tuning only the LayerNorm parameters of a SAM backbone yields state-of-the-art crack segmentation while updating approximately 0.05% of the model's weights.
  • It employs efficient fine-tuning strategies that freeze most network weights, significantly reducing computational costs while maintaining high segmentation accuracy.
  • SAC shows superior zero-shot generalization across diverse infrastructural domains, making it practical for real-world deployment in resource-constrained settings.

The Segment Any Crack (SAC) model is a class of segmentation frameworks designed to adapt vision foundation models, particularly the Segment Anything Model (SAM), for pixel-level automated crack detection in diverse civil infrastructure imagery. SAC leverages efficient fine-tuning strategies, enabling robust segmentation with minimal labeled data and significantly reduced computational resources. This approach achieves high accuracy and generalization, notably in zero-shot crack segmentation, where cracks must be segmented on previously unseen materials, lighting conditions, and structural scenarios. Performance claims, methodological innovations, and computational analyses are based strictly on published metrics and empirical findings (Rostami et al., 19 Apr 2025).

1. Model Architecture and Fine-Tuning Paradigm

SAC is derived from SAM and utilizes its core Vision Transformer (ViT) encoder, prompt encoder, and mask decoder, with modifications suited to binary crack segmentation:

  • Backbone: SAC retains the ViT-Base pre-trained on the SA-1B dataset (≈90 M parameters).
  • Segmentation Head: The original prompt-dependent mask decoder is replaced with a standard binary segmentation head, eliminating prompts and enabling direct segmentation outputs for crack predictions.
  • Selective Parameter Tuning: Crucially, SAC freezes all SAM weights except the affine parameters (gain $\gamma$ and bias $\beta$) of every LayerNorm layer in both encoder and decoder. This targets normalization components for adaptation, addressing covariate shifts between domains with dramatically fewer trainable parameters.

The adaptation strategy exploits the role of normalization in domain generalization: tuning only the LayerNorm affine parameters can recalibrate deep feature distributions for new domains without altering the frozen representational weights, as established in transfer learning studies. The resulting parameter set for SAC consists of approximately 41,000 trainable weights, roughly 0.05% of SAM (Rostami et al., 19 Apr 2025).
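
As a quick consistency check using the figures above ($\approx 41{,}000$ tunable weights out of the $\approx 90$ M-parameter ViT-Base backbone):

$$\frac{41{,}000}{90 \times 10^{6}} \approx 4.6 \times 10^{-4} \approx 0.046\%$$

which is consistent with the ≈0.05% figure quoted above.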

Mathematical Formulation

Let $\theta = (\theta_{\text{frozen}}, \theta_{\text{norm}})$, where $\theta_{\text{norm}} = \{\gamma_\ell, \beta_\ell\}$ for each LayerNorm layer $\ell$.

For an input $x$ to LayerNorm layer $\ell$, the output is

$$\hat{x} = \frac{x - \mu_\ell}{\sqrt{\sigma_\ell^2 + \epsilon}}, \qquad y = \gamma_\ell \hat{x} + \beta_\ell$$

Only $\gamma_\ell$ and $\beta_\ell$ are updated during adaptation.
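
The paper does not publish reference code here, so the following is a minimal PyTorch sketch of this selective-tuning scheme; `freeze_all_but_layernorm` is a hypothetical helper, and it assumes the SAM-derived model is a standard `nn.Module` whose normalization layers are `nn.LayerNorm` instances (SAM's encoder also contains custom normalization variants, which such a filter would need to cover in practice):

```python
import torch.nn as nn

def freeze_all_but_layernorm(model: nn.Module) -> int:
    """Freeze every parameter, then re-enable only the LayerNorm affine
    parameters (gamma and beta), mirroring SAC's selective tuning scheme.
    Returns the resulting number of trainable parameters."""
    for p in model.parameters():
        p.requires_grad = False
    n_trainable = 0
    for module in model.modules():
        if isinstance(module, nn.LayerNorm):
            # module.weight is the gain (gamma); module.bias is the bias (beta)
            for p in module.parameters():
                p.requires_grad = True
                n_trainable += p.numel()
    return n_trainable
```

Passing only the parameters with `requires_grad=True` to the optimizer then reproduces the roughly 41 K-parameter training set reported above.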

2. Training Protocol and Loss Functions

Datasets

  • OmniCrack30k: 22,158 training, 13,277 validation, 4,582 test images from 20 crack image subdomains spanning concrete, asphalt, masonry, and metal.
  • Zero-shot Sets: Road420 (420 images), Facade390 (390 images), Concrete3k (3,000 images)—all annotated and resized for crack segmentation (Rostami et al., 19 Apr 2025).

Optimization

  • Optimizer: AdamW, weight decay $5 \times 10^{-5}$, batch size 2.
  • Learning Rate: $5 \times 10^{-4}$, cosine decay scheduler.
  • Loss Function: Hybrid of binary cross-entropy (BCE) and Dice loss (a runnable sketch follows this list):

$$L_{\text{total}} = L_{\text{BCE}} + \lambda L_{\text{Dice}}, \qquad \lambda = 0.65$$

where

$$L_{\text{BCE}} = -\sum_{i=1}^{N} \left[ g_i \log(p_i) + (1 - g_i) \log(1 - p_i) \right]$$

$$L_{\text{Dice}} = 1 - \frac{2 \sum_{i=1}^{N} p_i g_i}{\sum_{i=1}^{N} p_i^2 + \sum_{i=1}^{N} g_i^2}$$

Here $p_i$ denotes the predicted crack probability and $g_i$ the binary ground-truth label at pixel $i$.

  • Epochs: 4, used for both the hyperparameter search and the main training runs.
  • Implementation: All non-normalization SAM weights are strictly held constant throughout training.
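
A minimal PyTorch sketch of this configuration (the `hybrid_loss` helper is hypothetical, not the authors' code; it uses mean-reduced BCE, whereas the formula above is written as a per-pixel sum):

```python
import torch
import torch.nn.functional as F

def hybrid_loss(logits: torch.Tensor, targets: torch.Tensor,
                lam: float = 0.65, eps: float = 1e-6) -> torch.Tensor:
    """L_total = L_BCE + lambda * L_Dice with lambda = 0.65.
    logits: raw outputs (N, 1, H, W); targets: float masks in {0, 1}."""
    bce = F.binary_cross_entropy_with_logits(logits, targets)
    p = torch.sigmoid(logits)
    dice = 1 - (2 * (p * targets).sum()) / ((p ** 2).sum() + (targets ** 2).sum() + eps)
    return bce + lam * dice

# Optimizer over the trainable (LayerNorm) parameters only; `model` and the
# total step count `num_steps` are assumed to come from the surrounding script.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=5e-4, weight_decay=5e-5)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_steps)
```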

3. Evaluation Protocol and Quantitative Results

Metrics

Crack segmentation is evaluated by pixel-level metrics:

  • Precision: $\frac{\text{TP}}{\text{TP} + \text{FP}}$
  • Recall: $\frac{\text{TP}}{\text{TP} + \text{FN}}$
  • F1-Score: $2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$
  • IoU: $\frac{\text{TP}}{\text{TP} + \text{FP} + \text{FN}}$

Here TP, FP, and FN denote pixelwise true positive, false positive, and false negative counts (crack vs. background).
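
A small NumPy sketch of these pixel-level metrics (`pixel_metrics` is a hypothetical helper, not from the paper):

```python
import numpy as np

def pixel_metrics(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-9):
    """Pixel-level precision, recall, F1, and IoU for binary masks (1 = crack)."""
    tp = float(np.logical_and(pred == 1, gt == 1).sum())
    fp = float(np.logical_and(pred == 1, gt == 0).sum())
    fn = float(np.logical_and(pred == 0, gt == 1).sum())
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    f1 = 2 * precision * recall / (precision + recall + eps)
    iou = tp / (tp + fp + fn + eps)
    return precision, recall, f1, iou
```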

Performance Benchmarks

SAC on OmniCrack30k

  • F1-Score: 61.22 %
  • IoU: 44.13 %

Efficiency Comparison (ViT-Base backbone)

| Tuning Method | # Tunables | % of Backbone | F1 (%) | IoU (%) | Time (min/it) |
|---|---|---|---|---|---|
| No fine-tuning | 0 | 0% | 13.0 | 17.0 | — |
| Decoder only | 3.7 M | 4.17% | 57.97 | 40.83 | 7.9 |
| PEFT (LoRA, r=8) | 30.7 K | 0.034% | 57.95 | 40.81 | 9.9 |
| Ge et al. (PEFT+dec) | 4.0 M | 4.51% | 56.90 | 39.79 | 14.8 |
| LayerNorm tuning | 41 K | 0.046% | 61.22 | 44.13 | 12.3 |

Cross-Architecture Norm Tuning Comparison

| Model | Full-Tune F1/IoU | # Tunables | Norm-Tune F1/IoU | # Tunables |
|---|---|---|---|---|
| SegFormer (MiT-B0) | 59.98 / 42.85 | 3.7 M | 52.82 / 35.91 | 7.6 K |
| U-Net | 54.28 / 37.27 | 32.5 M | 54.82 / 37.77 | 55 K |
| DeepLabv3+ (Res50) | 55.27 / 38.21 | 42 M | 52.93 / 36.01 | 57 K |
| DeepLabv3+ (Res101) | 56.52 / 39.41 | 61 M | 54.09 / 37.09 | 110 K |
| SAC (SAM + LN tuning) | — | — | 61.22 / 44.13 | 41 K |

Zero-Shot Generalization

| Dataset | SAC F1 | SAC IoU | DeepLabv3+ Res101 F1/IoU |
|---|---|---|---|
| Road420 | 64.22 | 47.30 | — |
| Facade390 | 61.74 | 44.68 | — |
| Concrete3k | 75.63 | 60.82 | — |
| Mean ± SD | 67.20 ± 6.05 | 50.93 ± 7.07 | 62.56 ± 9.60 / 46.28 ± 10.73 |

SAC displays the lowest variance across zero-shot tasks, indicating robustness and superior generalization.

4. Computational Efficiency and Generalization Analysis

  • Parameter footprint: SAC tunes 41 K parameters (≈0.046% of SAM), while decoder-inclusive PEFT methods require 3.7–4 M and full fine-tuning of comparison networks requires 3.7 M–61 M. This results in a 30–50% reduction in training time per epoch compared to non-selective adaptation.
  • Generalization: SAC achieves the highest cross-domain mean F1 and the lowest standard deviation compared to all benchmarks. This suggests effective suppression of overfitting and superior capacity to segment cracks in unseen environments.
  • Efficiency implication: Selective normalization tuning delivers substantial speedup and memory savings, making SAC feasible for deployment in resource-constrained settings.

5. Key Methodological Innovations and Comparison with Prior Art

SAC’s distinguishing methodological characteristic is its use of LayerNorm-only fine-tuning for domain adaptation of SAM:

  • Full fine-tuning, LoRA, or Adapter-based PEFT approaches tune considerably larger parameter subsets but do not outperform SAC’s normalization-only approach on large-scale and zero-shot benchmarks.
  • SAC surpasses traditional segmentation networks (U-Net, DeepLabv3+, SegFormer) both in segmentation accuracy and computational cost for crack detection.
  • Empirical ablation confirms that updating the normalization affine parameters suffices to bridge the domain gap and yield state-of-the-art crack segmentation (Rostami et al., 19 Apr 2025).

6. Practical Impact and Deployment Contexts

  • SAC’s minimal computational requirements enable rapid retraining and deployment on real-world monitoring platforms where latency, energy and hardware constraints prohibit large-model fine-tuning.
  • The model’s robustness in zero-shot tasks is demonstrated on distinct domains including asphalt, masonry, metal, and concrete.
  • A plausible implication is that normalization-based adaptation strategies are especially suitable for industrial computer vision, where rapid prototyping and adaptation across diverse imaging domains is required.

7. Limitations and Future Directions

  • The empirical results focus on ViT-Base; extending norm-tuning to larger backbone variants or non-Transformer architectures may require further validation.
  • SAC does not alter feature representation kernels, which may limit adaptation in extreme domain shifts where structural features of cracks deviate significantly from those seen in pre-training.
  • Fine-tuning normalization layers as a standalone strategy may benefit from integration with knowledge distillation or hybrid PEFT approaches for cases where further accuracy or interpretability is needed.

Summary Table: SAC Performance Comparison

| Model | # Tunables | F1 (%) | IoU (%) | Zero-Shot Mean F1 | Zero-Shot std(F1) |
|---|---|---|---|---|---|
| SAC (LayerNorm Tuning) | 41 K | 61.22 | 44.13 | 67.20 | 6.05 |
| DeepLabv3+ Res101 | 61 M | 56.52 | 39.41 | 62.56 | 9.60 |
| SegFormer (MiT-B0) | 3.7 M | 59.98 | 42.85 | 52.82 | — |

This table demonstrates the parameter efficiency and generalization superiority of SAC relative to full and partial fine-tuning approaches in the published literature (Rostami et al., 19 Apr 2025).
