Segment Any Crack (SAC) Model
- The paper demonstrates that tuning only LayerNorm parameters on a SAM backbone yields state-of-the-art crack segmentation with approximately 0.05% trainable weights.
- It employs efficient fine-tuning strategies that freeze most network weights, significantly reducing computational costs while maintaining high segmentation accuracy.
- SAC shows superior zero-shot generalization across diverse infrastructural domains, making it practical for real-world deployment in resource-constrained settings.
The Segment Any Crack (SAC) model is a segmentation framework designed to adapt vision foundation models, particularly the Segment Anything Model (SAM), for pixel-level automated crack detection in diverse civil-infrastructure imagery. SAC leverages efficient fine-tuning strategies, enabling robust segmentation with minimal labeled data and significantly reduced computational resources. This approach achieves high accuracy and generalization, notably in zero-shot crack segmentation tasks: segmenting cracks on previously unseen materials and under unfamiliar lighting conditions and structural scenarios. Performance claims, methodological innovations, and computational analyses are based strictly on published metrics and empirical findings (Rostami et al., 19 Apr 2025).
1. Model Architecture and Fine-Tuning Paradigm
SAC is derived from SAM and utilizes its core Vision Transformer (ViT) encoder, prompt encoder, and mask decoder, with modifications suited for binary crack segmentation:
- Backbone: SAC retains the ViT-Base pre-trained on the SA-1B dataset (≈90 M parameters).
- Segmentation Head: The original prompt-dependent mask decoder is replaced with a standard binary segmentation head, eliminating prompts and enabling direct segmentation outputs for crack predictions.
- Selective Parameter Tuning: Crucially, SAC freezes all SAM weights except the affine parameters (gain $\gamma$ and bias $\beta$) of every LayerNorm layer in both the encoder and decoder. Targeting the normalization components for adaptation addresses covariate shift between domains with dramatically fewer trainable parameters.
The adaptation strategy exploits the role of normalization in domain generalization: tuning only the LayerNorm affine parameters can recalibrate deep feature distributions for a new domain without altering the learned representational weights, as established in transfer-learning studies. The resulting trainable set for SAC comprises approximately 41,000 weights, about 0.05% of SAM's parameters (Rostami et al., 19 Apr 2025).
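As a concrete sketch, the freeze-all-but-LayerNorm recipe can be written in a few lines of PyTorch (an illustrative reconstruction, not the authors' code; the toy `block` below stands in for SAM's ViT):

```python
import torch.nn as nn

def freeze_all_but_layernorm(model: nn.Module) -> int:
    """Freeze every parameter except LayerNorm affine weights (gain/bias).

    Returns the number of trainable parameters that remain."""
    for p in model.parameters():
        p.requires_grad = False
    for m in model.modules():
        if isinstance(m, nn.LayerNorm):
            # LayerNorm with elementwise_affine=True carries .weight (gain)
            # and .bias, which SAC leaves trainable.
            for p in m.parameters():
                p.requires_grad = True
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Toy stand-in for a ViT block: Linear layers stay frozen, LayerNorms train.
block = nn.Sequential(nn.Linear(8, 8), nn.LayerNorm(8),
                      nn.Linear(8, 8), nn.LayerNorm(8))
n_trainable = freeze_all_but_layernorm(block)
# Each LayerNorm(8) contributes 8 + 8 = 16 trainable parameters.
```

Applied to SAM's ViT-Base, the same loop leaves only the ≈41 K LayerNorm gains and biases trainable.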
Mathematical Formulation
Let $\theta_{\mathrm{LN}} = \{\gamma_\ell, \beta_\ell\}$ denote the trainable set, where $\gamma_\ell, \beta_\ell \in \mathbb{R}^{d_\ell}$ for each LayerNorm layer $\ell$.
For an input $x \in \mathbb{R}^{d_\ell}$ to LayerNorm $\ell$, the output is
$$y = \gamma_\ell \odot \frac{x - \mu}{\sqrt{\sigma^2 + \epsilon}} + \beta_\ell,$$
where $\mu$ and $\sigma^2$ are the mean and variance of $x$ over the feature dimension and $\epsilon$ is a small constant. Only $\gamma_\ell$ and $\beta_\ell$ are updated during adaptation.
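A minimal pure-Python check of the LayerNorm computation (illustrative only): with identity affine parameters the output is standardized, and adaptation only moves the gain and bias, never the normalization itself.

```python
import math

def layer_norm(x, gamma, beta, eps=1e-5):
    """LayerNorm over a feature vector: y_i = gamma_i*(x_i - mu)/sqrt(var + eps) + beta_i."""
    mu = sum(x) / len(x)
    var = sum((v - mu) ** 2 for v in x) / len(x)
    return [g * (v - mu) / math.sqrt(var + eps) + b
            for v, g, b in zip(x, gamma, beta)]

x = [1.0, 2.0, 3.0, 4.0]
# Identity affine parameters (gamma = 1, beta = 0): output has zero mean, unit variance.
y = layer_norm(x, [1.0] * 4, [0.0] * 4)
```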
2. Training Protocol and Loss Functions
Datasets
- OmniCrack30k: 22,158 training, 13,277 validation, 4,582 test images from 20 crack image subdomains spanning concrete, asphalt, masonry, and metal.
- Zero-shot Sets: Road420 (420 images), Facade390 (390 images), Concrete3k (3,000 images)—all annotated and resized for crack segmentation (Rostami et al., 19 Apr 2025).
Optimization
- Optimizer: AdamW with weight decay; batch size 2.
- Learning rate: cosine decay schedule.
- Loss function: hybrid of binary cross-entropy (BCE) and Dice loss,
$$\mathcal{L} = \mathcal{L}_{\mathrm{BCE}} + \mathcal{L}_{\mathrm{Dice}}, \qquad \mathcal{L}_{\mathrm{Dice}} = 1 - \frac{2\sum_i p_i g_i}{\sum_i p_i + \sum_i g_i},$$
where $p_i$ denotes the predicted crack probability and $g_i$ the ground-truth label at pixel $i$.
- Epochs: 4 (used for both the hyperparameter search and the main training run, following the published protocol).
- Implementation: All non-normalization SAM weights are strictly held constant throughout training.
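The objective and schedule above can be sketched as follows (an illustrative PyTorch reconstruction, not the published implementation; the 1×1-convolution head, the equal BCE/Dice weighting, and the learning-rate value are placeholder assumptions):

```python
import torch
import torch.nn as nn

def dice_loss(logits, target, eps=1e-6):
    """Soft Dice loss on sigmoid probabilities for binary masks."""
    probs = torch.sigmoid(logits)
    inter = (probs * target).sum()
    return 1 - (2 * inter + eps) / (probs.sum() + target.sum() + eps)

def hybrid_loss(logits, target):
    """BCE + Dice hybrid objective (equal weighting assumed here)."""
    bce = nn.functional.binary_cross_entropy_with_logits(logits, target)
    return bce + dice_loss(logits, target)

# Toy stand-in for the binary segmentation head; in SAC only LayerNorm
# affine parameters of the backbone would be trainable (see Section 1).
head = nn.Conv2d(3, 1, kernel_size=1)
opt = torch.optim.AdamW(head.parameters(), lr=1e-4)  # lr is a placeholder value
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=4)  # 4 epochs

x = torch.randn(2, 3, 16, 16)                       # batch size 2, as in the protocol
y = (torch.rand(2, 1, 16, 16) > 0.5).float()
loss = hybrid_loss(head(x), y)
loss.backward()
opt.step()
sched.step()
```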
3. Evaluation Protocol and Quantitative Results
Metrics
Crack segmentation is evaluated by pixel-level metrics, where TP, FP, and FN denote pixelwise true-positive, false-positive, and false-negative counts (crack vs. background):
- Precision: $\mathrm{TP} / (\mathrm{TP} + \mathrm{FP})$
- Recall: $\mathrm{TP} / (\mathrm{TP} + \mathrm{FN})$
- F1-Score: $2 \cdot \mathrm{Precision} \cdot \mathrm{Recall} / (\mathrm{Precision} + \mathrm{Recall})$
- IoU: $\mathrm{TP} / (\mathrm{TP} + \mathrm{FP} + \mathrm{FN})$
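These four metrics can be computed directly from raw pixel counts, as in the self-contained illustration below (flattened binary masks, 1 = crack):

```python
def crack_metrics(pred, gt):
    """Pixel-level precision, recall, F1, and IoU for binary masks (1 = crack)."""
    tp = sum(1 for p, g in zip(pred, gt) if p and g)
    fp = sum(1 for p, g in zip(pred, gt) if p and not g)
    fn = sum(1 for p, g in zip(pred, gt) if not p and g)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    iou = tp / (tp + fp + fn)
    return precision, recall, f1, iou

# 8-pixel toy example: 3 TP, 1 FP, 1 FN
pred = [1, 1, 1, 1, 0, 0, 0, 0]
gt   = [1, 1, 1, 0, 1, 0, 0, 0]
p, r, f1, iou = crack_metrics(pred, gt)
# p = 0.75, r = 0.75, f1 = 0.75, iou = 0.6
```

Note that IoU is always at most F1, and the two are monotonically related, which is why the benchmarks below rank methods identically under either metric.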
Performance Benchmarks
SAC on OmniCrack30k
- F1-Score: 61.22 %
- IoU: 44.13 %
Efficiency Comparison (ViT-Base backbone)
| Tuning Method | # Tunables | % of Backbone | F1 (%) | IoU (%) | Time (min/it) |
|---|---|---|---|---|---|
| No fine-tuning | 0 | 0% | 13.0 | 17.0 | – |
| Decoder only | 3.7 M | 4.17% | 57.97 | 40.83 | 7.9 |
| PEFT (LoRA, r=8) | 30.7 K | 0.034% | 57.95 | 40.81 | 9.9 |
| Ge et al. (PEFT+dec) | 4.0 M | 4.51% | 56.90 | 39.79 | 14.8 |
| LayerNorm tuning | 41 K | 0.046% | 61.22 | 44.13 | 12.3 |
Cross-Architecture Norm Tuning Comparison
| Model | Full-Tune F1/IoU | # Tunables | Norm-Tune F1/IoU | # Tunables |
|---|---|---|---|---|
| SegFormer (MiT-B0) | 59.98/42.85 | 3.7M | 52.82/35.91 | 7.6K |
| U-Net | 54.28/37.27 | 32.5M | 54.82/37.77 | 55K |
| DeepLabv3+ (Res50) | 55.27/38.21 | 42M | 52.93/36.01 | 57K |
| DeepLabv3+ (Res101) | 56.52/39.41 | 61M | 54.09/37.09 | 110K |
| SAC (SAM + LN tuning) | — | — | 61.22/44.13 | 41K |
Zero-Shot Generalization
| Dataset | SAC F1 | SAC IoU | DeepLabv3+ (Res101) F1 / IoU |
|---|---|---|---|
| Road420 | 64.22 | 47.30 | — |
| Facade390 | 61.74 | 44.68 | — |
| Concrete3k | 75.63 | 60.82 | — |
| Mean ± SD | 67.20 ± 6.05 | 50.93 ± 7.07 | 62.56 ± 9.60 / 46.28 ± 10.73 |
SAC displays the lowest variance across zero-shot tasks, indicating robustness and superior generalization.
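The mean ± SD row can be reproduced from the per-dataset SAC scores in the table above; note that the reported deviations match the population (divide-by-N) convention:

```python
from statistics import mean, pstdev

# SAC zero-shot scores from the table (Road420, Facade390, Concrete3k)
f1_scores  = [64.22, 61.74, 75.63]
iou_scores = [47.30, 44.68, 60.82]

# The reported 67.20 +/- 6.05 and 50.93 +/- 7.07 use the population
# standard deviation (pstdev divides by N, not N-1).
print(round(mean(f1_scores), 2), round(pstdev(f1_scores), 2))    # 67.2 6.05
print(round(mean(iou_scores), 2), round(pstdev(iou_scores), 2))  # 50.93 7.07
```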
4. Computational Efficiency and Generalization Analysis
- Parameter footprint: SAC tunes ≈41 K parameters (≈0.046% of SAM). By comparison, LoRA (r=8) tunes ≈30.7 K, combined PEFT-plus-decoder tuning ≈4 M, and decoder-only or full fine-tuning of conventional backbones ranges from 3.7 M to 61 M. The reported result is a 30–50% reduction in training time per epoch compared to non-selective adaptation.
- Generalization: SAC achieves the highest cross-domain mean F1 and the lowest standard deviation compared to all benchmarks. This suggests effective suppression of overfitting and superior capacity to segment cracks in unseen environments.
- Efficiency implication: Selective normalization tuning delivers substantial speedup and memory savings, making SAC feasible for deployment in resource-constrained settings.
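A quick arithmetic check of the quoted parameter fraction, using the approximate counts given in the text:

```python
# Back-of-envelope check of the quoted parameter fraction.
backbone_params = 90_000_000   # ViT-Base SAM backbone, approximate
tuned_params = 41_000          # LayerNorm gains and biases
fraction = 100 * tuned_params / backbone_params
print(f"{fraction:.3f}%")  # 0.046%
```

This reconciles the two figures quoted in the text: ≈0.046% exactly, rounded up to "approximately 0.05%" elsewhere.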
5. Key Methodological Innovations and Comparison with Prior Art
SAC’s distinguishing methodological characteristic is its use of LayerNorm-only fine-tuning for domain adaptation of SAM:
- Full fine-tuning, LoRA, or Adapter-based PEFT approaches tune considerably larger parameter subsets but do not outperform SAC’s normalization-only approach on large-scale and zero-shot benchmarks.
- SAC surpasses traditional segmentation networks (U-Net, DeepLabv3+, SegFormer) both in segmentation accuracy and computational cost for crack detection.
- Empirical ablation confirms that updating normalization statistics suffices to bridge domain gap and yield state-of-the-art crack segmentation (Rostami et al., 19 Apr 2025).
6. Practical Impact and Deployment Contexts
- SAC’s minimal computational requirements enable rapid retraining and deployment on real-world monitoring platforms where latency, energy and hardware constraints prohibit large-model fine-tuning.
- The model’s robustness in zero-shot tasks is demonstrated on distinct domains including asphalt, masonry, metal, and concrete.
- A plausible implication is that normalization-based adaptation strategies are especially suitable for industrial computer vision, where rapid prototyping and adaptation across diverse imaging domains are required.
7. Limitations and Future Directions
- The empirical results focus on ViT-Base; extending norm-tuning to larger backbone variants or non-Transformer architectures may require further validation.
- SAC does not alter feature representation kernels, which may limit adaptation in extreme domain shifts where structural features of cracks deviate significantly from those seen in pre-training.
- Fine-tuning normalization layers as a standalone strategy may benefit from integration with knowledge distillation or hybrid PEFT approaches for cases where further accuracy or interpretability is needed.
Summary Table: SAC Performance Comparison
| Model | # Tunables | F1 (%) | IoU (%) | Zero-Shot Mean F1 | Zero-Shot std(F1) |
|---|---|---|---|---|---|
| SAC (LayerNorm Tuning) | 41K | 61.22 | 44.13 | 67.20 | 6.05 |
| DeepLabv3+ Res101 | 61M | 56.52 | 39.41 | 62.56 | 9.60 |
| SegFormer (MiT-B0) | 3.7M | 59.98 | 42.85 | 52.82 | — |
This table demonstrates the parameter efficiency and generalization superiority of SAC relative to full and partial fine-tuning approaches in the published literature (Rostami et al., 19 Apr 2025).