OmniCrack30k Dataset for Crack Segmentation

Updated 2 December 2025
  • OmniCrack30k is a diverse, multi-domain benchmark dataset designed for training and evaluating deep learning models in crack segmentation across various civil structures.
  • It comprises 30,000 high-resolution, manually annotated images capturing different crack morphologies, materials, and imaging conditions to ensure broad applicability.
  • Experimental protocols on this dataset reveal performance variations among segmentation models and highlight challenges such as class imbalance and out-of-distribution generalization.

OmniCrack30k is a large-scale, multi-domain benchmark dataset specifically developed for semantic segmentation of cracks in civil infrastructure, encompassing diverse materials, surface types, and imaging conditions. It has been established as a crucial resource for training, evaluating, and benchmarking deep learning algorithms for crack detection and segmentation in both laboratory and real-world settings (Ranieri et al., 25 Nov 2025, Rostami et al., 19 Apr 2025).

1. Dataset Composition and Structure

OmniCrack30k comprises 30,000 RGB images, totaling approximately 9 billion annotated pixels. Images are sourced from over 20 publicly available crack segmentation collections, including BCL, Crack500, CrackTree260, Ceramic, CSSC, GAPS, Khanh11k variants, Masonry, TopoDS, and UAV75 (Ranieri et al., 25 Nov 2025, Rostami et al., 19 Apr 2025). Surfaces include concrete, asphalt, ceramic, masonry, and steel, with scene contexts spanning roads, tunnels, walls, pavements, and indoor building infrastructure. Crack morphologies range from fine hairline fissures to large “alligator” patterns, incorporating straight, branched, and complex crack networks. The images capture a broad spectrum of lighting conditions (including daylight and shadow), background textures (smooth, rough, patterned), and perspectives (close-up to aerial UAV). Original resolutions vary between 81 × 116 and 4608 × 4608 pixels, but for experimental consistency images are typically resized (e.g., to 256 × 256 for some pipelines, or to encoder-specific input sizes of 270 × 270, 384 × 384, 512 × 512, or 540 × 540 pixels) (Ranieri et al., 25 Nov 2025, Rostami et al., 19 Apr 2025).
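
As an illustration of how such variably sized image/mask pairs are typically consumed, the following is a minimal PyTorch loading sketch; the images/ and masks/ directory layout, the shared file naming, and the 256 × 256 target size are assumptions for this example, not a documented interface of the benchmark.

```python
# Minimal loader sketch for OmniCrack30k-style image/mask pairs.
# Directory layout (images/, masks/) and the 256x256 target size are
# illustrative assumptions, not part of the official release.
import os
import cv2
import numpy as np
import torch
from torch.utils.data import Dataset

class CrackSegDataset(Dataset):
    def __init__(self, root, size=256):
        self.img_dir = os.path.join(root, "images")
        self.mask_dir = os.path.join(root, "masks")
        self.names = sorted(os.listdir(self.img_dir))
        self.size = size

    def __len__(self):
        return len(self.names)

    def __getitem__(self, i):
        name = self.names[i]
        img = cv2.cvtColor(cv2.imread(os.path.join(self.img_dir, name)),
                           cv2.COLOR_BGR2RGB)
        mask = cv2.imread(os.path.join(self.mask_dir, name), cv2.IMREAD_GRAYSCALE)
        # Bilinear resize for the image, nearest-neighbor for the mask so
        # the labels stay strictly binary.
        img = cv2.resize(img, (self.size, self.size), interpolation=cv2.INTER_LINEAR)
        mask = cv2.resize(mask, (self.size, self.size), interpolation=cv2.INTER_NEAREST)
        img = torch.from_numpy(img).permute(2, 0, 1).float() / 255.0
        mask = torch.from_numpy((mask > 127).astype(np.float32)).unsqueeze(0)
        return img, mask
```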

2. Annotation Methodology and Label Schema

OmniCrack30k employs pixel-wise binary annotation: each image is accompanied by a rasterized binary mask in which foreground pixels (1) delineate cracks and background pixels (0) correspond to non-crack regions. Masks are produced manually by domain experts who trace crack paths over the RGB originals; prompt-based or semi-automatic annotation pipelines are not reported. All annotations target binary classification (crack versus background); there is no subdivision of cracks by width, severity, or subclass, and no multi-class label definitions are applied (Ranieri et al., 25 Nov 2025, Rostami et al., 19 Apr 2025). Quality control and inter-annotator consensus standards are not explicitly documented, although peer review of the source publications suggests the masks meet internal benchmark quality thresholds.
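
A short sketch of how the binary schema can be enforced and the crack-pixel sparsity (discussed in Section 7) quantified; the mask path pattern and the 127 binarization threshold are illustrative assumptions.

```python
# Sketch: enforce the {0,1} mask schema and measure the foreground
# (crack) pixel fraction, which quantifies the class imbalance noted
# in Section 7. The "masks/*.png" pattern is illustrative.
import glob
import cv2
import numpy as np

fractions = []
for path in glob.glob("masks/*.png"):
    mask = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    binary = (mask > 127).astype(np.uint8)  # foreground = 1, background = 0
    fractions.append(binary.mean())
print(f"mean crack-pixel fraction: {np.mean(fractions):.4f}")
```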

3. Data Splits, Evaluation Subsets, and Preprocessing

Dataset splits are standardized, employing train/validation/test partitions provided by the benchmark creators. A representative example allocates 22,158 images for training, 13,277 for validation, and 4,582 for held-out testing (Rostami et al., 19 Apr 2025). Partitioning is randomized or stratified across sub-datasets to maximize distributional diversity in each split. Additionally, three zero-shot out-of-distribution evaluation sets—Road420 (420 images, 448 × 448 px), Facade390 (390 images, 448 × 448 px), and Concrete3k (3,000 images, 500 × 500 px)—enable generalization measurement without fine-tuning (Rostami et al., 19 Apr 2025). Images and masks are commonly resized to a consistent resolution (e.g., 256 × 256 px) prior to training; model-specific resizing is adopted according to encoder input requirements (Ranieri et al., 25 Nov 2025).
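
A sketch of the zero-shot protocol these OOD sets enable: a trained model is evaluated without any fine-tuning, here reporting crack-class IoU. The loader conventions follow the CrackSegDataset sketch above and the 0.5 threshold is an assumption, not the authors' released evaluation code.

```python
# Zero-shot OOD evaluation sketch: run a trained model on a held-out
# set (e.g., Road420) with no fine-tuning and report crack-class IoU.
import torch
from torch.utils.data import DataLoader

def evaluate_zero_shot(model, dataset, device="cuda", thr=0.5):
    model.eval().to(device)
    tp = fp = fn = 0
    with torch.no_grad():
        for img, mask in DataLoader(dataset, batch_size=8):
            logits = model(img.to(device))            # B x 1 x H x W
            pred = (torch.sigmoid(logits) > thr).long().cpu()
            mask = mask.long()
            tp += ((pred == 1) & (mask == 1)).sum().item()
            fp += ((pred == 1) & (mask == 0)).sum().item()
            fn += ((pred == 0) & (mask == 1)).sum().item()
    return tp / (tp + fp + fn + 1e-9)                 # crack-class IoU
```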

4. Experimental Protocols and Model Architectures

The primary use case for OmniCrack30k is benchmarking semantic segmentation architectures under diverse augmentation and training regimes (Ranieri et al., 25 Nov 2025, Rostami et al., 19 Apr 2025). Augmentation strategies applied with the Albumentations library include the following (see the sketch after this list):

  • Geometric (p = 0.25): HorizontalFlip, RandomRotate90, Transpose, ShiftScaleRotate
  • Distortion (p = 0.10): Blur, ElasticTransform, GridDistortion, OpticalDistortion
  • Photometric (p = 0.10): HueSaturationValue, CLAHE (Ranieri et al., 25 Nov 2025)
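
One plausible rendering of this pipeline in Albumentations is sketched below; grouping each category under A.OneOf with the stated probability is an interpretation of the reported settings, not a confirmed training script.

```python
# Sketch of the described augmentation pipeline in Albumentations.
# Each category is grouped under OneOf with the stated probability;
# this grouping is an assumption about the protocol.
import albumentations as A

transform = A.Compose([
    A.OneOf([
        A.HorizontalFlip(),
        A.RandomRotate90(),
        A.Transpose(),
        A.ShiftScaleRotate(),
    ], p=0.25),                      # geometric
    A.OneOf([
        A.Blur(),
        A.ElasticTransform(),
        A.GridDistortion(),
        A.OpticalDistortion(),
    ], p=0.10),                      # distortion
    A.OneOf([
        A.HueSaturationValue(),
        A.CLAHE(),
    ], p=0.10),                      # photometric
])

# Albumentations applies identical spatial transforms to image and mask:
# augmented = transform(image=img, mask=mask)
```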

All segmentation models studied share a standard U-Net decoder but differ in encoder backbone: ResNet-50 (25.6M parameters, 270 px input), ResNet-101 (44.5M, 540 px), ConvNeXt V2 Base (88.7M, 384 px), and ConvNeXt V2 Huge (660M, 512 px) (Ranieri et al., 25 Nov 2025).
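
A sketch of this fixed-decoder, swappable-encoder setup using the segmentation_models_pytorch library; the library choice and the encoder identifiers (including ConvNeXt V2 via the timm "tu-" bridge) are assumptions for illustration, not the authors' published code.

```python
# Sketch: U-Net decoder with interchangeable encoders via
# segmentation_models_pytorch (smp). ConvNeXt V2 availability through
# smp's timm "tu-" encoder bridge is an assumption.
import segmentation_models_pytorch as smp

model = smp.Unet(
    encoder_name="resnet50",      # or "resnet101", "tu-convnextv2_base", ...
    encoder_weights="imagenet",   # pretrained initialization
    in_channels=3,
    classes=1,                    # single crack logit per pixel
)
```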

Selective fine-tuning protocols, such as adapting only normalization layers in the Segment Anything Model (SAM) to yield the Segment Any Crack (SAC) model, are also evaluated on OmniCrack30k for increased generalization and computational efficiency (Rostami et al., 19 Apr 2025).
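
In the same spirit, a generic PyTorch sketch of normalization-only fine-tuning: freeze all weights, then re-enable gradients solely for normalization layers. Applying this to an actual SAM checkpoint and its training loop is left abstract here.

```python
# Sketch of selective fine-tuning in the SAC spirit: freeze every
# parameter, then unfreeze only normalization-layer parameters
# (SAC itself adapts LayerNorm layers of SAM's image encoder).
import torch.nn as nn

def unfreeze_norm_layers(model: nn.Module):
    for p in model.parameters():
        p.requires_grad = False
    for m in model.modules():
        if isinstance(m, (nn.LayerNorm, nn.BatchNorm2d, nn.GroupNorm)):
            for p in m.parameters():
                p.requires_grad = True
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f"trainable parameters: {trainable}")
```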

5. Quantitative Metrics and Benchmark Performance

Evaluation on OmniCrack30k emphasizes per-pixel semantic segmentation metrics. Key formulations include:

  • Mean Intersection-over-Union (mIoU):

$$\mathrm{mIoU} = \frac{1}{C} \sum_{c=1}^{C} \frac{\mathrm{TP}_c}{\mathrm{TP}_c + \mathrm{FP}_c + \mathrm{FN}_c}$$

  • Dice Coefficient:

$$\mathrm{Dice} = \frac{2\,\mathrm{TP}}{2\,\mathrm{TP} + \mathrm{FP} + \mathrm{FN}}$$

  • Jaccard Index:

$$J = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP} + \mathrm{FN}}$$

where $\mathrm{TP}$, $\mathrm{FP}$, and $\mathrm{FN}$ denote true positive, false positive, and false negative crack pixels, respectively (Ranieri et al., 25 Nov 2025, Rostami et al., 19 Apr 2025).
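
These definitions translate directly into code; a NumPy transcription, assuming binary {0,1} prediction and ground-truth arrays of equal shape:

```python
# Direct NumPy transcription of the metric definitions above.
import numpy as np

def confusion(pred, gt, cls):
    tp = np.sum((pred == cls) & (gt == cls))
    fp = np.sum((pred == cls) & (gt != cls))
    fn = np.sum((pred != cls) & (gt == cls))
    return tp, fp, fn

def miou(pred, gt, classes=(0, 1)):
    # Mean IoU over background (0) and crack (1) classes.
    ious = []
    for c in classes:
        tp, fp, fn = confusion(pred, gt, c)
        ious.append(tp / (tp + fp + fn + 1e-9))
    return float(np.mean(ious))

def dice(pred, gt):
    tp, fp, fn = confusion(pred, gt, 1)   # crack class
    return 2 * tp / (2 * tp + fp + fn + 1e-9)

def jaccard(pred, gt):
    tp, fp, fn = confusion(pred, gt, 1)   # crack class
    return tp / (tp + fp + fn + 1e-9)
```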

Baseline performance highlights include:

| Model | Tuned Params | F1-score (%) | IoU (%) |
|---|---|---|---|
| Segment Any Crack (SAC, SAM + LayerNorm) | 41K (0.05%) | 61.22 | 44.13 |
| SegFormer (full) | 3.7M (100%) | 59.98 | 42.85 |
| DeepLabv3+-ResNet101 (full) | 61M (100%) | 56.52 | 39.41 |
| U-Net (full) | n/a | 54.28 | 37.27 |

Within the U-Net architecture family, ConvNeXt V2 Huge yields the highest mIoU (0.666), Dice (0.865), and Jaccard (0.786) on the test split without augmentation; applying data augmentation affects these scores only moderately (Ranieri et al., 25 Nov 2025). Statistical tests and confidence intervals are not reported in these studies.

6. Qualitative Analysis and Generalization Characteristics

Qualitative evaluation extends to out-of-distribution (OOD) domains—such as cultural heritage (CH) statues and monuments not encountered during training. U-Net models with ConvNeXt V2 Huge encoders produce the sharpest, most continuous crack delineations with minimal false positives on OOD sculpture imagery. In contrast, ResNet-based encoders often under-segment thin cracks and misclassify textured non-crack regions as cracks. ConvNeXt V2 Base shows moderate performance but introduces false positives on visually complex backgrounds such as dark marble (Ranieri et al., 25 Nov 2025).

Failure analysis indicates that smaller models (e.g., ResNet-50) under-segment hairline cracks, while larger ConvNeXt variants may slightly over-segment in regions of extremely low contrast.

7. Dataset Limitations and Research Directions

OmniCrack30k is primarily assembled from civil infrastructure imagery; crack segmentation for statues, monuments, or other cultural heritage surfaces is not represented in its training data (Ranieri et al., 25 Nov 2025). The binary mask labeling schema precludes subclassification by crack width or severity, and crack pixel sparsity (≈1–5%) introduces a pronounced class imbalance challenge. Annotation methodology and inter-annotator agreement require further clarification and documentation for future releases (Ranieri et al., 25 Nov 2025, Rostami et al., 19 Apr 2025). A plausible implication is that developing pixel-imbalance-aware loss functions and assembling CH-specific, multi-category benchmarks are critical future steps.
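
As one example of such an imbalance-aware objective, a common choice for sparse foregrounds is a combined soft-Dice and binary cross-entropy loss; the sketch below is illustrative, and the equal weighting is an assumption rather than a loss reported in these studies.

```python
# Sketch of an imbalance-aware objective: soft Dice combined with BCE,
# a common choice when crack pixels occupy only ~1-5% of an image.
# The 0.5/0.5 weighting is an illustrative assumption.
import torch
import torch.nn.functional as F

def dice_bce_loss(logits, target, eps=1e-6):
    prob = torch.sigmoid(logits)
    inter = (prob * target).sum()
    dice = (2 * inter + eps) / (prob.sum() + target.sum() + eps)
    bce = F.binary_cross_entropy_with_logits(logits, target)
    return 0.5 * (1 - dice) + 0.5 * bce
```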

The dataset’s standardization, multi-domain scope, and inclusion of three OOD zero-shot test sets make it the current reference for comparative benchmarking of deep learning-based crack detection algorithms in diverse material and scene contexts (Rostami et al., 19 Apr 2025, Ranieri et al., 25 Nov 2025).
