Rail-5k Dataset Benchmark

Updated 21 November 2025
  • Rail-5k dataset is a comprehensive benchmark for visual rail defect detection using both annotated and uncurated images from diverse railway environments.
  • It supports fully-supervised and semi-supervised learning paradigms with rigorous annotation protocols for object detection and semantic segmentation tasks.
  • Key challenges include long-tailed defect distributions, fine-grained class distinctions, and robustness to real-world image corruptions.

The Rail-5k dataset is a large-scale benchmark designed for visual rail surface defect detection under real-world conditions. It encompasses a comprehensive collection of annotated and uncurated imagery sampled from diverse railway scenarios throughout China, addressing key practical challenges such as fine-grained classification, long-tailed defect distribution, and robustness to real-world image corruptions. The dataset’s design supports both fully-supervised and semi-supervised learning paradigms and establishes rigorous annotation and evaluation protocols for both object detection and semantic segmentation tasks (Zhang et al., 2021).

1. Collection and Imaging Protocol

Rail-5k consists of approximately 5,000 high-resolution RGB images captured in various operational railway environments, including tunnels, bridges, and both straight and curved tracks. Of these, 1,100 images bear expert-provided defect annotations; the remaining 4,000 images are unlabeled and intentionally uncurated, containing real-world corruptions such as motion blur, uneven lighting, and foreign objects.

Imaging hardware was mounted on inspection cars, with cameras positioned about 200 mm above the rail surface and oriented vertically downward. To ensure high annotation quality, frames exhibiting strong shadows or over-exposure were excluded. Labeled images are standardized to a resolution of 3648 × 2736 pixels, while the unlabeled subset was subjected only to minimal preprocessing, specifically removal of over-exposed and shadowed frames, with no resizing or color normalization imposed.

2. Expert Annotation and Defect Taxonomy

Annotation was performed by a cohort of 10 railway experts employing a multi-stage review, in which each image received evaluation by at least three experts to resolve ambiguities and enforce label consensus (no quantitative inter-annotator agreement coefficient is reported). The dataset defines a taxonomy of 13 distinct rail-related defect classes, with defect definitions drawn from railway standards:

  1. Running Surface – large, clear region on the rail head
  2. Contact Band – polished subregion beneath wheel contact
  3. Dark Contact Band – similar to contact band but darker
  4. Spalling – small, chip-like missing material (“stripped dent”)
  5. Crack – thin, diffuse fissures (annotated via mask)
  6. Corrugation – wavy, periodic wear, labeled along valleys
  7. Grinding – post-maintenance stripe patterns
  8. Fastener – broad clip connecting rail to sleeper
  9. Spike Screw – large fastening screw
  10. Set Screw – small adjustment screw
  11. Indentation – small, distinct dents
  12. Burning – localized thermal discoloration
  13. Welded Joint – flash from weld seam

The annotation protocol adapts the annotation type to the size and boundary clarity of each class, as summarized in the table:

Size    | Boundary           | Example Classes        | Annotation Type
Large   | clear              | Rail Surface, Fastener | Rectangle box
Large   | obscure            | Corrugation            | Valley boundary
Small   | clear              | Spalling, Indentation  | Tight box
Diffuse | sharp (Crack only) | Crack                  | Mask/union boxes

Cracks are too thin for effective bounding-box annotation and are instead labeled with pixel-wise segmentation masks. For other fine or ambiguous cases, unions of densely placed boxes are used. The annotation tools are not explicitly specified; typical choices for this kind of work include LabelImg for boxes and custom mask editors for pixel-wise labels.

3. Data Partitioning and Benchmarking Settings

The labeled portion of Rail-5k is divided for two principal research settings:

  • Fully-supervised: The 1,100 labeled images are randomly split into training (≈880 images; 80%) and testing (≈220 images; 20%) sets. No validation split is prescribed, but users may hold out 10% of the training images for hyper-parameter tuning.
  • Semi-supervised: The same 220-image test set is retained. All remaining labeled data and the full set of 4,000 unlabeled images are used for joint semi-supervised training. The unlabeled subset introduces domain shifts and unknown corruptions and is left without any manual annotation, providing a challenging scenario for robustness evaluation.
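The partitioning described above can be sketched in a few lines. This is a minimal illustration, not the authors' split script: the image IDs, the `split_labeled` helper, and the seed are all hypothetical, and the official split (if one is distributed with the dataset) should be preferred for comparable results.

```python
import random

def split_labeled(image_ids, test_fraction=0.2, seed=0):
    """Randomly split labeled image IDs into (train, test) lists."""
    ids = sorted(image_ids)      # sort first so the result depends only on the seed
    rng = random.Random(seed)    # local RNG, leaves the global random state alone
    rng.shuffle(ids)
    n_test = round(len(ids) * test_fraction)
    return ids[n_test:], ids[:n_test]

# Hypothetical IDs standing in for the 1,100 labeled and 4,000 unlabeled images.
labeled_ids = [f"img_{i:04d}" for i in range(1100)]
unlabeled_ids = [f"raw_{i:04d}" for i in range(4000)]

train_ids, test_ids = split_labeled(labeled_ids)

# Fully-supervised setting: train on train_ids, evaluate on test_ids.
# Semi-supervised setting: same test set, training adds the unlabeled pool.
semi_supervised_train = {"labeled": train_ids, "unlabeled": unlabeled_ids}

print(len(train_ids), len(test_ids))  # 880 220
```

Fixing the seed keeps the 220-image test set identical across the two settings, which is what makes their results comparable.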

4. Statistical Distribution and Major Challenges

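The dataset exhibits an extreme long-tailed class distribution. Let $n_c$ denote the bounding-box count for class $c$. The imbalance ratio is defined as

$$\frac{\max_c n_c}{\min_{c:\,n_c>0} n_c}$$

With the counts tabulated below, spalling is the most populous class (12,582 boxes) and welded joint the least (14 boxes), which gives a ratio of about 898.7. The reported value of 40.98 matches spalling relative to indentation (12,582 / 307) rather than the overall maximum-to-minimum ratio.

Table: class statistics.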

Class             | #Boxes ($n_c$) | #Images
Spalling          | 12,582         | 1,005
Crack (mask)      | 3,785          | 375
Corrugation       | 3,349          | 445
Contact Band      | 1,093          | 1,087
Running Surface   | 1,082          | 1,080
Dark Contact Band | 773            | 769
Fastener          | 757            | 582
Spike Screw       | 502            | 424
Set Screw         | 414            | 360
Indentation       | 307            | 216
Grinding          | 337            | 179
Burning           | 41             | 10
Welded Joint      | 14             | 8
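
As a quick check of the imbalance ratio, the sketch below recomputes it from the tabulated box counts. The dictionary simply transcribes the table; the helper name `imbalance_ratio` is illustrative. It prints the maximum-to-minimum ratio (spalling vs. welded joint) and, for comparison, the ratio of spalling to indentation.

```python
# Box counts transcribed from the class statistics table.
box_counts = {
    "Spalling": 12582, "Crack (mask)": 3785, "Corrugation": 3349,
    "Contact Band": 1093, "Running Surface": 1082, "Dark Contact Band": 773,
    "Fastener": 757, "Spike Screw": 502, "Set Screw": 414,
    "Indentation": 307, "Grinding": 337, "Burning": 41, "Welded Joint": 14,
}

def imbalance_ratio(counts):
    """max_c n_c divided by min over classes with n_c > 0."""
    nonzero = [n for n in counts.values() if n > 0]
    return max(nonzero) / min(nonzero)

ratio_all = imbalance_ratio(box_counts)
ratio_vs_indentation = box_counts["Spalling"] / box_counts["Indentation"]

print(f"{ratio_all:.1f}")             # 898.7
print(f"{ratio_vs_indentation:.2f}")  # 40.98
```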

The fine-grained nature of the classes (e.g., distinguishing contact band from dark contact band), the variable spatial scale, and the inherent difficulty of annotating regions for diffuse defects such as cracks all contribute to the complexity. The uncurated, unlabeled images further introduce background domain shifts and object appearance corruptions.

5. Evaluation Metrics and Protocols

Performance on Rail-5k is evaluated by class-agnostic and class-specific object detection and segmentation metrics:

  • Object Detection:
    • AP@[0.5]: average precision at intersection-over-union (IoU) threshold 0.5,
    • COCO-style $\mathrm{mAP}@[0.5:0.95]$: mean AP over multiple IoU thresholds,
    • A detection is a true positive if IoU $\geq$ 0.5 with a ground-truth box of the same class.

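AP per class $c$ is computed as the area under the precision-recall curve, where $p_c(r)$ is the precision of class $c$ at recall $r$:

$$\mathrm{AP}_c = \int_0^1 p_c(r)\,dr$$

and mAP as the mean over the class set $C$:

$$\mathrm{mAP} = \frac{1}{|C|} \sum_{c=1}^{|C|} \mathrm{AP}_c$$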
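
The AP integral and the mAP average can be approximated numerically as follows. This is a minimal sketch, assuming all-point interpolation of the precision-recall curve (one common convention; COCO-style tooling uses a 101-point variant) and assuming detections have already been matched to ground truth, so each one carries a true-positive flag. The function names are illustrative.

```python
def average_precision(scores, is_tp, num_gt):
    """Area under the precision-recall curve for one class."""
    if num_gt == 0:
        return 0.0
    # Rank detections by descending confidence.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for i in order:
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Precision envelope: each point takes the best precision at any higher recall.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # Sum rectangle areas between consecutive recall values.
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap

def mean_average_precision(per_class_ap):
    """Unweighted mean of per-class AP values."""
    return sum(per_class_ap.values()) / len(per_class_ap)

# Toy example: three detections, two ground-truth boxes.
ap_toy = average_precision([0.9, 0.8, 0.7], [True, False, True], num_gt=2)
map_toy = mean_average_precision({"Spalling": ap_toy, "Grinding": 0.5})
```

Because mAP weights every class equally, rare classes such as burning or welded joint influence the score as much as spalling does.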

  • Segmentation (Crack):
    • Evaluated using intersection-over-union (IoU) for each class:

$$\mathrm{IoU} = \frac{|\mathrm{pred}\cap\mathrm{gt}|}{|\mathrm{pred}\cup\mathrm{gt}|}$$
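
The same IoU definition applies to both segmentation masks and detection boxes. The sketch below shows one way to compute each in plain Python; the `(x1, y1, x2, y2)` box format and the handling of an empty union are assumptions made for illustration, not conventions fixed by the benchmark.

```python
def mask_iou(pred, gt):
    """IoU between two binary masks given as equally sized nested lists."""
    inter = union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    # Assumption: two empty masks count as a perfect match.
    return inter / union if union else 1.0

def box_iou(a, b):
    """IoU between two axis-aligned boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

pred_mask = [[0, 1, 1],
             [0, 1, 0]]
gt_mask = [[0, 1, 0],
           [0, 1, 1]]
print(mask_iou(pred_mask, gt_mask))            # 0.5
print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))     # 1/7, about 0.143
```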

6. Baseline Results and Algorithmic Benchmarks

Representative baselines were established using YOLOv5-s for detection and DeepLabv3+ with a ResNet-50 backbone for segmentation:

  • Detection (YOLOv5-s):
    • Pre-trained on MS-COCO, trained for 300 epochs with standard hyperparameters, employing mosaic augmentation and GIoU loss.
    • Detection performance varied widely by class; e.g., AP@0.5 is 98.9% for rail surface, 94.5% for contact band, 60.0% for spalling, and 24.0% for grinding. Indentation yielded 0 AP, and crack is not reported as detection but as segmentation.
  • Segmentation (DeepLabv3+R-50):
    • Trained for 9,000 iterations (batch size 16, SGD, momentum 0.01, weight decay 1e-4).
    • Achieved IoU of 98.9% (background) and 67.8% (crack).
  • Semi-supervised Detection (Pseudo-labeling):
    • Unlabeled data are labeled by model inference and filtered by a confidence threshold $s_\text{thr}$, followed by joint fine-tuning for 1 epoch at a learning rate of $4 \times 10^{-4}$.
    • AP@0.5 for various $s_\text{thr}$:
      • $s_\text{thr}=0.6$: 63.29
      • $s_\text{thr}=0.7$: 63.27
      • $s_\text{thr}=0.8$: 62.43
      • $s_\text{thr}=0.9$: 61.55
    • Notably, performance drops under the semi-supervised regime, which suggests a significant domain gap and label noise in the unlabeled set (Zhang et al., 2021).
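
The confidence-filtering step of the pseudo-labeling baseline can be sketched as below. This is an illustration under stated assumptions, not the authors' code: the prediction dictionary layout, the function names, the `>=` comparison, and the choice to drop images with no surviving box are all hypothetical, and the model inference and fine-tuning steps are left out.

```python
def filter_pseudo_labels(detections, s_thr):
    """Keep only predicted boxes whose confidence is at least s_thr."""
    return [d for d in detections if d["score"] >= s_thr]

def build_pseudo_labeled_set(predictions, s_thr):
    """Map image ID -> confident detections, skipping images with none."""
    pseudo = {}
    for image_id, dets in predictions.items():
        kept = filter_pseudo_labels(dets, s_thr)
        if kept:  # assumption: images without confident boxes are not used
            pseudo[image_id] = kept
    return pseudo

# Hypothetical model outputs on two unlabeled images.
predictions = {
    "raw_0001": [
        {"cls": "Spalling", "box": (10, 20, 30, 40), "score": 0.93},
        {"cls": "Grinding", "box": (5, 5, 50, 60), "score": 0.65},
    ],
    "raw_0002": [
        {"cls": "Spalling", "box": (1, 2, 3, 4), "score": 0.41},
    ],
}

kept_counts = {}
for s_thr in (0.6, 0.7, 0.8, 0.9):
    pseudo = build_pseudo_labeled_set(predictions, s_thr)
    kept_counts[s_thr] = sum(len(dets) for dets in pseudo.values())
```

A higher threshold trades pseudo-label quantity for precision: fewer boxes survive, so rare classes that the model scores with low confidence may vanish from the pseudo-labeled set entirely.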

7. Open Problems and Prospective Extensions

Challenges highlighted by Rail-5k include reliably handling rare and fine-grained defect classes, achieving robust performance under domain shift, and overcoming the limitations of bounding-box detectors for ill-defined or diffuse regions such as cracks. Indentation defects in particular yield negligible AP using standard detection models.

Suggested directions for advancing performance on Rail-5k encompass class-balanced sampling or focal loss variants for long-tail distributions, domain adaptation strategies or corruption augmentations to bridge labeled-unlabeled domain gaps, and robust semi-supervised methods such as consistency regularization and noise-aware pseudo-label filtering. The dataset creators plan future extensions incorporating multi-modal data sources (e.g., 3D scan, eddy-current data), which may help further boost reliability in operational rail inspection settings (Zhang et al., 2021).
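
Two of the long-tail remedies mentioned above can be written down compactly. The sketch below shows the binary focal loss of Lin et al. for a single prediction and a simple inverse-frequency class weighting; neither is part of the reported Rail-5k baselines, and the default `gamma` and `alpha` values as well as the weighting formula are illustrative choices.

```python
import math

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for predicted probability p and label y in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p              # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing factor
    p_t = min(max(p_t, 1e-12), 1.0)             # guard against log(0)
    # (1 - p_t)^gamma shrinks the loss of easy, well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

def inverse_frequency_weights(counts):
    """Per-class weights proportional to 1 / n_c; uniform counts give 1.0."""
    total = sum(counts.values())
    return {c: total / (len(counts) * n) for c, n in counts.items()}

easy = focal_loss(0.9, 1)   # confident and correct: small loss
hard = focal_loss(0.1, 1)   # confident and wrong: large loss
weights = inverse_frequency_weights({"Spalling": 12582, "Welded Joint": 14})
```

With `gamma=0` and `alpha=1` the focal loss reduces to ordinary cross-entropy on positives, which makes it easy to sanity-check against an existing training loss.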

Rail-5k provides a valuable resource for benchmarking visual algorithms under stringent, practically relevant conditions, supporting research in object detection, fine-grained defect classification, segmentation, and semi-supervised learning robustness.
