
SW-YOLO: Semi-Automated Labeling for Railroads

Updated 15 December 2025
  • The paper presents a bootstrapped workflow that leverages human seed labels to iteratively refine YOLOv8, achieving mAP@0.5 improvements of 6–8%.
  • SW-YOLO is a semi-automated labeling approach that integrates deep learning inference with human correction to enhance fault detection scalability.
  • Iterative retraining reduces labeling time per image from 2–4 minutes to as low as 30 seconds, offering a cost-effective solution for railroad infrastructure monitoring.

The SW-YOLO approach is a semi-automated labeling and iterative training methodology designed to reduce manual effort and improve fault detection in large-scale image and video datasets, specifically targeting railroad infrastructure monitoring. Centered around the YOLOv8 object detection architecture, SW-YOLO leverages a bootstrapped workflow: human annotators create a small set of seed labels, which are then used to iteratively train and refine the model, reducing subsequent manual correction requirements while improving model accuracy. Its quantitative and procedural advantages over fully manual pipelines and commercial automated labeling tools position it as an efficient and cost-effective alternative for detection-focused machine learning applications (Lester et al., 1 Apr 2025).

1. End-to-End Pipeline and Workflow

SW-YOLO employs a cyclic workflow, closing the loop between deep learning inference and human expertise. The primary steps are as follows:

  1. Initial Manual Labeling: Annotators select a small subset (e.g., 100 frames) of the available data, drawing bounding boxes around fault classes such as "insufficient ballast" and "plant overgrowth." Labels adhere to the YOLO format: a class ID followed by normalized coordinates for the box center and box size.
  2. YOLOv8 Base Training: The seed dataset is used to train an off-the-shelf YOLOv8 (small configuration) model to convergence, with standard augmentations (horizontal flip, rotations ±15°, shear ±10°).
  3. Batch Inference and Correction: The current model is deployed to predict labels on the next unlabeled batch (e.g., 100 images), generating annotation files in under five seconds per batch. Raw model outputs are exported, mapped to human-readable class names, and loaded into a labeling GUI (e.g., LabelImg), where human annotators correct prediction errors. Correction per image is reduced to 30 seconds to 2 minutes, compared to 2–4 minutes for manual labeling.
  4. Iterative Retraining: Corrected labels are merged into the training set. YOLOv8 is retrained or fine-tuned on the expanded set (initial + assisted labels). This process repeats until the entire corpus (e.g., 400+ images) is exhaustively and accurately labeled.

A summary of major workflow steps is provided below:

| Step | Data Used | Human Role |
|------|-----------|------------|
| Initial labeling | 100 seed images | Draw boxes on faults |
| Batch inference | Next 100-image batch | Review/edit model output |
| Retraining | All labeled data to date | Initiate next train cycle |
| Iteration | Repeat inference → correction → train | Oversee quality, corrections |
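The seed labels in step 1 use the YOLO text format. A minimal parsing sketch, assuming the standard layout of one `class_id x_center y_center width height` line per box with coordinates normalized to [0, 1] (helper names are illustrative):

```python
def parse_yolo_line(line):
    """Parse one YOLO-format label line into a dict."""
    parts = line.split()
    return {
        "class_id": int(parts[0]),
        "x_center": float(parts[1]),
        "y_center": float(parts[2]),
        "width": float(parts[3]),
        "height": float(parts[4]),
    }

def to_pixel_box(label, img_w, img_h):
    """Convert a normalized YOLO box to pixel (x_min, y_min, x_max, y_max)."""
    cx, cy = label["x_center"] * img_w, label["y_center"] * img_h
    w, h = label["width"] * img_w, label["height"] * img_h
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
```

Because the format stores normalized center/size rather than pixel corners, the same label file remains valid if frames are resized between labeling and training.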

2. YOLOv8 Architectural Details

SW-YOLO utilizes the canonical YOLOv8s (small) architecture without structural modification. Its components are:

  • Backbone: CSPDarknet-like “P5” cascade of convolutional blocks with cross-stage partial connections and Spatial Pyramid Pooling (SPP).
  • Neck: PANet feature pyramid, facilitating fusion of high- and low-level features across stride levels 8, 16, and 32.
  • Head: Per-scale prediction heads outputting location $(x, y, w, h)$, objectness probability $P_{\text{obj}}$, and $K$ class logits.

Improvements derive from iterative expansion of the annotated dataset, not from changes to the core YOLOv8 architecture.
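The three stride levels determine the prediction-grid resolution at each detection scale. For a square input (640 px is the common YOLOv8 default, assumed here; the paper does not state its input size):

```python
def grid_sizes(img_size=640, strides=(8, 16, 32)):
    """Prediction-grid dimensions at each stride level (sketch)."""
    return {s: (img_size // s, img_size // s) for s in strides}
```

At 640 px this yields 80×80, 40×40, and 20×20 grids, so the stride-8 level carries the fine detail needed for small faults while stride 32 covers large objects.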

3. Loss Functions and Evaluation Metrics

The SW-YOLO training regime relies on three key mathematical formulations:

  • Bounding-Box Regression (CIoU loss):

$$\mathcal{L}_{\rm box} = 1 - \mathrm{IoU}(b, b^*) + \frac{\rho^2(\mathbf{c}, \mathbf{c}^*)}{c_{\rm diag}^2} + \alpha v$$

where $b, b^*$ denote predicted and ground-truth boxes; $\mathbf{c}, \mathbf{c}^*$ are their centers; $c_{\rm diag}$ is the diagonal of the smallest enclosing box; $v$ penalizes aspect-ratio mismatch; and $\alpha$ determines the penalty strength.

  • Objectness and Classification (binary cross-entropy):

$$\mathcal{L}_{\rm obj} = -\sum_{i,j} \left[ y_{ij}^{\rm obj} \log p_{ij}^{\rm obj} + (1 - y_{ij}^{\rm obj}) \log(1 - p_{ij}^{\rm obj}) \right]$$

$$\mathcal{L}_{\rm cls} = -\sum_{i,j} \sum_{c=1}^{K} y_{ij}^{c} \log p_{ij}^{c}$$

where $y_{ij}^{\rm obj} \in \{0, 1\}$ signals the presence of an object in cell $(i, j)$ and $y_{ij}^{c}$ is the one-hot class label.

  • Total Loss:

$$\mathcal{L} = \lambda_{\rm box} \mathcal{L}_{\rm box} + \lambda_{\rm obj} \mathcal{L}_{\rm obj} + \lambda_{\rm cls} \mathcal{L}_{\rm cls}$$
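The CIoU regression term can be written as a standalone function. A minimal sketch under the definitions above, for axis-aligned boxes given as `(x_min, y_min, x_max, y_max)` (illustrative, not the Ultralytics implementation):

```python
import math

def ciou_loss(box_p, box_g, eps=1e-9):
    """CIoU loss: 1 - IoU + normalized center distance + aspect penalty."""
    # Intersection over union
    ix1, iy1 = max(box_p[0], box_g[0]), max(box_p[1], box_g[1])
    ix2, iy2 = min(box_p[2], box_g[2]), min(box_p[3], box_g[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_p = (box_p[2] - box_p[0]) * (box_p[3] - box_p[1])
    area_g = (box_g[2] - box_g[0]) * (box_g[3] - box_g[1])
    iou = inter / (area_p + area_g - inter + eps)
    # Squared center distance rho^2
    cxp, cyp = (box_p[0] + box_p[2]) / 2, (box_p[1] + box_p[3]) / 2
    cxg, cyg = (box_g[0] + box_g[2]) / 2, (box_g[1] + box_g[3]) / 2
    rho2 = (cxp - cxg) ** 2 + (cyp - cyg) ** 2
    # Squared diagonal of the smallest enclosing box
    ex1, ey1 = min(box_p[0], box_g[0]), min(box_p[1], box_g[1])
    ex2, ey2 = max(box_p[2], box_g[2]), max(box_p[3], box_g[3])
    c_diag2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + eps
    # Aspect-ratio penalty v and its adaptive weight alpha
    wp, hp = box_p[2] - box_p[0], box_p[3] - box_p[1]
    wg, hg = box_g[2] - box_g[0], box_g[3] - box_g[1]
    v = (4 / math.pi ** 2) * (math.atan(wg / hg) - math.atan(wp / hp)) ** 2
    alpha = v / (1 - iou + v + eps)
    return 1 - iou + rho2 / c_diag2 + alpha * v
```

A perfectly matching prediction drives all three terms to zero, while misaligned centers or mismatched aspect ratios add penalty even at equal IoU.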

Evaluation Metrics:

  • Precision $P = \frac{\rm TP}{\rm TP + FP}$ and Recall $R = \frac{\rm TP}{\rm TP + FN}$
  • $F_1$-score: $F_1 = \frac{2PR}{P + R}$
  • mAP@0.5: $\mathrm{mAP} = \frac{1}{K} \sum_{c=1}^{K} \int_0^1 P_c(R)\, dR$
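The per-class integral above can be approximated from ranked predictions. A minimal all-point-interpolation sketch (illustrative; the paper's exact evaluator is not specified):

```python
def average_precision(scored, n_gt):
    """Area under the precision-recall curve for one class.

    `scored` is a list of (confidence, is_true_positive) pairs for all
    predictions of that class; `n_gt` is the ground-truth box count.
    """
    scored = sorted(scored, key=lambda s: -s[0])  # rank by confidence
    tp = fp = 0
    ap, prev_recall = 0.0, 0.0
    for _, is_tp in scored:
        tp += is_tp
        fp += not is_tp
        precision = tp / (tp + fp)
        recall = tp / n_gt
        ap += precision * (recall - prev_recall)  # rectangle under the curve
        prev_recall = recall
    return ap
```

mAP@0.5 is then the mean of `average_precision` over the $K$ classes, with a prediction counted as a true positive when its IoU with an unmatched ground-truth box is at least 0.5.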

SW-YOLO monitors mAP@0.5 on a 10% validation split with early stopping (patience = 10 epochs). Typical cycle lengths increase with set size (50 epochs for 100 images, 75 for 200, 100 for ≥300), ceasing if mAP fails to improve by more than 0.5% across 10 epochs.
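The stopping rule can be sketched as a small helper, assuming mAP@0.5 is recorded once per epoch and taking `min_delta = 0.005` as the 0.5% criterion (helper name and exact comparison are illustrative):

```python
def should_stop(map_history, patience=10, min_delta=0.005):
    """Stop when mAP@0.5 has not improved by more than `min_delta`
    over the last `patience` epochs."""
    if len(map_history) <= patience:
        return False  # not enough history yet
    best_before = max(map_history[:-patience])
    recent_best = max(map_history[-patience:])
    return recent_best - best_before <= min_delta
```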

4. Label Export, Correction, and Re-Ingestion

SW-YOLO integrates label export and correction to maximize annotation throughput:

  1. Inference Output: YOLOv8 writes label .txt files per image.
  2. Editable Conversion: A Python script converts labels into the desired annotation-tool format and applies the class label map (e.g., 0→"ballast_low", 1→"overgrowth").
  3. GUI Correction: Files are loaded into a tool like LabelImg for user adjustments.
  4. Merging: Corrected files (in YOLO format) are restored to the training set.
  5. Retraining: The growing corpus (manual + corrected labels) is auto-ingested by YOLOv8 for subsequent training cycles.

This process allows efficient scaling of the labeled dataset with minimal friction between annotation, correction, and model improvement.
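The editable-conversion step (2) can be sketched as a file rewrite that applies the class label map. This is an illustrative stand-in: the paper's actual script and the annotation tool's exact input format are not specified, and `relabel_file` is a hypothetical helper name.

```python
# Example class map from the text (assumed complete for this sketch).
CLASS_MAP = {0: "ballast_low", 1: "overgrowth"}

def relabel_file(src_path, dst_path, class_map=CLASS_MAP):
    """Rewrite a YOLO .txt label file with human-readable class names."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            parts = line.split()
            if not parts:  # skip blank lines
                continue
            parts[0] = class_map[int(parts[0])]  # numeric ID -> name
            dst.write(" ".join(parts) + "\n")
```

After GUI correction, the inverse mapping restores numeric class IDs so the files can be merged back into the YOLO training set.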

5. Quantitative Performance Gains

Empirical evaluation demonstrates substantial reductions in human labor and increases in detection quality:

  • Labeling time per image: 2–4 minutes (manual) versus 30 seconds–2 minutes (SW-YOLO pipeline).
  • 400-image corpus: 10 hours manual versus 4–5 hours using SW-YOLO.
  • F₁-score progression:

| Image Set | F₁-score |
|-----------|----------|
| 100 manual | 0.81 |
| 200 (100 manual + 100 SW) | 0.84 |
| 300 (100 + 200 SW) | 0.86 |
| 400 (100 + 300 SW) | 0.87 |
| 400 manual only | 0.71 |

mAP@0.5 improved by 6–8% over the full-manual baseline, paralleling the F₁-score gains.

6. Limitations and Generalization Principles

Several constraints and conditions for effective deployment were noted:

  • Seed Label Quality: The accuracy of the initial manual annotations critically determines downstream model and labeling performance; errors in the seed set propagate.
  • Subjectivity in Classes: Categories such as “insufficient ballast” are subjective, benefitting from clear, written guidelines to ensure consistency.
  • Confidence Thresholding: Dynamically adjusting the model’s confidence threshold (e.g., only presenting boxes with $P_{\rm obj} < 0.6$ for human review) may further cut correction time as the model improves.
  • Domain Portability: The SW-YOLO loop is applicable beyond railroads. Generalization involves replacing label maps and seed images, tuning augmentations to new domains, and adjusting learning rates for dataset scale.
  • Suggested Enhancements: Future work includes semi-supervised consistency losses (for pseudo-label weighting), leveraging YOLOv8’s auto-anchor recomputation, and integrating active learning to focus manual review on highly uncertain samples.
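The confidence-thresholding idea can be sketched as a simple partition of model output, where `predictions` are assumed to be `(p_obj, box)` pairs and 0.6 is the example threshold from the text:

```python
def split_for_review(predictions, threshold=0.6):
    """Partition predictions into auto-accepted and human-review sets."""
    auto_accept = [p for p in predictions if p[0] >= threshold]
    needs_review = [p for p in predictions if p[0] < threshold]
    return auto_accept, needs_review
```

As the model improves across iterations, more boxes clear the threshold and the human-review queue shrinks, which is the mechanism behind the claimed correction-time savings.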

A plausible implication is that the SW-YOLO methodology, due to its model-agnostic iteration and efficient human-in-the-loop processes, is extendable to other object-detection tasks beyond its demonstrated use in railroad video analytics (Lester et al., 1 Apr 2025).
