Automated Radiographic Sharp Scoring (ARTSS)

Updated 15 September 2025

ARTSS is an automated deep learning framework that computes the Total Sharp Score from hand X-rays, offering objective RA damage quantification.
It employs a multi-stage pipeline (image reorientation with ResNet50, hand segmentation via UNet.3, joint detection with YOLOv7, and score regression), with reported results of IoU ≈ 0.94 for hand segmentation and 99% precision for joint identification.
ARTSS enhances clinical workflows by reducing inter-reader variability, providing interpretable attention maps, and expediting consistent assessments in both trials and routine practice.

Automated Radiographic Sharp Scoring (ARTSS) refers to a set of deep learning methodologies designed to quantify joint damage in rheumatoid arthritis (RA) using radiographic images, specifically automating the calculation of the Total Sharp/van der Heijde Score (TSS) from hand X-rays. ARTSS systematically addresses challenges of manual scoring—subjectivity, time-intensity, and inter- and intra-reader variability—by employing multi-stage pipelines incorporating state-of-the-art neural networks for image orientation, segmentation, joint identification, and score prediction.

1. Conceptual Overview

ARTSS frameworks operate on full-hand radiographs to automate RA damage assessment, particularly targeting the Sharp/van der Heijde scoring system, which is authoritative in clinical trials. The process consists of:

Image pre-processing and reorientation for input standardization,
Hand segmentation to isolate relevant anatomical regions,
Automated joint identification with modern object detection architectures,
Final prediction of the TSS using advanced regression or attention-based models.

A central goal is to produce objective, reproducible, and clinically relevant damage scores that align closely with consensus expert ratings and facilitate broader integration into routine and clinical research practice (Moradmand et al., 8 Sep 2025).

2. Multi-Stage Deep Learning Pipeline

The ARTSS methodology is exemplified by four sequential stages (Moradmand et al., 8 Sep 2025):

Stage	Deep Learning Architecture	Function
Re-orientation	ResNet50	Rotates image to standard pose
Segmentation	UNet.3	Segments hand from background
Joint Identification	YOLOv7	Locates anatomical joint ROIs
TSS Prediction	ViT, VGG16/19, ResNet50	Regresses radiographic TSS
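
Read as code, the four rows of the table amount to a simple composition of stages. The following is a minimal sketch under that assumption only: the class name `ArtssPipeline`, the `Image` type, and the `toy_*` functions are hypothetical placeholders introduced for illustration, not the authors' implementation, and the toy stages stand in for trained networks.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical image type: a grayscale image as rows of pixel intensities.
Image = List[List[float]]


@dataclass
class ArtssPipeline:
    """Chains the four stages; every callable is a placeholder for a trained model."""
    reorient: Callable[[Image], Image]                  # stage 1, e.g. ResNet50-based
    segment: Callable[[Image], Image]                   # stage 2, e.g. UNet.3-based
    detect_joints: Callable[[Image], Sequence[Image]]   # stage 3, e.g. YOLOv7-based
    predict_tss: Callable[[Sequence[Image]], float]     # stage 4, e.g. ViT regressor

    def __call__(self, xray: Image) -> float:
        upright = self.reorient(xray)        # standardize orientation
        hand = self.segment(upright)         # isolate the hand from the background
        joints = self.detect_joints(hand)    # crop joint regions of interest
        return self.predict_tss(joints)      # regress the total score


# Toy stand-ins so the sketch runs end to end; they are not real models.
def toy_reorient(img: Image) -> Image:
    return img


def toy_segment(img: Image) -> Image:
    # Zero out "background" pixels below an arbitrary intensity of 0.5.
    return [[p if p > 0.5 else 0.0 for p in row] for row in img]


def toy_detect(img: Image) -> Sequence[Image]:
    # Pretend each image row is one detected joint crop.
    return [[row] for row in img]


def toy_predict(joints: Sequence[Image]) -> float:
    # Pretend the score is simply the number of detected joints.
    return float(len(joints))


pipeline = ArtssPipeline(toy_reorient, toy_segment, toy_detect, toy_predict)
score = pipeline([[0.2, 0.9], [0.7, 0.1]])
```

Keeping each stage behind a plain callable makes it easy to swap one architecture for another, or to test a stage in isolation, without touching the rest of the chain.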

Pre-processing and Standardization: Hand X-rays are resized, normalized, and rotated to ensure consistent spatial orientation.
Hand Segmentation: UNet.3 performs segmentation using denoising (Gaussian, wavelet), thresholding, and morphological operations to isolate the hand region, achieving an Intersection over Union (IoU) of 0.94.
Joint Identification: A YOLOv7 object detector is trained to identify the PI, PIP, and MCP joints and the wrist, with a reported identification precision of 99%.
Score Prediction: Multiple architectures (including ViT, VGG16/19, DenseNet201, EfficientNetB0, and ResNet50) are benchmarked, with the Vision Transformer (ViT) exhibiting the lowest reported Huber loss (0.87), signifying robust score regression.
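
For readers unfamiliar with the Huber loss used to compare the score regressors, the sketch below shows the standard definition: quadratic for small residuals and linear for large ones, which limits the influence of outlying scores. The threshold `delta=1.0` and the function name `huber_loss` are assumptions made for illustration; the source does not state which threshold was used.

```python
from typing import Sequence


def huber_loss(y_true: Sequence[float], y_pred: Sequence[float],
               delta: float = 1.0) -> float:
    """Mean Huber loss; delta is an assumed threshold, not taken from the source."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for t, p in zip(y_true, y_pred):
        r = abs(t - p)
        if r <= delta:
            total += 0.5 * r * r                 # quadratic region
        else:
            total += delta * (r - 0.5 * delta)   # linear region
    return total / len(y_true)


# Residuals 0.5 (quadratic branch) and 3.0 (linear branch).
example = huber_loss([0.0, 0.0], [0.5, 3.0])
```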

3. Algorithmic Innovations and Handling of Clinical Complexity

ARTSS frameworks integrate several algorithmic strategies to ensure clinical applicability and robustness:

Joint Disappearance Accommodation: ARTSS implements a padding strategy for patients with variable or missing joints; joint images are padded to the maximum observed dimension and masks prevent padded regions from influencing learning or outcomes (Moradmand et al., 8 Sep 2025).
Class Imbalance Mitigation: Ordinal score encoding recasts the multi-class scoring task as a set of interdependent binary classifiers, with under-sampling employed to counter the extreme prevalence of the zero-score class in joint predictions (Tan et al., 2021).
Ordinal and Balanced Accuracy Metrics: To reflect clinical tolerance for near-miss predictions, evaluation uses “±1 balanced accuracy,” which counts as correct any prediction within one adjacent ordinal class and computes accuracy per class so that frequent classes do not dominate the result (Tan et al., 2021).
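
The three strategies above can be sketched in a few lines each. This is one plausible reading, not the cited implementations: the function names are hypothetical, the padding is assumed to be bottom/right zero padding of a 2-D patch, the ordinal encoding is assumed to be the common cumulative form (one binary target per threshold "score > k"), and ±1 balanced accuracy is assumed to be the mean over true classes of the fraction of predictions within one class of the truth.

```python
from typing import Dict, List, Sequence, Tuple


def pad_with_mask(patch: Sequence[Sequence[float]], height: int, width: int,
                  fill: float = 0.0) -> Tuple[List[List[float]], List[List[int]]]:
    """Pad a 2-D patch to (height, width); the mask is 1 on real pixels, 0 on padding."""
    if len(patch) > height or any(len(row) > width for row in patch):
        raise ValueError("patch is larger than the target size")
    padded = [[fill] * width for _ in range(height)]
    mask = [[0] * width for _ in range(height)]
    for i, row in enumerate(patch):
        for j, value in enumerate(row):
            padded[i][j] = value
            mask[i][j] = 1
    return padded, mask


def ordinal_encode(score: int, num_classes: int) -> List[int]:
    """Encode a score in 0..num_classes-1 as binary targets [score > 0, score > 1, ...]."""
    return [1 if score > t else 0 for t in range(num_classes - 1)]


def ordinal_decode(probs: Sequence[float], threshold: float = 0.5) -> int:
    """Count consecutive thresholds passed, so the decoded score respects the ordering."""
    score = 0
    for p in probs:
        if p < threshold:
            break
        score += 1
    return score


def balanced_accuracy_within_one(y_true: Sequence[int],
                                 y_pred: Sequence[int]) -> float:
    """Average, over classes present in y_true, of the share with |pred - true| <= 1."""
    hits: Dict[int, int] = {}
    counts: Dict[int, int] = {}
    for t, p in zip(y_true, y_pred):
        counts[t] = counts.get(t, 0) + 1
        hits[t] = hits.get(t, 0) + (1 if abs(p - t) <= 1 else 0)
    if not counts:
        raise ValueError("y_true must not be empty")
    return sum(hits[c] / counts[c] for c in counts) / len(counts)


padded, mask = pad_with_mask([[1.0, 2.0]], height=2, width=3)
targets = ordinal_encode(2, num_classes=5)
decoded = ordinal_decode([0.9, 0.8, 0.3, 0.6])
bal_acc = balanced_accuracy_within_one([0, 0, 0, 3], [0, 1, 2, 3])
```

In the last call, the dominant class 0 contributes 2/3 and the rare class 3 contributes 1, and the two are averaged with equal weight; this is what keeps the zero-score majority from dominating the metric.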

4. Model Performance and Quantitative Evaluation

ARTSS models demonstrate metrics approaching and in some cases surpassing inter-reader consensus among experienced radiologists:

Segmentation: IoU $\approx$ 0.94.
Joint Detection: 99% accuracy; mean average precision (mAP) calculated over all joint classes.
Score Regression: ViT achieves Huber loss of 0.87 on external testing (Moradmand et al., 8 Sep 2025); ensemble MIL models yield Pearson’s correlation coefficient (PCC) of 0.945 and RMSE as low as 15.57 (Bo et al., 8 Aug 2025).
Ground Truth: The mean of two experienced radiologist scores is used as the labeled reference, with cross-validation schemes (e.g., 3-fold) and external validation further ensuring generalizability.
Formulae: Standard metrics for evaluation are employed, including:

$IoU = \frac{|A \cap B|}{|A \cup B|}, \quad PCC = \frac{\mathrm{cov}(X,Y)}{\sqrt{\mathrm{var}(X)\,\mathrm{var}(Y)}}, \quad MAE = \frac{1}{n} \sum_{i} |x_i - y_i|, \quad RMSE = \sqrt{\frac{1}{n} \sum_{i} (x_i-y_i)^2}$

These enable objective comparison with radiologist ratings and robust benchmarking across datasets.
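
The formulae above translate directly into code. The sketch below uses only the Python standard library; the function names and the nested-list mask representation are illustrative assumptions, and returning 1.0 for two empty masks is a convention chosen here, not something stated in the source.

```python
import math
from typing import Sequence


def iou(mask_a: Sequence[Sequence[int]], mask_b: Sequence[Sequence[int]]) -> float:
    """Intersection over Union of two equal-shaped binary masks (nested 0/1 lists)."""
    inter = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            inter += 1 if (a and b) else 0
            union += 1 if (a or b) else 0
    return inter / union if union else 1.0  # assumed convention for two empty masks


def mae(x: Sequence[float], y: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


def rmse(x: Sequence[float], y: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


def pcc(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; the 1/n factors of cov and var cancel in the ratio."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical reference scores versus model predictions.
reference = [1.0, 2.0, 3.0]
predicted = [2.0, 2.0, 5.0]
overlap = iou([[1, 1], [0, 0]], [[1, 0], [1, 0]])
```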

5. Workflow Features and Clinical Impact

ARTSS delivers substantial improvements in the efficiency, reliability, and interpretability of RA progression assessment:

Reduction in Subjectivity: Standardization and automation minimize inter- and intra-reader variability, as manual scoring is highly subjective and time-consuming (Moradmand et al., 8 Sep 2025).
Workflow Acceleration: Automated segmentation and detection sharply reduce time spent per assessment, notably for severe or late-stage cases marked by joint disappearance.
Decision Support: Embedded attention mechanisms (MIL frameworks, Grad-CAM) and anatomy-aware patch extraction deliver interpretable attention maps for clinician review, bolstering trust in automated scores (Bo et al., 8 Aug 2025).
Integration into Practice: Time savings and consistency make ARTSS highly compatible with clinical trial workflows and routine monitoring, without sacrificing diagnostic performance.
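
To make the decision-support point concrete, the sketch below shows how attention-based multiple-instance pooling can yield per-joint weights that double as an interpretability signal. It is a simplified illustration, not the cited architecture: the linear scoring vector `attn_vector` stands in for learned parameters, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features: Sequence[Sequence[float]],
                   attn_vector: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Pool per-joint feature vectors into one image-level vector.

    Returns (pooled_vector, weights); the weights sum to 1 and indicate how much
    each joint contributed, which is what an attention map would visualize.
    """
    # Score each joint with a dot product against the (assumed) learned vector.
    logits = [sum(f * w for f, w in zip(feat, attn_vector)) for feat in features]
    weights = softmax(logits)
    dim = len(features[0])
    pooled = [sum(w * feat[d] for w, feat in zip(weights, features))
              for d in range(dim)]
    return pooled, weights


# Two hypothetical joint feature vectors; the scoring vector favors the first.
pooled, weights = attention_pool([[1.0, 0.0], [0.0, 1.0]], attn_vector=[2.0, 0.0])
```

A regression head applied to `pooled` would then produce the image-level score, while `weights` can be shown to the clinician alongside the corresponding joint crops.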

6. Limitations and Prospective Directions

ARTSS models currently depend on high-quality annotated datasets and accurate joint localization; erroneous segmentation or detection may degrade scoring reliability. Class imbalance remains a major challenge, and further research is needed on data augmentation, strategic class rebalancing, and advanced attention mechanisms. Extension to multi-modal imaging, validation in broader populations, and integration with other clinical metrics are promising avenues for increasing both robustness and utility.

7. Relation to Ancillary Automated Radiographic Scoring Methods

ARTSS sits within a rapidly developing ecosystem of radiograph scoring solutions, including multi-instance learning pipelines for interpretable image-level SvdH regression (Bo et al., 8 Aug 2025), direct regression CNNs utilizing transfer learning and ensemble stacking (Bo et al., 14 Jun 2024), and modular architectures that separate joint detection from feature extraction (Tan et al., 2021). Common features include the use of ImageNet or large bone-age datasets for pre-training, adaptation of state-of-the-art architectures (ResNet, ViT, YOLOv3/YOLOv7, U-Net), and attention to interpretability through visual explanation methods such as Grad-CAM.

A plausible implication is that ongoing standardization of preprocessing, region extraction, and loss functions will accelerate clinical translation and cross-site reproducibility, with the ARTSS paradigm offering a robust template for automated RA damage quantification.