H-SemiS: Hierarchical Fusion of Semi and Self-Supervised Learning for Knee Osteoarthritis Severity Grading

Published 25 Apr 2026 in cs.CV | (2604.23335v1)

Abstract: Knee osteoarthritis (KOA) is a degenerative joint disease that can lead to chronic pain, reduced mobility, and long-term disability. Automated severity grading from knee radiographs can support early assessment, but current methods heavily depend on large labeled datasets and remain sensitive to class imbalance, noisy samples, and variability in clinical annotations. To alleviate these limitations, we propose a Hierarchical fusion of Semi-Supervised framework with Self-Supervision (H-SemiS) for KOA severity grading in knee X-ray samples using limited annotated data. Rather than treating severity grading as a flat multi-class problem, H-SemiS decomposes the task into a sequence of binary sub-tasks within a semi-supervised teacher-student architecture, directly mitigating the impact of class imbalance. To further enhance feature learning from unlabeled data, the framework integrates an adversarial self-supervised reconstruction module that encourages the network to capture robust anatomical structures. In parallel, a teacher-student design with quantum-inspired feature mixing improves discrimination boundaries between adjacent grades when pseudo-labels are noisy. We comprehensively evaluate H-SemiS on two challenging multi-class datasets and assess its generalizability on two binary-class datasets. Our experimental results demonstrate the superiority of the proposed H-SemiS framework across multiple evaluation metrics, consistently outperforming several competing baselines and state-of-the-art methods. The code is publicly available at https://github.com/chandravardhan-singh-raghaw/H-SemiS.


Authors (5)

Summary

The paper introduces a hierarchical framework combining self-supervision, proxy labeling, and quantum-infused teacher-student models for KOA severity grading.
It reports 84.9% accuracy on OAI and 86.8% on DKXI, outperforming the compared methods and mitigating class imbalance while using only 20% labeled data.
Results show robust feature discrimination and low inference times (under 4.09 ms), highlighting its potential for clinical deployment.

Hierarchical Fusion of Semi- and Self-Supervised Learning for KOA Severity Grading: An Expert Analysis

Background and Motivation

Knee osteoarthritis (KOA) is a prevalent degenerative disease that calls for early and robust diagnostic support, typically through imaging such as knee X-rays (KXR). Automated KOA severity grading, which is central to clinical management, faces persistent challenges: data scarcity, class imbalance across Kellgren-Lawrence (KL) grades 0–4, annotation variability, and the low contrast and noise inherent in KXR images.

Traditional supervised models work well with large labeled datasets but scale poorly in clinical settings, where annotation is costly and minority grades are underrepresented. Existing self-supervised and semi-supervised approaches ease label scarcity but remain limited by weak feature discrimination, noisy pseudo-labels, and poor handling of imbalanced multi-class settings.

The paper “H-SemiS: Hierarchical Fusion of Semi and Self-Supervised Learning for Knee Osteoarthritis Severity Grading” (2604.23335) addresses these limitations by integrating adversarial-inspired self-supervision, proxy-label consistency, and quantum-infused learning into a hierarchical classification framework. This synthesis targets increased annotation efficiency, structural feature robustness, and enhanced discrimination of KOA grading categories.

Figure 1: Diagnostic challenges in KXR analysis for KOA are highlighted from (a) to (f), including blurred pixels, indistinct joint space narrowing (JSN), low contrast, varying illumination, overlapping anatomical structures, and inconsistencies in scan composition (single or dual joint).

Framework Overview: H-SemiS Architecture and Pipeline

The H-SemiS framework comprises three coupled stages: self-supervised masked image reconstruction, similarity-driven proxy labeling, and a hierarchical, quantum-enhanced semi-supervised classifier.

Stage 1 – Masked Image Reconstruction (MI-Rec): Unlabeled KXR images are partitioned into patches, and a large fraction of the patches is masked (a 75% ratio performed best). An encoder-decoder generator reconstructs the masked regions while a patch-level discriminator separates real from reconstructed patches. This adversarial setup, inspired by Masked Autoencoders and GANs, pushes the model to learn robust anatomical structure and produces additional, diverse samples for the later stages.
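The masking step itself is simple enough to sketch in NumPy. This is a minimal illustration, assuming a single-channel 224×224 image, 16×16 patches, and masked patches filled with zeros; `random_patch_mask` is a hypothetical helper, and the generator and patch discriminator that perform the adversarial reconstruction are not shown.

```python
import numpy as np


def random_patch_mask(image, patch_size=16, mask_ratio=0.75, rng=None):
    """Hide a random subset of non-overlapping square patches in a 2D image.

    Returns the masked image (hidden pixels set to 0) and a boolean
    (rows, cols) grid that is True for every masked patch.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be multiples of patch_size")
    rows, cols = h // patch_size, w // patch_size
    n_patches = rows * cols
    n_masked = int(round(mask_ratio * n_patches))

    # Pick the patches to hide uniformly at random, without replacement.
    flat_mask = np.zeros(n_patches, dtype=bool)
    flat_mask[rng.choice(n_patches, size=n_masked, replace=False)] = True
    grid = flat_mask.reshape(rows, cols)

    # Upsample the patch-level mask to pixel resolution and apply it.
    pixel_mask = grid.repeat(patch_size, axis=0).repeat(patch_size, axis=1)
    masked = np.where(pixel_mask, 0.0, image)
    return masked, grid


# Example: a 224x224 radiograph split into 16x16 patches (196 patches).
image = np.random.default_rng(0).random((224, 224))
masked, grid = random_patch_mask(image, rng=np.random.default_rng(1))
print(grid.sum(), "of", grid.size, "patches masked")  # 147 of 196 patches masked
```

With these settings 147 of the 196 patches are hidden; in the adversarial scheme the generator would be trained to restore them and the discriminator to tell restored patches from real ones.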
Figure 2: Gaps in existing research (top) and the proposed H-SemiS solutions (bottom), especially semi-supervision with as little as 20% labeled samples and hierarchical multi-class decomposition.

Figure 3: H-SemiS consists of MI-Rec for self-supervised reconstruction, SiRL for intelligent proxy labeling, and hierarchical quantum-infused teacher-student classifiers for KOA grading.
Stage 2 – Similarity-aware Reconstructed Image Labeler (SiRL): MI-Rec outputs receive pseudo-labels through class-template matching. Each KL grade is represented by a template, the median of its Wide Residual Network (WRN) feature vectors, and each reconstruction is compared to these templates with a combined cosine and Euclidean similarity score. Only samples scoring above the best-performing threshold (τ = 0.80) are passed on for training, which limits label noise.
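A hedged sketch of the template-matching logic follows. It assumes WRN feature vectors have already been extracted and blends cosine similarity with an inverse-distance score `1 / (1 + ||x - t||)` using equal weights. The paper combines cosine and Euclidean similarity, but this particular blend, the weight `alpha`, and the helper names are illustrative assumptions.

```python
import numpy as np


def build_templates(features, labels, n_classes):
    """One template per KL grade: the element-wise median of that grade's feature vectors."""
    return np.stack([np.median(features[labels == c], axis=0) for c in range(n_classes)])


def combined_similarity(x, templates, alpha=0.5):
    """Score x against every template; 1.0 means x coincides with the template.

    ASSUMPTION: equal-weight blend of cosine similarity and an inverse-distance
    term 1 / (1 + Euclidean distance); the paper's exact combination may differ.
    """
    cos = templates @ x / (np.linalg.norm(templates, axis=1) * np.linalg.norm(x) + 1e-12)
    inv_dist = 1.0 / (1.0 + np.linalg.norm(templates - x, axis=1))
    return alpha * cos + (1.0 - alpha) * inv_dist


def proxy_label(recon_features, templates, tau=0.80):
    """Return (sample_index, grade) for reconstructions whose best score exceeds tau."""
    accepted = []
    for i, x in enumerate(recon_features):
        scores = combined_similarity(x, templates)
        best = int(np.argmax(scores))
        if scores[best] > tau:  # ambiguous reconstructions are discarded
            accepted.append((i, best))
    return accepted


# Toy 2-D features for two grades, then two reconstructions to label.
feats = np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]])
labels = np.array([0, 0, 1, 1])
templates = build_templates(feats, labels, n_classes=2)
recon = np.array([[1.0, 0.05], [0.7, 0.7]])
print(proxy_label(recon, templates))  # [(0, 0)]: the ambiguous second sample is dropped
```

Both terms reach their maximum of 1 only for a reconstruction identical to a template, so a threshold of 0.80 admits reconstructions that sit close to a single class prototype and rejects those lying between grades.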
Figure 4: The MI-Rec process: random masking, reconstruction by a generator, and patchwise real/fake discrimination to produce anatomically plausible reconstructions.

Figure 5: SiRL matches reconstructed samples to feature templates for robust proxy labeling, filtering ambiguous reconstructions.
Stage 3 – Hierarchical Quantum-infused Teacher-Student Model (HQ-TeSt): A dual rule-based protocol decomposes multi-class KOA grading into hierarchically organized binary tasks. Each node trains a teacher-student pair in which the teacher's weights are an exponential moving average (EMA) of the student's; both classical and quantum convolutional networks are used. The quantum module amplitude-encodes normalized feature vectors into a low-width (few-qubit) quantum state and applies parameterized unitary transformations intended to capture non-linear, entangled relationships in feature space. A depth-weighted scheme aggregates the node decisions at the tree leaves into the final grade.
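The EMA coupling between student and teacher is the most concrete part of this stage and can be written without committing to a framework. The sketch below stores parameters as NumPy arrays in a dictionary; the decay of 0.99 is an assumed value, not one taken from the paper. In a PyTorch implementation the same update would typically run under `torch.no_grad()` over `teacher.parameters()`.

```python
import numpy as np


def ema_update(teacher, student, decay=0.99):
    """In-place mean-teacher update: teacher <- decay * teacher + (1 - decay) * student.

    Parameters are dicts mapping names to float arrays; decay=0.99 is an assumed value.
    """
    for name, s_param in student.items():
        teacher[name] *= decay
        teacher[name] += (1.0 - decay) * s_param
    return teacher


teacher = {"conv.w": np.zeros(3)}
student = {"conv.w": np.ones(3)}
ema_update(teacher, student)
print(teacher["conv.w"])  # each entry is now about 0.01
```

Because the teacher moves only a small step toward the student after each update, its predictions change slowly, which is what makes it a stable source of pseudo-labels for the strongly augmented views the student trains on.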
Figure 6: HQ-TeSt fuses classical and quantum convolutional pipelines, utilizing teacher-student EMA updates for consistency under weak/strong augmentations and pseudo-labeling.

Figure 7: The Quantum Convolutional Network consists of a quantum encoder, parametrized ansatz layers, and measurement-based decoding to enrich feature transformations.
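To make the encoder, ansatz, and measurement pipeline concrete, here is a small classical statevector simulation. It assumes amplitude encoding (a d-dimensional feature vector needs only ⌈log₂ d⌉ qubits, which is what keeps the state low-width), an ansatz of per-qubit RY rotations followed by a chain of CNOTs, and readout of the Pauli-Z expectation on each qubit. These gate choices and all function names are illustrative assumptions, not the circuit used in H-SemiS.

```python
import numpy as np


def amplitude_encode(x):
    """Zero-pad x to length 2**n and L2-normalise it into a valid n-qubit state."""
    x = np.asarray(x, dtype=float)
    n_qubits = max(1, int(np.ceil(np.log2(len(x)))))
    state = np.zeros(2 ** n_qubits)
    state[: len(x)] = x
    norm = np.linalg.norm(state)
    if norm == 0:
        raise ValueError("cannot encode an all-zero feature vector")
    return state / norm, n_qubits


def ry(theta):
    """Single-qubit RY rotation (real-valued, so the state stays real)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])


def apply_single(state, gate, qubit, n):
    """Apply a 2x2 gate to one qubit of an n-qubit state vector."""
    psi = state.reshape([2] * n)
    psi = np.tensordot(gate, psi, axes=([1], [qubit]))  # target axis moves to front
    return np.moveaxis(psi, 0, qubit).reshape(-1)


def apply_cnot(state, control, target, n):
    """Flip the target qubit on the half of the state where the control qubit is 1."""
    psi = state.reshape([2] * n).copy()
    idx = [slice(None)] * n
    idx[control] = 1
    t_axis = target if target < control else target - 1  # axis index after fixing control
    psi[tuple(idx)] = np.flip(psi[tuple(idx)], axis=t_axis).copy()
    return psi.reshape(-1)


def qcn_forward(x, thetas):
    """Encode x, apply len(thetas) layers of RY + CNOT chain, return <Z> per qubit."""
    state, n = amplitude_encode(x)
    thetas = np.asarray(thetas, dtype=float)
    if thetas.ndim != 2 or thetas.shape[1] != n:
        raise ValueError(f"thetas must have shape (n_layers, {n})")
    for layer in thetas:
        for q, theta in enumerate(layer):
            state = apply_single(state, ry(theta), q, n)
        for q in range(n - 1):
            state = apply_cnot(state, q, q + 1, n)
    probs = (state ** 2).reshape([2] * n)
    # <Z_q> = P(qubit q = 0) - P(qubit q = 1)
    return np.array([np.take(probs, 0, axis=q).sum() - np.take(probs, 1, axis=q).sum()
                     for q in range(n)])


features = np.random.default_rng(0).normal(size=8)  # 8 features -> 3 qubits
thetas = np.random.default_rng(1).uniform(0, np.pi, size=(2, 3))
print(qcn_forward(features, thetas))  # three values in [-1, 1]
```

The three ⟨Z⟩ values would serve as decoded features for the downstream classical layers; in training, `thetas` would be learned jointly with the rest of the network, which on classical hardware means differentiating through a simulation like this one.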

Figure 8: Hierarchical Multi-classifier (HiM) decomposes multi-class grading into binary nodes, injects reconstructed proxy-labeled samples, and aggregates decisions bottom-up.
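The decomposition and bottom-up aggregation can be sketched with a small binary tree over KL0–KL4. The split below (KL0–1 vs. KL2–4 at the root, then KL0 vs. KL1, KL2 vs. KL3–4, and KL3 vs. KL4) is a hypothetical example rather than the paper's dual rule-based protocol, and it substitutes a plain product of branch probabilities for the paper's depth-weighted aggregation.

```python
# Leaf = KL grade (int); internal node = (node_name, left_subtree, right_subtree).
# HYPOTHETICAL split of KL0-KL4; not the paper's dual rule-based protocol.
KL_TREE = ("root", ("early", 0, 1), ("late", 2, ("severe", 3, 4)))


def leaf_probabilities(tree, node_probs, prefix=1.0):
    """node_probs[name] = P(right branch | sample reached that node).

    Returns {grade: product of branch probabilities along its root-to-leaf path}.
    """
    if isinstance(tree, int):
        return {tree: prefix}
    name, left, right = tree
    p_right = node_probs[name]
    probs = leaf_probabilities(left, node_probs, prefix * (1.0 - p_right))
    probs.update(leaf_probabilities(right, node_probs, prefix * p_right))
    return probs


def predict_grade(tree, node_probs):
    """Pick the grade with the highest path probability."""
    probs = leaf_probabilities(tree, node_probs)
    return max(probs, key=probs.get), probs


# Outputs of the four binary teacher-student nodes for one radiograph (made-up numbers).
node_probs = {"root": 0.9, "early": 0.5, "late": 0.8, "severe": 0.25}
grade, probs = predict_grade(KL_TREE, node_probs)
print(grade)  # 3
```

Multiplying branch probabilities along each root-to-leaf path always yields a distribution over the five grades that sums to one; a depth-weighted variant would additionally reweight the contribution of nodes at different levels of the tree.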

Experimental Protocol and Dataset Analysis

The study benchmarks H-SemiS on the OAI and DKXI multi-class datasets (KL0–KL4 annotations) and the OP/KO binary datasets, using extensive preprocessing (ROI localization, CLAHE contrast enhancement, keypoint-based JSN preservation) together with data augmentation. Only 20% of the training samples keep their labels; the remainder is treated as unlabeled.
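The labeled/unlabeled partition can be reproduced with a per-class split. Stratifying by KL grade is an assumption here (the analysis states only the 20% fraction), chosen so that rare severe grades still contribute some labeled examples; `split_labeled_unlabeled` is a hypothetical helper.

```python
import numpy as np


def split_labeled_unlabeled(labels, labeled_fraction=0.2, seed=0):
    """Keep about labeled_fraction of each class as labeled; the rest become unlabeled.

    ASSUMPTION: stratified by class, with at least one labeled sample per class.
    Returns two sorted index arrays (labeled, unlabeled).
    """
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    labeled = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        k = max(1, int(round(labeled_fraction * len(idx))))
        labeled.extend(idx[:k].tolist())
    labeled = np.sort(np.array(labeled))
    unlabeled = np.setdiff1d(np.arange(len(labels)), labeled)  # labels discarded for training
    return labeled, unlabeled


labels = np.array([0] * 50 + [1] * 30 + [2] * 20)  # imbalanced toy grade distribution
labeled, unlabeled = split_labeled_unlabeled(labels)
print(len(labeled), len(unlabeled))  # 20 80
```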

Figure 9: X-ray dataset samples across KL grades and binary classes, illustrating annotation variety and class color-coding.

Standard metrics—accuracy, macro-averaged precision/recall/F1—are reported; ablations cover masking ratios, reconstruction algorithms, SiRL thresholds, quantum/classical ablations, and decomposition strategies.
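Macro averaging matters here because it weights every KL grade equally, so a model that neglects a rare grade is penalized even if overall accuracy stays high. The following sketch computes accuracy and macro precision, recall, and F1 from true and predicted grades. It treats undefined ratios (no predictions or no samples for a class) as 0, a convention that should agree with scikit-learn's `precision_recall_fscore_support(..., average="macro", zero_division=0)`.

```python
import numpy as np


def macro_metrics(y_true, y_pred, n_classes):
    """Accuracy plus the unweighted mean over classes of precision, recall and F1."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    precision, recall, f1 = [], [], []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        p = tp / (tp + fp) if tp + fp > 0 else 0.0  # undefined ratios count as 0
        r = tp / (tp + fn) if tp + fn > 0 else 0.0
        precision.append(p)
        recall.append(r)
        f1.append(2 * p * r / (p + r) if p + r > 0 else 0.0)
    return {
        "accuracy": float(np.mean(y_true == y_pred)),
        "precision": float(np.mean(precision)),
        "recall": float(np.mean(recall)),
        "f1": float(np.mean(f1)),
    }


print(macro_metrics([0, 0, 1, 1, 2], [0, 1, 1, 1, 2], n_classes=3))
```

Note that macro F1 is the mean of per-class F1 scores, not the F1 of the macro-averaged precision and recall; the two generally differ.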

Results: Quantitative and Qualitative Insights

Quantitative Performance

On both OAI and DKXI, H-SemiS outperforms thirteen supervised, self-supervised, and semi-supervised baselines on all key metrics while using only 20% labeled data. Notably, its absolute accuracy margin over Mean-Teacher and the best recent GAN and SSL models is well above 2–5 percentage points.
Precision and recall remain balanced, and macro F1-scores indicate robustness to class imbalance and sparse annotation.
Statistical testing yields $p < 10^{-10}$, making chance an implausible explanation for the improvements.

Numerical highlights: On OAI, H-SemiS achieves 84.9% accuracy and 86.1% precision; on DKXI, 86.8% accuracy and 85.3% precision. These are the highest among all tested models and settings.

Ablation and Component Analysis

MI-Rec: Highest grading accuracy is observed at a 75% masking ratio; performance declines with higher or lower ratios. MI-Rec outperforms MAE, CMAE, ViT-AE++, and GAN-MAE in reconstructing fine anatomical details.
SiRL: Proxy labeling at τ=0.80 gives the best tradeoff between added sample diversity and label noise. SSIM scores confirm that the proxy-labeled reconstructions preserve KOA-relevant structures.
Quantum Convolutions: Adding quantum convolutional layers (QCN) with $L_2$-tanh normalization produces tighter, better-separated class clusters in t-SNE embeddings and improves discrimination on imbalanced, limited-label splits.
Hierarchical Decomposition: Outperforms random and flat decompositions by at least 4.5% in accuracy, especially when integrating proxy samples at minority branches.
Figure 10: Qualitative localization overlays with high ADCC scores; H-SemiS highlights clinically salient JSN and osteophyte regions, supporting interpretability.

Figure 11: Example original/masked/reconstructed KXR across KL grades; MI-Rec with 75% masking best retains anatomical details.

Figure 12: SSIM comparisons between original/reconstructed samples; τ=0.80 maintains high similarity and structure.
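SSIM, used above to check that reconstructions keep KOA-relevant structure, compares luminance, contrast, and structure through means, variances, and covariance. The sketch below computes a single global SSIM over the whole image with the standard constants C1 = (0.01 L)² and C2 = (0.03 L)², where L is the data range. Standard implementations instead average SSIM over local (often Gaussian-weighted) windows, so this is a simplified illustration, not the evaluation code used in the paper.

```python
import numpy as np


def global_ssim(x, y, data_range=1.0):
    """Single-window SSIM between two equally shaped images (simplified, no local windows)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    c1 = (0.01 * data_range) ** 2  # stabilises the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilises the contrast/structure term
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov = ((x - mu_x) * (y - mu_y)).mean()
    num = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return num / den


rng = np.random.default_rng(0)
original = rng.random((64, 64))
noisy = np.clip(original + 0.2 * rng.standard_normal((64, 64)), 0.0, 1.0)
print(global_ssim(original, original), global_ssim(original, noisy))  # 1.0 vs. a lower score
```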

Robustness and Generalizability

Cross-dataset generalization: Models trained on one dataset (e.g., OAI) transfer well to others (e.g., DKXI and the combined OAI+DKXI set); the observed performance drops are consistent with domain shifts in anatomy and scanning protocol.
Binary/Multiclass transfer: Similar resilience is observed in OP/KO binary settings and when evaluating aggregated datasets.

Figure 13: ROC Curve for OP dataset highlights the strong discrimination achieved by H-SemiS on binary KOA classification.

Computational Efficiency

Despite its size (115.89M parameters), H-SemiS has among the lowest inference times of the compared models (under 4.09 ms) and outperforms heavier self-supervised and supervised state-of-the-art models in both speed and grading accuracy.

Theoretical Implications and Future Directions

H-SemiS demonstrates that a careful fusion of self-supervised proxy generation, similarity-aware label selection, and quantum-inspired discriminative modeling can overcome persistent bottlenecks in class-imbalanced, limited-annotation medical imaging. The quantum circuit component is particularly notable: the improved within-class clustering suggests that it captures higher-order, nonlinear interactions among anatomical features that classical CNN layers may miss.

Strong claims supported by results:

H-SemiS matches or exceeds the accuracy of fully supervised approaches while using only 20% labeled data.
Proxy label filtering (SiRL, τ=0.80) effectively constrains noise while maximizing sample efficiency.
Quantum-infused kernels provide a marked boost in fine-grained discrimination, with t-SNE visualizations showing compact, well-separated embeddings.

The framework is well suited to near-term deployment in annotation-scarce settings; extending it to volumetric (3D) modalities and optimizing the quantum-classical hybrid for real quantum hardware would make it even more compelling.

Conclusion

H-SemiS establishes a new benchmark for annotation-efficient, robust, and interpretable KOA grading from radiographs. Its hierarchical semi- and self-supervised pipeline, culminating in quantum-enhanced classification, delivers superior accuracy, balanced macro metrics, cross-dataset generalization, and fast inference, even under severe data scarcity and class imbalance.

This paradigm points toward hybrid medical AI systems that combine self-supervision, proxy labeling, and quantum information processing to close the annotation and class-bias gaps that limit clinical-grade image analysis.

Reference:

“H-SemiS: Hierarchical Fusion of Semi and Self-Supervised Learning for Knee Osteoarthritis Severity Grading” (2604.23335)
