
Hybrid Semi-Supervised Learning Approach

Updated 24 November 2025
  • The paper introduces a hybrid semi-supervised framework that leverages a Teacher model to generate pseudo-labels, thereby reducing manual annotation workload.
  • It employs a multi-task learning architecture with a ResNet-18 backbone to simultaneously predict global image quality and specific defect details.
  • The approach improves interpretability and clinical actionability while statistically outperforming single-task baselines in experimental evaluations.

A hybrid semi-supervised learning approach in the context of retinal image quality assessment (RIQA) refers to a training strategy that integrates both manually labeled data and automatically generated pseudo-labels within a unified multi-task deep learning framework. This methodology seeks to overcome the cost and scalability bottleneck imposed by exhaustive expert annotation of detailed capture defects, while improving both overall image quality classification and the interpretability of model predictions (Telesco et al., 17 Nov 2025).

1. Problem Formulation and Motivation

Hybrid semi-supervised approaches address key limitations in RIQA: classic methods typically classify only overall image quality (“good,” “usable,” “rejectable”) without identifying specific acquisition defects (illumination, clarity, contrast) necessary to guide real-time recapture during ophthalmic screening. Fully supervised multi-label training on such details is prohibitively expensive, given the manual workload. The hybrid scheme introduces a two-pronged paradigm: (a) supervised learning on global quality labels for the primary dataset (e.g., EyeQ), and (b) semi-supervised, multi-task learning using pseudo-labels for acquisition defects, generated by an auxiliary “Teacher” model previously trained on a modest, detail-labeled subset (e.g., MSHF) (Telesco et al., 17 Nov 2025).

This paradigm enables models to predict global and detailed quality scores simultaneously, yielding outputs that are more clinically actionable and more interpretable without increasing manual annotation cost.

2. Architectural and Algorithmic Framework

The hybrid semi-supervised approach comprises two main stages:

Stage 1: Teacher Model for Detailed Quality Defects.

A separate network, typically based on ResNet-18, is trained on a small, detail-labeled dataset (the "Teacher dataset" $\mathcal{S}$). The objective is to learn a mapping $f_A(x; \theta_S^A)$ from the image $x$ to a vector of binary detail labels (illumination, clarity, contrast). The loss is binary cross-entropy:

$$L_\mathrm{BCE} = -\frac{1}{n}\sum_{i=1}^n\sum_{j=1}^k \left[ y_{i,j}^A \log f_{A,j}(x_i) + (1-y_{i,j}^A)\log\bigl(1 - f_{A,j}(x_i)\bigr) \right],$$

where $k=3$ is the number of defect types (Telesco et al., 17 Nov 2025).

Stage 2: Multi-Task Student Model with Pseudo-Label Guidance.

A main “Student” model uses a shared feature encoder (ImageNet-pretrained ResNet-18) and then branches into two heads: (1) a softmax classifier for overall image quality, and (2) a sigmoid-based multi-label predictor for details. After pre-training on the main dataset for overall quality only, the Student is fine-tuned using both real overall-quality labels and pseudo-labels for details (obtained by applying the Teacher to all images), i.e.,

$$L_{\mathrm{total}} = \lambda_A L_\mathrm{BCE}(\hat{\mathbf{y}}^A, f_A) + \lambda_B L_\mathrm{CE}(y^B, f_B),$$

where $L_\mathrm{CE}$ is the multiclass cross-entropy for overall quality, and $\lambda_A$, $\lambda_B$ control the loss weighting (Telesco et al., 17 Nov 2025).

No explicit pseudo-label selection, consistency losses, or adversarial terms are included. Pseudo-labels are taken as soft probabilities and directly drive joint training.

3. Datasets, Annotation Strategies, and Preprocessing

The hybrid semi-supervised approach leverages heterogeneously labeled imaging corpora:

  • Primary-Set (Overall Quality):

Large-scale datasets such as EyeQ (n=28,792, three-class: “good,” “usable,” “reject”) or DeepDRiD (n=2,000, two-class) with comprehensive overall-quality labels (Telesco et al., 17 Nov 2025).

  • Detail-Set (Defect Annotations):

A “Teacher” dataset like MSHF (n=802), labeled with three binary indicators (illumination, clarity, contrast) for each image.

  • Pseudo-Labeling Mechanism:

The Teacher model, pre-trained on MSHF, outputs detail estimates for every primary-set image, thus synthesizing weak annotations at scale.
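The pseudo-labeling step can be sketched as a single no-gradient pass of the trained Teacher over the primary set; the helper name and the stand-in teacher below are ours for illustration.

```python
# Sketch of pseudo-label synthesis: a trained Teacher scores every primary-set
# image, yielding soft detail annotations at scale.
import torch
import torch.nn as nn

@torch.no_grad()
def make_pseudo_labels(teacher: nn.Module, images: torch.Tensor) -> torch.Tensor:
    """Return soft pseudo-labels in [0, 1] for (illumination, clarity, contrast)."""
    teacher.eval()
    return torch.sigmoid(teacher(images))  # soft probabilities, no thresholding

# Example with a tiny stand-in teacher (any module mapping images to 3 logits):
teacher = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 3))
batch = torch.randn(8, 3, 32, 32)
pseudo = make_pseudo_labels(teacher, batch)   # shape (8, 3), values in (0, 1)
```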

Preprocessing includes field-of-view cropping to remove black backgrounds, resizing, and brightness normalization as in state-of-the-art RIQA pipelines. Data augmentation in Student fine-tuning is restricted to mild affine/geometric changes to preserve label semantics.
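The field-of-view crop can be sketched as a bounding-box crop over non-black pixels; this is a generic implementation under our own assumptions (numpy arrays, an illustrative brightness threshold), not the paper's exact pipeline.

```python
# Sketch of field-of-view cropping: remove the black background surrounding the
# circular fundus field of view by cropping to the non-black bounding box.
import numpy as np

def crop_fov(img: np.ndarray, thresh: int = 10) -> np.ndarray:
    """Crop an (H, W, 3) image to the bounding box of pixels brighter than `thresh`."""
    mask = img.max(axis=2) > thresh           # True where the image is non-black
    if not mask.any():
        return img                            # all-black image: nothing to crop
    rows = np.where(mask.any(axis=1))[0]
    cols = np.where(mask.any(axis=0))[0]
    return img[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]

# Synthetic example: a bright region on a black background.
img = np.zeros((100, 100, 3), dtype=np.uint8)
img[20:80, 30:70] = 200                       # stand-in for the circular FOV
cropped = crop_fov(img)                       # shape (60, 40, 3)
```

Resizing and brightness normalization would follow this crop in a full pipeline.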

4. Empirical Performance and Interpretability

The hybrid semi-supervised multi-task framework has been validated on EyeQ, DeepDRiD, and a new detail-labeled EyeQ subset (EyeQ-D):

| Dataset/Task | Single-Task Baseline F1 | Hybrid Multi-Task F1 | Best Prior State of the Art |
| --- | --- | --- | --- |
| EyeQ (overall) | 0.863 | 0.875 | 0.855 (MCFNet, Fu et al.) |
| DeepDRiD | 0.763 | 0.778 | 0.767 (QuickQual) |

Key findings:

  • The hybrid model outperforms single-task baselines with statistical significance (Wilcoxon p < 0.05).
  • On detail defect tasks, multi-task scores are comparable to Teacher and within inter-expert variability bounds.
  • Interpretability is enhanced: For a “rejectable” image, the model may output “illumination=bad, clarity=bad,” offering explicit recapture guidance; Grad-CAM visualizations become more defect-localized (Telesco et al., 17 Nov 2025).

This suggests that pseudo-label noise does not degrade performance and may, in fact, mirror the natural inter-rater uncertainty in expert annotations.
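The paired significance test reported above can be reproduced with a standard Wilcoxon signed-rank test; the per-fold F1 values below are made-up illustrations, not the paper's numbers.

```python
# Sketch of the paired Wilcoxon signed-rank test comparing single-task vs.
# multi-task F1 scores across evaluation folds (illustrative values only).
from scipy.stats import wilcoxon

f1_single = [0.858, 0.861, 0.864, 0.860, 0.865, 0.862]  # single-task per fold
f1_multi  = [0.872, 0.876, 0.874, 0.878, 0.873, 0.875]  # multi-task per fold

stat, p = wilcoxon(f1_multi, f1_single)  # paired, two-sided by default
significant = p < 0.05                   # consistent improvement across folds
```

With a small number of folds, every fold must favor the multi-task model for the exact two-sided test to reach p < 0.05, which is the pattern the reported result implies.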

5. Theoretical and Practical Implications

The hybrid semi-supervised strategy delivers several advances:

  • Clinical Actionability: By attributing overall failure modes to explicit capture defects, the model supports targeted operator interventions (e.g., adjust exposure for low illumination, refocus for low clarity).
  • Cost-Effectiveness: Manual detail annotation is only required for a small corpus; the remainder is handled by propagation of weak pseudo-labels.
  • Robustness to Noisy Labels: Empirically, Student networks trained on pseudo-labels match the Teacher’s performance and track expert disagreement, supporting the intrinsic statistical robustness of the method (Telesco et al., 17 Nov 2025).
  • Integration with Existing Pipelines: The method is architecturally lightweight (ResNet-18 backbone), efficient, and compatible with standard RIQA infrastructure.

A plausible implication is that expanding the variety and quality of Teacher datasets, as well as exploring regression-style global-quality heads or alternative feature backbones, could further enhance model discrimination and interpretability—mirroring trends in recent fundus IQA research (Zun et al., 25 Jun 2025, Gong et al., 19 Nov 2024).

6. Limitations and Future Directions

The approach relies on the representational adequacy of the Teacher and the fidelity of pseudo-label propagation. Limitations include:

  • Constrained defect granularity: Only three defect types (illumination, clarity, contrast) are currently modeled; other sub-qualities (e.g., field coverage, artifact severity) are not included.
  • Limited Teacher dataset size may bottleneck pseudo-label quality.
  • The method does not utilize pseudo-label confidence reweighting or sample selection, which could further improve learning from noisy annotation.
  • The studied architectures are restricted to ResNet-18 scale; scaling to larger networks, introducing regression-based heads, or adopting multi-task learning with more granular outputs remains underexplored.

Future work may involve increasing the diversity of manually detailed datasets, experimenting with self-supervised pretraining for better generalizability, integrating continuous quality scales as in FundaQ-8 (Zun et al., 25 Jun 2025), and formally extending the multi-task schema to encompass additional clinically motivated quality attributes.


Key Reference:

Telesco et al., 17 Nov 2025.