EGAD: Entropy-Guided Sampling for Medical Segmentation
- EGAD is a two-stage active learning framework that uses predictive entropy and diversity metrics to select informative samples for segmentation tasks.
- The framework computes agreement via cosine similarity and diversity using mutual information to identify underrepresented regions, enhancing model generalization.
- EGAD leverages dual-model co-training and consistency-based semi-supervised learning to boost segmentation performance in medical imaging with minimal annotations.
Entropy-Guided Agreement-Diversity (EGAD) is a two-stage active learning framework that integrates predictive-entropy-based uncertainty sampling with sample diversity and agreement metrics for the targeted selection of data in semi-supervised learning (SSL) regimes. Primarily developed for label-efficient segmentation tasks in medical imaging, such as fetal head localization in ultrasound, EGAD aims to maximize model generalization with minimal annotation effort by strategically augmenting labeled datasets through an entropy-guided, agreement-diversity-driven query process (Wang et al., 24 Jan 2026).
1. Formal Problem Setting and Motivation
EGAD operates in the context of pool-based active learning for pixel-wise segmentation. Given an image pool with a small initial labeled subset $\mathcal{D}_L$ and a large unlabeled remainder $\mathcal{D}_U$, the objective is to iteratively improve a segmentation model by selectively annotating batches from $\mathcal{D}_U$, subject to a strict annotation budget $B$. The segmentation problem is cast as binary pixel classification (foreground fetal head vs. background). Standard random or entropy-only sampling in SSL often leads to redundancy in the labeled data and overfitting on less informative samples, motivating a more principled, diversity-aware acquisition protocol (Wang et al., 24 Jan 2026).
2. Two-Stage Active Learning Sampler
2.1 Predictive-Entropy Based Uncertainty Filtering
In the first stage, EGAD quantifies uncertainty by computing the total predictive entropy for each candidate image $x$ under the current model: $$H(x) = -\sum_{i}\sum_{c} p_{i,c}(x)\,\log p_{i,c}(x),$$ where $p_{i,c}(x)$ is the softmax probability at pixel $i$ for class $c$. The images in the unlabeled pool $\mathcal{D}_U$ with the highest third of $H(x)$ values are retained as the candidate pool $\mathcal{C}$ for further screening (Wang et al., 24 Jan 2026).
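The entropy filter can be illustrated with a minimal NumPy sketch. It assumes each image's softmax output is available as an array of shape `(C, H, W)`; the function names, the `eps` clipping, and the choice to round the "top third" down (keeping at least one image) are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def total_predictive_entropy(probs, eps=1e-12):
    """Sum of per-pixel Shannon entropies (in nats) for one image.

    probs: array of shape (C, H, W) holding softmax probabilities.
    """
    p = np.clip(probs, eps, 1.0)  # avoid log(0)
    return float(-(p * np.log(p)).sum())

def top_third_by_entropy(prob_maps):
    """Return (indices of the highest-entropy third, all entropy scores)."""
    scores = np.array([total_predictive_entropy(p) for p in prob_maps])
    k = max(1, len(scores) // 3)  # assumption: round down, keep at least one
    order = np.argsort(-scores, kind="stable")  # descending entropy
    return order[:k], scores

def make_binary_map(p_foreground, shape=(4, 4)):
    """Hypothetical helper: a constant two-class probability map."""
    fg = np.full(shape, p_foreground)
    return np.stack([fg, 1.0 - fg])

maps = [make_binary_map(p) for p in (0.99, 0.5, 0.9, 0.6, 0.95, 0.7)]
candidates, entropies = top_third_by_entropy(maps)
print(candidates)  # indices of the two least confident maps
```

The maps with foreground probability closest to 0.5 have the highest entropy, so they are the ones retained as candidates.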
2.2 Agreement-Diversity Scoring
A refined selection uses agreement-diversity criteria to encourage label diversity and minimize redundancy:
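- Agreement (Cosine Similarity): For each candidate $x \in \mathcal{C}$, compute the average cosine similarity between its deep feature representation $f(x)$ and those of all labeled samples,
$$A(x) = \frac{1}{|\mathcal{D}_L|} \sum_{x_l \in \mathcal{D}_L} \frac{f(x)^\top f(x_l)}{\lVert f(x)\rVert\,\lVert f(x_l)\rVert}$$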
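- Diversity (Empirical Mutual Information): For raw pixel intensities,
$$D(x) = \sum_{u,v} \hat{p}(u,v)\,\log\frac{\hat{p}(u,v)}{\hat{p}(u)\,\hat{p}(v)}$$
where $\hat{p}(u,v)$ is the joint histogram of the paired intensity values and $\hat{p}(u)$, $\hat{p}(v)$ are its marginals.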
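Both the agreement score $A(x)$ and the diversity score $D(x)$ are normalized to $[0, 1]$ across the candidate pool $\mathcal{C}$ and combined into a final composite score that falls with agreement and rises with diversity. Images with a high score (low agreement and high diversity) are selected for annotation, ensuring the labeled set captures underrepresented regions of input space (Wang et al., 24 Jan 2026).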
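The scoring stage can be sketched in NumPy as follows. This is a rough illustration under stated assumptions: features are plain vectors, mutual information is estimated from a 2D intensity histogram with an arbitrary bin count, normalization is min-max, and the composite `(1 - agreement) + diversity` is one hypothetical combination that is high for low agreement and high diversity. The exact formula used by EGAD is not reproduced here.

```python
import numpy as np

def mean_cosine_agreement(feat, labeled_feats, eps=1e-12):
    """Average cosine similarity between one feature vector and each labeled one."""
    f = feat / (np.linalg.norm(feat) + eps)
    g = labeled_feats / (np.linalg.norm(labeled_feats, axis=1, keepdims=True) + eps)
    return float((g @ f).mean())

def empirical_mutual_information(a, b, bins=16):
    """Mutual information (nats) from the joint intensity histogram of two arrays."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)  # marginal over rows
    py = pxy.sum(axis=0, keepdims=True)  # marginal over columns
    nz = pxy > 0                         # skip empty cells (0 * log 0 = 0)
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

def min_max(values, eps=1e-12):
    """Rescale a score vector to [0, 1] across the candidate pool."""
    v = np.asarray(values, dtype=float)
    return (v - v.min()) / (v.max() - v.min() + eps)

def composite_scores(agreement, diversity):
    """Assumed combination: high when agreement is low and diversity is high."""
    return (1.0 - min_max(agreement)) + min_max(diversity)

# Toy check: an image shares maximal information with itself.
img = np.arange(64).reshape(8, 8)
mi_self = empirical_mutual_information(img, img, bins=8)

scores = composite_scores(agreement=[0.9, 0.1], diversity=[0.2, 0.8])
best = int(np.argmax(scores))  # candidate with low agreement, high diversity
```

Ranking candidates by `scores` and taking the top entries then yields the query batch.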
3. Consistency-Based Semi-Supervised Learning with Feature Downsampling
EGAD incorporates a co-training paradigm with dual models, one CNN (UNeXt) and one transformer (Swin-Unet), each leveraging both labeled and unlabeled samples. The SSL loss combines supervision from labeled data with two further components. Pseudo-labels from one network supervise the other on unlabeled data (cross pseudo-supervision). The feature consistency term applies normalized, downsampled deep features to emphasize foreground structure while suppressing background noise. Gaussian noise perturbation is used for data augmentation in the consistency terms (Wang et al., 24 Jan 2026).
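To make the cross pseudo-supervision idea concrete, the following NumPy sketch evaluates the loss value for one unlabeled image given the two networks' softmax outputs. It is a simplified illustration only: a real implementation would use an autodiff framework and stop gradients through the pseudo-labels, and the function names and the `sigma` value in the example are assumptions rather than settings from the paper. The feature consistency term is not shown.

```python
import numpy as np

def cross_entropy(probs, labels, eps=1e-12):
    """Mean pixel-wise cross-entropy; probs is (C, H, W), labels is (H, W) ints."""
    picked = np.take_along_axis(probs, labels[None], axis=0)[0]
    return float(-np.log(np.clip(picked, eps, 1.0)).mean())

def cross_pseudo_supervision(probs_a, probs_b):
    """Each model is scored against the other model's hard pseudo-labels."""
    pseudo_a = probs_a.argmax(axis=0)
    pseudo_b = probs_b.argmax(axis=0)
    return cross_entropy(probs_a, pseudo_b) + cross_entropy(probs_b, pseudo_a)

def add_gaussian_noise(image, sigma, rng):
    """Perturb an input with zero-mean Gaussian noise (sigma is a free parameter)."""
    return image + rng.normal(0.0, sigma, size=image.shape)

shape = (4, 4)
cnn_probs = np.stack([np.full(shape, 0.9), np.full(shape, 0.1)])
agreeing = np.stack([np.full(shape, 0.8), np.full(shape, 0.2)])
disagreeing = np.stack([np.full(shape, 0.2), np.full(shape, 0.8)])

loss_agree = cross_pseudo_supervision(cnn_probs, agreeing)
loss_disagree = cross_pseudo_supervision(cnn_probs, disagreeing)

rng = np.random.default_rng(0)
noisy = add_gaussian_noise(np.zeros(shape), sigma=0.1, rng=rng)  # sigma chosen arbitrarily
```

When the two networks disagree on a pixel, each is penalized against the other's label, so `loss_disagree` is larger than `loss_agree`.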
4. EGAD Training Pipeline
The end-to-end pipeline operates as follows:
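- Initialize the labeled set $\mathcal{D}_L$ (5–10% randomly labeled) and the unlabeled pool $\mathcal{D}_U$ (the remaining images).
- For each round $t$:
  - Train both networks jointly using the SSL loss on $\mathcal{D}_L \cup \mathcal{D}_U$.
  - Infer entropy and feature statistics over $\mathcal{D}_U$.
  - Construct the candidate pool $\mathcal{C}$ (top third entropy).
  - Compute agreement-diversity scores in $\mathcal{C}$, select batch $\mathcal{Q}_t$.
  - Label $\mathcal{Q}_t$ via oracle and update $\mathcal{D}_L \leftarrow \mathcal{D}_L \cup \mathcal{Q}_t$, $\mathcal{D}_U \leftarrow \mathcal{D}_U \setminus \mathcal{Q}_t$.
- Terminate at budget exhaustion, deploy final CNN (UNeXt) (Wang et al., 24 Jan 2026).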
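The control flow of the steps above can be sketched as a short Python loop. Training, entropy estimation, and agreement-diversity scoring are passed in as callables so the sketch stays self-contained; all names, the integer image identifiers, and the per-round `batch_size` are hypothetical choices for illustration.

```python
from typing import Callable, List, Set

def active_learning_loop(
    pool: List[int],
    initial_labeled: Set[int],
    budget: int,
    batch_size: int,
    train: Callable[[Set[int], Set[int]], None],
    entropy: Callable[[int], float],
    score: Callable[[int, Set[int]], float],
) -> Set[int]:
    """Run query rounds until the annotation budget or the pool is exhausted."""
    labeled = set(initial_labeled)
    unlabeled = set(pool) - labeled
    spent = 0
    while spent < budget and unlabeled:
        train(labeled, unlabeled)  # joint SSL training of both networks
        ranked = sorted(unlabeled, key=entropy, reverse=True)
        candidates = ranked[: max(1, len(ranked) // 3)]  # top third by entropy
        candidates.sort(key=lambda i: score(i, labeled), reverse=True)
        batch = candidates[: min(batch_size, budget - spent)]
        labeled |= set(batch)      # oracle annotates the selected batch
        unlabeled -= set(batch)
        spent += len(batch)
    return labeled

# Toy run with stand-in callables: entropy grows with the image id,
# while the composite score prefers smaller ids within the candidate pool.
train_calls = []
final_labeled = active_learning_loop(
    pool=list(range(12)),
    initial_labeled={0},
    budget=4,
    batch_size=2,
    train=lambda lab, unl: train_calls.append(len(lab)),
    entropy=lambda i: float(i),
    score=lambda i, lab: -float(i),
)
print(sorted(final_labeled))
```

Injecting the three callables keeps the acquisition logic separate from the models, which makes it straightforward to swap in random or entropy-only baselines for comparison.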
5. Empirical Evaluation and Results
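EGAD was evaluated on the HC18 and ES-TCB fetal head ultrasound datasets, reporting the Dice coefficient and Hausdorff distance (HD). With only 5–10% of the training labels, SSL-EGAD outperformed random sampling, entropy-only sampling, and previous semi-supervised methods on nearly every metric. The one exception in the table below is ES-TCB Dice at 5% labels, where entropy-only sampling (CEAL) reaches 94.50% against 94.04% for SSL-EGAD: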
| Method | Labeled (%) | HC18 Dice (%) | ES-TCB Dice (%) | HC18 HD | ES-TCB HD |
|---|---|---|---|---|---|
| Best prior SSL (LCMT) | 5 | 93.24 | 92.70 | 46.90 | 47.38 |
| Entropy (CEAL) | 5 | 94.07 | 94.50 | 41.13 | 48.82 |
| Random | 5 | 94.06 | 91.84 | 35.87 | 57.59 |
| SSL-EGAD (Ours) | 5 | 95.09 | 94.04 | 28.77 | 41.43 |
| Best prior SSL (LCMT) | 10 | 95.31 | 95.15 | 29.50 | 39.32 |
| SSL-EGAD (Ours) | 10 | 96.44 | 96.21 | 19.15 | 25.74 |
The method is robust across gestational stages, with performance gains persisting in second and third trimester test splits (Wang et al., 24 Jan 2026).
6. Entropy and Diversity Indices: Broader Context
Broader research establishes the link between ensemble diversity (including structural or predictive diversity) and generalization. Shannon entropy, Simpson’s index, and the Berger–Parker index provide orthogonal mechanisms to quantify diversity:
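- Shannon entropy measures uncertainty as $H = -\sum_{i=1}^{K} p_i \log p_i$, normalized by $\log K$, where $p_i$ is the proportion of class $i$ among $K$ classes.
- Simpson’s index defines diversity as $1 - \sum_{i} p_i^2$.
- Berger–Parker focuses on class dominance: $d = \max_i p_i$.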
Empirical studies indicate that ensembles or labeled sets with higher normalized Shannon or Simpson diversity yield superior accuracy, whereas extreme evenness (maximum diversity) may induce slight declines, and dominance-based (Berger–Parker) indices are weak predictors. For EGAD frameworks, using normalized entropy or Simpson indices with agreement terms (e.g., cosine similarity) facilitates a principled tradeoff between uncertainty-driven informativeness and diversity-driven generalizability (0810.3525).
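For reference, the three indices can be computed from a vector of class counts or proportions as in the following NumPy sketch. The function names are illustrative, and the Simpson index is taken in its Gini-Simpson form (one minus the sum of squared proportions), which is an assumption about the variant intended here.

```python
import numpy as np

def _proportions(counts):
    """Convert non-negative counts (or proportions) to a probability vector."""
    p = np.asarray(counts, dtype=float)
    return p / p.sum()

def normalized_shannon(counts):
    """Shannon entropy divided by log K, so the result lies in [0, 1]."""
    p = _proportions(counts)
    if len(p) < 2:
        return 0.0
    nz = p[p > 0]  # treat 0 * log 0 as 0
    return float(-(nz * np.log(nz)).sum() / np.log(len(p)))

def simpson_diversity(counts):
    """Gini-Simpson form: 1 - sum of squared proportions."""
    p = _proportions(counts)
    return float(1.0 - (p ** 2).sum())

def berger_parker(counts):
    """Dominance: proportion of the most abundant class."""
    return float(_proportions(counts).max())

even = [25, 25, 25, 25]
skewed = [97, 1, 1, 1]
for name, dist in (("even", even), ("skewed", skewed)):
    print(name, normalized_shannon(dist), simpson_diversity(dist), berger_parker(dist))
```

An even distribution maximizes the Shannon and Simpson measures and minimizes Berger–Parker dominance, while a skewed distribution does the opposite.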
7. Limitations and Prospective Extensions
EGAD is validated exclusively for 2D fetal head segmentation, and the mutual-information term of its diversity score relies on a fixed downsampling operator. The statistical significance of the performance gains is not formally reported. Prospective improvements include extending the method to more anatomical structures (e.g., cerebellum, femur) and to 3D ultrasound, adopting advanced diversity selectors (e.g., sub-modular optimization or Bayesian entropy), and exploring adaptive budget allocation throughout active learning cycles (Wang et al., 24 Jan 2026).