Rank-Aware Teacher Selecting (RATS)
- The paper shows that ranking-driven teacher selection reduces WMSE by up to 13% compared to alternative methods, greatly improving IHC cell counting accuracy.
- RATS employs a novel global-to-local patch ranking mechanism by mapping image patches to semantic count anchors to enforce monotonicity in predictions.
- The method integrates dynamic teacher assignment into the distillation loss, yielding superior quantitative performance and more reliable supervision for multi-model agglomeration.
Rank-Aware Teacher Selecting (RATS) is a sample-wise, unsupervised strategy for selecting teacher models during knowledge distillation, introduced to improve agglomeration of multiple foundation models for immunohistochemistry (IHC) image cell counting. RATS leverages global-to-local ranking consistency across image patches to assess individual teachers' counting competency, enabling adaptive selection of the most reliable teacher per batch in the distillation process. This approach directly addresses the heterogeneity and complexity inherent in multi-class cell counting tasks within IHC images by providing robust, task-aware supervision based on semantic consistency in predicted instance counts (Huang et al., 16 Nov 2025).
1. Global-to-Local Patch Ranking
The core innovation in RATS is the definition of a global-to-local ranking mechanism over patch groups sampled from each input image. Each image is decomposed into a group of center-aligned patches of progressively increasing size, with the expectation that larger patches contain at least as many cells as their contained subregions.
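The patch-group construction described above can be sketched as follows; the function name, the example scale ratios, and the center-crop details are illustrative assumptions, not specifics from the paper.

```python
import numpy as np

def center_aligned_patch_group(image: np.ndarray, scale_ratios=(0.25, 0.5, 0.75, 1.0)):
    """Crop center-aligned patches of progressively increasing size.

    Each larger patch fully contains the smaller ones, so its true cell
    count is at least as large -- the monotonicity RATS exploits.
    """
    h, w = image.shape[:2]
    cy, cx = h // 2, w // 2
    patches = []
    for r in scale_ratios:
        ph, pw = max(1, int(h * r)), max(1, int(w * r))
        top, left = cy - ph // 2, cx - pw // 2
        patches.append(image[top:top + ph, left:left + pw])
    return patches

# Example: a 256x256 image yields four nested, center-aligned patches.
img = np.zeros((256, 256))
group = center_aligned_patch_group(img)
sizes = [p.shape for p in group]
```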
For each teacher model $T_m$, patch features are mapped onto a fixed set of discrete semantic "count anchors" $\{a_k\}_{k=0}^{K}$, generated, for example, using CLIP-like vision-language models with natural language prompts encoding cell-count information. The similarity between image-patch features and count anchors is scored and converted by a softmax into a soft distribution over count bins:

$$p_m(k \mid x) = \frac{\exp\!\big(\mathrm{sim}(f_m(x), a_k)/\tau\big)}{\sum_{k'=0}^{K} \exp\!\big(\mathrm{sim}(f_m(x), a_{k'})/\tau\big)},$$

where $\tau$ is a temperature parameter, $f_m(x)$ is teacher $m$'s feature for patch $x$, and $\mathrm{sim}(\cdot,\cdot)$ is the similarity score. The predicted patch count is the expectation:

$$\hat{c}_m(x) = \sum_{k=0}^{K} k \, p_m(k \mid x).$$
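A minimal numpy sketch of the soft count prediction; the identity-matrix anchors, the dot-product similarity, and the temperature value are illustrative assumptions (anchor $k$ is taken to represent a count of $k$ cells).

```python
import numpy as np

def expected_count(patch_feat, anchor_feats, tau=0.07):
    """Soft count prediction over discrete count anchors.

    Similarities between the patch feature and each count anchor are
    softmax-normalized (temperature tau) into a distribution over count
    bins; the predicted count is the expectation of that distribution.
    """
    sims = anchor_feats @ patch_feat          # (K+1,) similarity scores
    logits = sims / tau
    logits -= logits.max()                    # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    counts = np.arange(len(anchor_feats))     # bin k <-> count of k cells
    return float((counts * probs).sum())

# Toy check: a patch feature identical to anchor 2 should predict ~2 cells.
anchors = np.eye(5)
c = expected_count(anchors[2], anchors)
```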
The ranking constraint is enforced using a margin-ranking hinge loss penalizing violations of the monotonicity $\hat{c}_m(x_1) \le \hat{c}_m(x_2) \le \cdots \le \hat{c}_m(x_P)$, for each teacher $m$ and patch group $\{x_1, \dots, x_P\}$ ordered from smallest to largest patch:

$$\mathcal{L}_{\mathrm{rank}}^{(m)} = \sum_{i < j} \max\!\big(0,\; \hat{c}_m(x_i) - \hat{c}_m(x_j) + \delta\big),$$

with margin $\delta \ge 0$. This loss measures how well each teacher preserves the expected cell-count ordering from local (smaller) to global (larger) spatial context.
2. Sample-Wise Teacher Selection Procedure
A distinguishing aspect of RATS is its adaptation of teacher selection at the batch level. For every batch of $B$ images $\{I_b\}$, each candidate teacher receives the same patch groups derived from those images. Each teacher's total ranking loss is computed over the batch, and the teacher with the lowest loss is selected as the supervisor for distillation in that iteration:

$$m^\star = \arg\min_{m} \sum_{b=1}^{B} \mathcal{L}_{\mathrm{rank}}^{(m)}(I_b).$$
This mechanism ensures that exactly one teacher is selected per batch: the one whose predicted ordering most closely respects the containment-implied global-to-local cell-count ranking on that data subset.
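The per-batch argmin over teachers can be sketched as below; the dictionary-based interface is an illustrative assumption.

```python
def select_teacher(ranking_losses_per_teacher):
    """Pick the teacher whose summed ranking loss over the batch is lowest.

    `ranking_losses_per_teacher` maps a teacher id to its per-image
    ranking losses for the current batch.
    """
    totals = {m: sum(losses) for m, losses in ranking_losses_per_teacher.items()}
    return min(totals, key=totals.get)

# Teacher "B" preserves the count ordering better, so it supervises this batch.
chosen = select_teacher({"A": [1.0, 2.0], "B": [0.5, 0.5]})
```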
3. Integration into the Distillation Loss
Following the dynamic teacher selection, the chosen teacher $T_{m^\star}$ serves as the exclusive supervision signal for the student model $S$. Classical feature-based distillation is performed on full images, aligning student representations with those of the current teacher using a sum of cosine-similarity and smooth-$\ell_1$ feature-matching objectives:

$$\mathcal{L}_{\mathrm{distill}}(I) = \big(1 - \cos\!\big(f_S(I), f_{T_{m^\star}}(I)\big)\big) + \mathrm{SmoothL1}\!\big(f_S(I), f_{T_{m^\star}}(I)\big).$$
The overall loss is averaged across the batch, and the aggregation objective is defined as

$$\mathcal{L}_{\mathrm{agg}} = \frac{1}{B} \sum_{b=1}^{B} \mathcal{L}_{\mathrm{distill}}(I_b),$$

with the selected teacher $m^\star$ varying on a per-batch basis as determined by RATS.
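A numpy sketch of the combined feature-matching objective; the vector-valued features and the smooth-L1 `beta` default are illustrative assumptions.

```python
import numpy as np

def smooth_l1(a, b, beta=1.0):
    """Elementwise smooth-L1 (Huber-style) distance, averaged."""
    d = np.abs(a - b)
    return float(np.where(d < beta, 0.5 * d**2 / beta, d - 0.5 * beta).mean())

def distill_loss(student_feat, teacher_feat):
    """Feature distillation: (1 - cosine similarity) plus smooth-L1 matching."""
    cos = np.dot(student_feat, teacher_feat) / (
        np.linalg.norm(student_feat) * np.linalg.norm(teacher_feat))
    return (1.0 - cos) + smooth_l1(student_feat, teacher_feat)

# Identical student and teacher features incur (near-)zero loss.
v = np.array([1.0, 2.0, 3.0])
loss_same = distill_loss(v, v)
```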
4. Structural and Hyperparameter Details
The implementation of RATS, as reported, employs fixed-size input patches, several center-aligned patches per group at increasing scale ratios relative to the full patch, and a truncated set of count bins. The ranking margin $\delta$ is set to zero, the batch size is $128$, and the AdamW optimizer is used with a cosine learning-rate annealing schedule.
5. Empirical Performance and Ablation Findings
Ablation studies on Ki67-WSI benchmarks indicate that RATS outperforms baseline approaches. Equal-weighted teacher-loss averaging produces a weighted mean-squared error (WMSE) around $48.1$; teacher-dropping (selecting the teacher with the lowest feature loss) improves on this, and RATS yields a substantially lower WMSE than either baseline. Tumor-proportion-score mean absolute error decreases from $4.3$ to $2.5$ under RATS supervision. These results underscore the advantage of sample-wise, ranking-driven teacher assignment for multi-model agglomeration in IHC cell counting.
6. Technical Significance and Implications
RATS demonstrates that a task-aware, unsupervised, sample-wise criterion (specifically, preservation of the expected ranking among overlapping spatial regions) enables more informative and context-sensitive teacher selection than task-agnostic or similarity-based alternatives. Its integration with foundation models and semantic anchors suggests fertile ground for scalable, data-efficient vision-language distillation strategies in medical imaging and beyond (Huang et al., 16 Nov 2025). A plausible implication is that similar rank-based supervision could generalize to other structured regression or localization tasks where spatial hierarchies are semantically meaningful.