
Rank-Aware Teacher Selecting (RATS)

Updated 23 November 2025
  • The paper shows that ranking-driven teacher selection reduces WMSE by up to 13% compared to alternative methods, greatly improving IHC cell counting accuracy.
  • RATS employs a novel global-to-local patch ranking mechanism by mapping image patches to semantic count anchors to enforce monotonicity in predictions.
  • The method integrates dynamic teacher assignment into the distillation loss, yielding superior quantitative performance and more reliable supervision for multi-model agglomeration.

Rank-Aware Teacher Selecting (RATS) is a sample-wise, unsupervised strategy for selecting teacher models during knowledge distillation, introduced to improve agglomeration of multiple foundation models for immunohistochemistry (IHC) image cell counting. RATS leverages global-to-local ranking consistency across image patches to assess individual teachers' counting competency, enabling adaptive selection of the most reliable teacher per batch in the distillation process. This approach directly addresses the heterogeneity and complexity inherent in multi-class cell counting tasks within IHC images by providing robust, task-aware supervision based on semantic consistency in predicted instance counts (Huang et al., 16 Nov 2025).

1. Global-to-Local Patch Ranking

The core innovation in RATS is the definition of a global-to-local ranking mechanism over patch groups sampled from each input image. Each image is decomposed into a group of $k$ center-aligned patches of progressively increasing size, with the expectation that larger patches contain at least as many cells as their contained subregions.
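A minimal sketch of this patch-group construction, assuming square center crops and the scale ratios $\{5/8, 3/4, 7/8, 1\}$ reported in the implementation details; the function name and array interface are illustrative, not from the paper:

```python
import numpy as np

def center_aligned_patches(image, scale_ratios=(5/8, 3/4, 7/8, 1.0)):
    """Extract k center-aligned square patches of increasing size.

    `image` is an (H, W, C) array; the largest patch is the full
    center crop at the smallest image dimension. The ratios follow
    the paper's reported setting, but the cropping code itself is an
    illustrative sketch, not the authors' implementation.
    """
    h, w = image.shape[:2]
    base = min(h, w)          # side length of the scale-1.0 patch
    cy, cx = h // 2, w // 2   # shared center for all patches
    patches = []
    for r in scale_ratios:    # ascending: local (small) to global (large)
        side = int(round(base * r))
        half = side // 2
        patches.append(image[cy - half:cy - half + side,
                             cx - half:cx - half + side])
    return patches
```

Because every patch shares the same center, each smaller patch is a strict subregion of every larger one, which is what licenses the monotone cell-count expectation.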

For each teacher model $T_i$, patch features $y_{ij} = T_i(p_j)$ are mapped onto a fixed set of discrete semantic "count anchors" $A = \{a_b\}_{b=0}^{n}$, generated, for example, with CLIP-like vision-language models from natural-language prompts encoding cell-count information. The similarity between patch features and count anchors is scored and converted by a softmax into a soft distribution $P_{ij}(b)$ over count bins:

$$P_{ij}(b) = \operatorname{softmax}_{b=0 \ldots n}\!\bigl(\tau \cdot \cos(y_{ij}, a_b)\bigr)$$

where $\tau$ is a temperature parameter. The predicted patch count is the expectation:

$$\hat{C}_{ij} = \sum_{b=0}^{n} b \cdot P_{ij}(b)$$

The ranking constraint is enforced using a margin-ranking hinge loss that penalizes violations of the monotonicity $\hat{C}_{i1} \leq \hat{C}_{i2} \leq \ldots \leq \hat{C}_{ik}$, for each teacher $i$ and patch group:

$$L_{\text{rank}}^{i}(\text{patch group}) = \sum_{1 \leq u < v \leq k} \max\left[0,\; -(\hat{C}_{iv} - \hat{C}_{iu}) + \epsilon\right]$$

with margin $\epsilon = 0$. This loss measures how well each teacher preserves the expected cell-count ordering from local (smaller) to global (larger) spatial context.
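The soft count distribution, expected count, and ranking loss above can be sketched in a few lines of NumPy. The anchor vectors, temperature value, and function names here are illustrative assumptions; in the paper the anchors come from a CLIP-like text encoder:

```python
import numpy as np

def soft_count_distribution(patch_feat, anchors, tau=100.0):
    """P_ij(b) = softmax_b(tau * cos(y_ij, a_b)) over n+1 count bins."""
    y = patch_feat / np.linalg.norm(patch_feat)
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    logits = tau * (a @ y)        # scaled cosine similarities to each anchor
    logits -= logits.max()        # shift for numerical stability
    p = np.exp(logits)
    return p / p.sum()

def expected_count(p):
    """C_hat = sum_b b * P(b), the expectation over count bins."""
    return float(np.dot(np.arange(len(p)), p))

def ranking_loss(counts, eps=0.0):
    """Margin hinge: penalize any smaller-patch count exceeding a larger one."""
    k = len(counts)
    return sum(max(0.0, -(counts[v] - counts[u]) + eps)
               for u in range(k) for v in range(u + 1, k))
```

With $\epsilon = 0$, a perfectly monotone count sequence incurs zero loss, and any inversion contributes its full magnitude.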

2. Sample-Wise Teacher Selection Procedure

A distinguishing aspect of RATS is its adaptation of teacher selection at the batch level. For every batch, each candidate teacher $T_i$ receives the same $G$ patch groups derived from the $B$ images in the batch. Each teacher's total ranking loss $L_{\text{rank}}^{i}$ is computed over the batch, and the teacher with the lowest loss is selected as the supervisor for distillation in that iteration:

$$i^* = \arg\min_{i=1,\ldots,N} L_{\text{rank}}^{i}$$

This mechanism ensures that exactly one teacher is selected per batch—specifically, the one whose prediction ordering best satisfies the expected global-to-local cell-count monotonicity on that data subset, without requiring any count annotations.
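A hedged sketch of the per-batch selection loop, assuming each teacher exposes a callable mapping a patch group to its $k$ expected counts; these interfaces are hypothetical, chosen only to make the argmin concrete:

```python
def select_teacher(batch_patch_groups, teachers, rank_loss_fn):
    """Return i*, the index of the teacher whose predicted counts best
    preserve the global-to-local ordering over all patch groups.

    `teachers[i]` maps one patch group to a list of expected counts
    (C_i1 .. C_ik); `rank_loss_fn` is the margin hinge ranking loss.
    """
    totals = []
    for teacher in teachers:
        total = sum(rank_loss_fn(teacher(group))
                    for group in batch_patch_groups)
        totals.append(total)
    return min(range(len(totals)), key=totals.__getitem__)
```

Because the criterion is a ranking consistency rather than an absolute count error, it needs no labels, which is what makes the selection unsupervised.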

3. Integration into the Distillation Loss

Following the dynamic teacher selection, the chosen teacher $T_{i^*}$ serves as the exclusive supervision signal for the student model $S$. Classical feature-based distillation is performed on full images, aligning student representations with those of the current teacher using a sum of cosine-similarity and smooth-$\ell_1$ feature-matching objectives:

$$L_{\text{distill}}(x) = L_{\cos}\bigl[S(x), T_{i^*}(x)\bigr] + L_{\text{smooth-}\ell_1}\bigl[S(x), T_{i^*}(x)\bigr]$$

The overall loss is averaged across the batch, and the aggregation objective is defined as

$$L_{\text{agglom}} = \mathbb{E}_{x \sim \text{data}}\, L_{\text{distill}}(x)$$

with $T_{i^*}$ varying on a per-batch basis as determined by RATS.
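The combined distillation objective can be sketched as follows. The cosine term is written as $1 - \cos$ so that it is minimized at alignment, and the exact weighting and reduction used by the authors are not specified here, so this is an assumption-laden illustration rather than the paper's implementation:

```python
import numpy as np

def smooth_l1(x, beta=1.0):
    """Elementwise smooth-l1 (Huber-style): quadratic near 0, linear beyond."""
    ax = np.abs(x)
    return np.where(ax < beta, 0.5 * ax**2 / beta, ax - 0.5 * beta)

def distill_loss(student_feat, teacher_feat):
    """L_distill = (1 - cos(S, T)) + mean smooth-l1(S - T) on one image."""
    s = student_feat / np.linalg.norm(student_feat)
    t = teacher_feat / np.linalg.norm(teacher_feat)
    l_cos = 1.0 - float(s @ t)                              # direction match
    l_sl1 = float(np.mean(smooth_l1(student_feat - teacher_feat)))  # magnitude match
    return l_cos + l_sl1
```

Averaging `distill_loss` over a batch gives the empirical estimate of $L_{\text{agglom}}$, with the teacher features recomputed from whichever $T_{i^*}$ RATS selected for that batch.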

4. Structural and Hyperparameter Details

The implementation of RATS, as reported, employs patch size $M = 224$, $k = 4$ patches per group with scale ratios $\{5/8, 3/4, 7/8, 1\}$ (relative to $M$), and a count-bin truncation $n = 4$ (bins for $0, 1, 2, 3, \geq 4$ cells). The ranking margin $\epsilon$ is set to zero, the batch size is $128$, and the AdamW optimizer is used with the learning rate annealed from $10^{-3}$ to $10^{-6}$ by a cosine schedule.
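Gathered into one place, the reported settings might look like the following configuration sketch; the values come from the text above, but the dictionary structure and key names are illustrative:

```python
# Reported RATS hyperparameters; key names are illustrative, values
# are those stated in the implementation details.
RATS_CONFIG = {
    "patch_size": 224,                        # M
    "patches_per_group": 4,                   # k
    "scale_ratios": (5/8, 3/4, 7/8, 1.0),     # relative to M
    "count_bins": 5,                          # n = 4 -> bins 0,1,2,3,>=4
    "ranking_margin": 0.0,                    # epsilon
    "batch_size": 128,
    "optimizer": "AdamW",
    "lr_max": 1e-3,
    "lr_min": 1e-6,
    "lr_schedule": "cosine",
}
```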

5. Empirical Performance and Ablation Findings

Ablation studies on Ki67-WSI benchmarks indicate that RATS outperforms baseline approaches. Equal-weighted teacher loss averaging produces a weighted mean-squared error (WMSE) of about $48.1$; teacher-dropping (selecting the teacher with the lowest feature loss) achieves a WMSE of about $36.3$. RATS yields a substantially lower WMSE of about $31.5$, corresponding to roughly $13\%$ and $35\%$ relative reductions over teacher-dropping and equal averaging, respectively. Tumor-proportion-score mean absolute error decreases from $4.3$ to $2.5$ under RATS supervision. These results underscore the advantage of sample-wise, ranking-driven teacher assignment for multi-model agglomeration in IHC cell counting.

6. Technical Significance and Implications

RATS demonstrates that a task-aware, unsupervised, sample-wise criterion—specifically, preservation of the expected ranking among overlapping spatial regions—enables more informative and context-sensitive teacher selection than task-agnostic or similarity-based alternatives. Its integration with foundation models and semantic anchors suggests fertile ground for scalable, data-efficient vision-language distillation strategies in medical imaging and beyond (Huang et al., 16 Nov 2025). A plausible implication is that similar rank-based supervision could generalize to other structured regression or localization tasks where spatial hierarchies are semantically meaningful.
