
Coarse Guidance Network (CGN)

Updated 9 February 2026
  • Coarse Guidance Network (CGN) is a module that injects coarse spatial context into high-resolution patch features to enhance slide-level predictions in MIL frameworks.
  • It remaps instance features to a coarse grid using field-of-view driven binning and processes them through a lightweight convolutional head to compute a guidance map.
  • Empirical evaluations show that incorporating CGNs improves biomarker classification AUCs while maintaining low parameter and computational overhead.

A Coarse Guidance Network (CGN) is a module designed to learn and inject spatial contextual information at a coarser scale into high-magnification instance features within Multiple-Instance Learning (MIL) frameworks for whole-slide image (WSI) analysis. The CGN operates via grid-based remapping of instance features and a lightweight convolutional head to produce a coarse guidance map, which is then used to modulate the instance features before final attention-based aggregation. This approach enables progressive multi-scale context modeling in computational pathology tasks, offering a parameter-efficient mechanism for slide-level prediction enhancement while maintaining computational tractability (Wu et al., 2 Feb 2026).

1. Architectural Overview

The CGN processes high-magnification patch features $H \in \mathbb{R}^{N \times D}$ and their normalized spatial coordinates $(x'_n, y'_n) \in [0,1]^2$. Its core workflow consists of three sequential steps:

  1. Grid-based Remapping: High-magnification features are aggregated into a 3D coarse feature map $M \in \mathbb{R}^{D \times H' \times W'}$ based on spatial bin assignments determined by a selectable field-of-view (FOV) parameter.
  2. Convolutional Guidance Head: $M$ is processed by two 3×3 convolutions with ReLU activations and a 1×1 convolution with Sigmoid activation to yield the coarse guidance map $P \in \mathbb{R}^{1 \times H' \times W'}$.
  3. Patch-level Gating: $P$ is flattened and indexed to obtain $M_A \in \mathbb{R}^{N}$, which gates each corresponding row of $H$, resulting in modulated features $H_k = H \odot M_A$ (row $n$ of $H$ is scaled by the scalar $M_A[n]$).

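The three steps form the following pipeline:

H, (x'_n, y'_n) → grid-based remapping → M → convolutional guidance head → P → flatten and gather → M_A → gating → H_k = H ⊙ M_A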

2. Grid-based Remapping

Instance features and coordinates are mapped to a coarse grid via field-of-view driven binning. For each instance $n$ with normalized coordinates $(x'_n, y'_n) \in [0,1]^2$, the grid cell assignment $(i_n, j_n)$ is obtained by quantizing the coordinates onto an $H' \times W'$ grid, where the number of rows $H'$ and columns $W'$ is set by the selected FOV $f$: a larger FOV gives a coarser grid.

The feature map $M$ is computed by averaging all high-magnification vectors falling into each bin:

$$M_{:,i,j} = \frac{1}{|S_{i,j}|} \sum_{n \in S_{i,j}} H_n,$$

where $S_{i,j} = \{\, n : (i_n, j_n) = (i, j) \,\}$ collects all instances assigned to grid cell $(i, j)$. In vectorized notation, with $A \in \{0,1\}^{N \times H'W'}$ the one-hot bin-assignment matrix and the normalization restricted to non-empty cells,

$$\tilde{M} = \operatorname{diag}(A^\top \mathbf{1}_N)^{-1} A^\top H \in \mathbb{R}^{H'W' \times D},$$

followed by reshaping $\tilde{M}$ to $M \in \mathbb{R}^{D \times H' \times W'}$.
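The binning and averaging described above can be sketched in NumPy. This is a minimal illustration rather than the authors' implementation: the floor-based quantization, the clipping at the upper boundary, the zero-filling of empty cells, and the names `grid_remap`, `grid_h`, and `grid_w` are all assumptions, and the way the grid size follows from the FOV is not modeled here.

```python
import numpy as np

def grid_remap(H, coords, grid_h, grid_w):
    """Average instance features into a (D, grid_h, grid_w) coarse map.

    H: (N, D) features; coords: (N, 2) normalized (x', y') in [0, 1].
    Returns the coarse map M and the flat bin index of every instance.
    """
    # Assumption: floor-based binning; clip so a coordinate of 1.0 lands in the last bin.
    col = np.minimum((coords[:, 0] * grid_w).astype(int), grid_w - 1)
    row = np.minimum((coords[:, 1] * grid_h).astype(int), grid_h - 1)
    flat_idx = row * grid_w + col

    n_bins, dim = grid_h * grid_w, H.shape[1]
    sums = np.zeros((n_bins, dim), dtype=float)
    np.add.at(sums, flat_idx, H)                    # scatter-add features per bin
    counts = np.bincount(flat_idx, minlength=n_bins)
    means = sums / np.maximum(counts, 1)[:, None]   # assumption: empty bins stay zero
    M = means.T.reshape(dim, grid_h, grid_w)        # (H'W', D) -> (D, H', W')
    return M, flat_idx

H = np.array([[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]])
coords = np.array([[0.1, 0.1], [0.2, 0.3], [0.9, 0.1], [0.9, 0.9]])
M, flat_idx = grid_remap(H, coords, grid_h=2, grid_w=2)
print(M.shape, flat_idx.tolist())   # (2, 2, 2) [0, 0, 1, 3]
```

The returned `flat_idx` is kept because the gating step later needs to look up, for each instance, the guidance value of the cell it was assigned to.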

3. Convolutional Guidance Computation

After remapping, $M$ is passed through three sequential convolutions:

$$F_1 = \mathrm{ReLU}\big(\mathrm{Conv}_{3\times3}(M)\big) \in \mathbb{R}^{C \times H' \times W'}$$

$$F_2 = \mathrm{ReLU}\big(\mathrm{Conv}_{3\times3}(F_1)\big) \in \mathbb{R}^{C \times H' \times W'}$$

$$P = \sigma\big(\mathrm{Conv}_{1\times1}(F_2)\big) \in \mathbb{R}^{1 \times H' \times W'}$$

Here, $C$ is the hidden channel width, which is the same for all CGN blocks, and $\sigma$ is the Sigmoid function. No self-attention or Transformer module is included; the head is purely convolutional.

$P$ is flattened to length $H'W'$, and each instance $n$ gathers its coarse guidance value $M_A[n]$ according to its assigned grid index. The final gated features are $H_k = H \odot M_A$.
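The head and the gather-and-gate step can be sketched in plain NumPy as follows. This is an illustrative sketch under stated assumptions, not the reference implementation: zero "same" padding with stride 1 (needed for the guidance map to keep the grid size), the random initialization, the toy sizes, and all function names (`conv2d_same`, `guidance_head`, `gate`, `init_params`) are assumptions.

```python
import numpy as np

def conv2d_same(x, w, b):
    """Stride-1, zero-padded convolution (cross-correlation, as in deep-learning libraries).
    x: (C_in, H, W), w: (C_out, C_in, k, k), b: (C_out,) -> (C_out, H, W)."""
    k = w.shape[-1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    _, h, wd = x.shape
    out = np.zeros((w.shape[0], h, wd))
    for di in range(k):
        for dj in range(k):
            # Contract input channels for this kernel offset.
            out += np.einsum("oc,chw->ohw", w[:, :, di, dj], xp[:, di:di + h, dj:dj + wd])
    return out + b[:, None, None]

def guidance_head(M, params):
    """Two 3x3 conv + ReLU layers, then a 1x1 conv + Sigmoid -> (1, H', W')."""
    f1 = np.maximum(conv2d_same(M, *params["conv1"]), 0.0)
    f2 = np.maximum(conv2d_same(f1, *params["conv2"]), 0.0)
    logits = conv2d_same(f2, *params["conv3"])
    return 1.0 / (1.0 + np.exp(-logits))

def gate(H, P, flat_idx):
    """Gather each instance's guidance value and scale its feature row."""
    m_a = P.reshape(-1)[flat_idx]      # (N,)
    return H * m_a[:, None]            # broadcast over the D feature channels

def init_params(d, c, rng, scale=0.1):
    """Hypothetical initializer: small random weights, zero biases."""
    shapes = {"conv1": (c, d, 3, 3), "conv2": (c, c, 3, 3), "conv3": (1, c, 1, 1)}
    return {k: (scale * rng.standard_normal(s), np.zeros(s[0])) for k, s in shapes.items()}

rng = np.random.default_rng(0)
N, D, C, GH, GW = 5, 4, 3, 2, 3           # toy sizes, chosen only for illustration
H = rng.standard_normal((N, D))
M = rng.standard_normal((D, GH, GW))      # stand-in for the remapped coarse map
flat_idx = np.array([0, 0, 2, 4, 5])      # stand-in bin index of each instance
P = guidance_head(M, init_params(D, C, rng))
H_k = gate(H, P, flat_idx)
print(P.shape, H_k.shape)                 # (1, 2, 3) (5, 4)
```

Because the Sigmoid keeps every guidance value in (0, 1), the gate can only attenuate a feature row, and instances that share a grid cell receive the same scaling.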

4. Integration with Attention-based MIL

In standard attention-based MIL settings, instance embeddings $H$ propagate through an attention aggregator $g(\cdot)$ to yield slide-level predictions: $\hat{y} = g(H)$. Installing a CGN at scale $f$ updates $H$ before aggregation: $\hat{y} = g\big(\mathrm{CGN}_f(H)\big)$.

Stacking multiple CGNs (for example, at FOVs $f_1, \dots, f_K$ such as 1536, 2048, and 3072) results in a progressive series of residual updates: $H_k = \mathrm{CGN}_{f_k}(H_{k-1})$ for $k = 1, \dots, K$, with $H_0 = H$. The final $H_K$ is then input to attention modules such as ABMIL, DSMIL, CLAM-SB, or CLAM-MB, which perform the slide-level aggregation.
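A rough sketch of the stacking logic followed by ABMIL-style attention pooling is given below. Several things here are assumptions: each update is written as the plain multiplicative gate from Section 1 (the exact form of the residual update is not spelled out in this text), the CGNs are replaced by stand-in callables that return constant gates, and the names `apply_cgns` and `abmil_pool` are hypothetical.

```python
import numpy as np

def apply_cgns(H, cgns):
    """Progressively gate H, one scale at a time.
    Each element of `cgns` is a callable mapping (N, D) features to (N,) gates in (0, 1)."""
    H_k = H
    for cgn in cgns:
        m_a = cgn(H_k)               # gates computed from the current features
        H_k = H_k * m_a[:, None]     # assumed update: H_k = H_{k-1} * M_A^{(k)}
    return H_k

def abmil_pool(H, V, w):
    """ABMIL-style attention pooling: a_n = softmax_n(w^T tanh(V h_n)), z = sum_n a_n h_n."""
    scores = np.tanh(H @ V.T) @ w          # (N,)
    scores = scores - scores.max()         # numerical stability
    a = np.exp(scores) / np.exp(scores).sum()
    return a @ H, a

rng = np.random.default_rng(0)
N, D, L = 6, 4, 3                          # toy sizes, chosen only for illustration
H = rng.standard_normal((N, D))

# Stand-ins for three trained CGNs (e.g. one per FOV); real ones would run the
# remap -> conv head -> gather pipeline on their own grid.
toy_cgns = [lambda feats: np.full(feats.shape[0], 0.5) for _ in range(3)]

H_K = apply_cgns(H, toy_cgns)
z, a = abmil_pool(H_K, rng.standard_normal((L, D)), rng.standard_normal(L))
print(H_K.shape, z.shape)                  # (6, 4) (4,)
```

The aggregator is untouched by the CGNs; only its input changes, which is why the same stack can be placed in front of different attention-based MIL heads.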

5. Training Protocol and Hyperparameters

Training details for CGN-based models are as follows:

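  • Losses: Biomarker tasks (ER, PR, HER2 status) use cross-entropy loss. Prognosis tasks (CRC Surv) use a negative log-likelihood survival loss (NLLSurvLoss) that combines censored and uncensored terms. With discrete-time hazards $h_t$, survival function $S_t = \prod_{s \le t} (1 - h_s)$ (and $S_{-1} = 1$), event-time bin $Y$, censoring indicator $c$, and weighting factor $\alpha$:

$$\mathcal{L}_{\text{uncensored}} = -(1 - c)\,\big[\log S_{Y-1} + \log h_Y\big]$$

$$\mathcal{L}_{\text{censored}} = -c \,\log S_Y$$

$$\mathcal{L} = (1 - \alpha)\,\big(\mathcal{L}_{\text{censored}} + \mathcal{L}_{\text{uncensored}}\big) + \alpha\, \mathcal{L}_{\text{uncensored}}$$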

  • Optimizer: AdamW with a cosine-decay learning-rate scheduler (learning-rate value as reported by Wu et al.).
  • Early stopping: patience = 10.
  • Epochs: maximum 150.
  • FOV choices: At 20×, e.g., {1536, 2048, 3072} pixels (providing 3 CGNs).
  • Hidden channels: a fixed width $C$ per CGN (value as reported by Wu et al.).
  • Parameter and compute cost: Each CGN adds only a small number of parameters per scale. In the table of Section 6, going from one to three CGNs raises the reported count from ~1.51M to ~2.18M, i.e., about 0.335M per additional scale.
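For the survival loss listed above, the following is a minimal sketch of a discrete-time negative log-likelihood with separate censored and uncensored terms, in the form commonly used in computational pathology code under the name NLLSurvLoss. Whether Wu et al. use exactly this variant, and the value of the weighting factor `alpha`, are assumptions; the function name and argument names are illustrative.

```python
import math

def nll_surv_loss(hazards, label, censored, alpha=0.0, eps=1e-7):
    """Discrete-time survival NLL for one slide.

    hazards:  per-bin hazards h_0..h_{T-1}, each in (0, 1)
    label:    index Y of the time bin of the event or of censoring
    censored: 1 if the patient was censored, 0 if the event was observed
    alpha:    assumed extra weight on the uncensored term
    """
    # Survival function S_k = prod_{j <= k} (1 - h_j).
    surv, running = [], 1.0
    for h in hazards:
        running *= 1.0 - h
        surv.append(running)

    s_prev = 1.0 if label == 0 else surv[label - 1]   # S_{Y-1}, with S_{-1} = 1
    # Event observed: survived up to bin Y-1, then the event occurred in bin Y.
    uncens = -(1 - censored) * (math.log(max(s_prev, eps)) + math.log(max(hazards[label], eps)))
    # Censored: only known to have survived through bin Y.
    cens = -censored * math.log(max(surv[label], eps))
    return (1 - alpha) * (cens + uncens) + alpha * uncens

print(round(nll_surv_loss([0.5, 0.5], label=1, censored=0), 4))   # 1.3863
```

With two bins and hazards of 0.5, an observed event in the second bin costs -log(0.5) - log(0.5) = 2 ln 2, which is the printed value.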

6. Empirical Performance and Ablation Results

Empirical studies isolating the CGN demonstrate a consistent benefit on multiple biomarker classification tasks. For instance, a single CGN (FOV=1536) added to ABMIL (using CONCH features) produces:

| System | ER AUC (%) | PR AUC (%) | HER2 AUC (%) | Params (M) | FLOPs (G) |
|---|---|---|---|---|---|
| ABMIL w/o CGN (single-scale 20×) | 87.22 | 84.14 | 80.06 | - | - |
| ABMIL + single CGN (FOV=1536) | 88.92 | 84.76 | 80.84 | ~1.51 | ~17.7 |
| ABMIL + three CGNs ([1536,2048,3072]) | 89.76 | 85.24 | 82.86 | ~2.18 | ~17.7 |
| ABMIL + five CGNs ([1024,1536,2048,2560,3072]) | 91.42 | 84.18 | 84.62 | - | - |

Adding a single CGN increases slide-level AUC on all three tasks: +1.70pp (ER), +0.62pp (PR), and +0.78pp (HER2). Stacking multiple CGNs for progressive multi-scale guidance further improves ER and HER2 (with five CGNs, +4.20pp for ER and +4.56pp for HER2 over the baseline), whereas PR peaks with three CGNs (85.24) and falls back to near the baseline with five (84.18). CGNs achieve these gains at lower parameter and compute cost than alternatives such as concatenation or cross-scale attention schemes, while delivering larger accuracy improvements (+3.6pp ER, +4.05pp HER2).
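The quoted gains follow directly from the table. The short script below recomputes them as differences from the ABMIL baseline; the AUC values are copied from the table, while the dictionary layout and the helper name `gains` are only illustrative.

```python
# AUC values (%) copied from the table above.
auc = {
    "baseline": {"ER": 87.22, "PR": 84.14, "HER2": 80.06},
    "1 CGN":    {"ER": 88.92, "PR": 84.76, "HER2": 80.84},
    "3 CGNs":   {"ER": 89.76, "PR": 85.24, "HER2": 82.86},
    "5 CGNs":   {"ER": 91.42, "PR": 84.18, "HER2": 84.62},
}

def gains(table, ref="baseline"):
    """Percentage-point gain of every configuration over the reference row."""
    return {
        name: {task: round(value - table[ref][task], 2) for task, value in row.items()}
        for name, row in table.items()
        if name != ref
    }

for name, delta in gains(auc).items():
    print(name, delta)
```

The output makes the task dependence visible: ER and HER2 keep improving up to five CGNs, while the PR gain is largest with three CGNs (+1.10pp) and shrinks to +0.04pp with five.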

7. Summary of Properties

A CGN remaps high-magnification features to a spatially coarse grid, applies a three-layer convolutional head to compute a coarse guidance map, projects this map back to the patch level to gate the $D$-dimensional features, and is trained end-to-end with the same MIL objectives. Each CGN block is lightweight, using a small hidden channel width and adding few parameters per scale, incurs minimal additional computation, and has been shown to consistently improve slide-level prediction performance in clinical biomarker and prognosis tasks (Wu et al., 2 Feb 2026).
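To give a sense of scale for "lightweight", the parameter count of the three-layer head can be worked out by hand. In the sketch below, the input dimension D = 512 and hidden width C = 64 are assumptions chosen for illustration (neither value is stated in this text), and the helper names are hypothetical.

```python
def conv_params(c_in, c_out, k):
    """Weights plus biases of a k x k convolution."""
    return c_in * c_out * k * k + c_out

def cgn_head_params(d, c):
    """3x3 (d -> c), 3x3 (c -> c), 1x1 (c -> 1)."""
    return conv_params(d, c, 3) + conv_params(c, c, 3) + conv_params(c, 1, 1)

# Assumed sizes: D = 512 input channels, C = 64 hidden channels.
print(cgn_head_params(512, 64))   # 331969
```

Under these assumed sizes a head has about 0.33M parameters, almost all of them in the first 3×3 convolution. That is the same order as the roughly 0.335M per additional scale implied by the table in Section 6, where the count grows from ~1.51M with one CGN to ~2.18M with three.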
