Histology-Informed Tiling (HIT)

Updated 20 November 2025

Histology-Informed Tiling (HIT) is a digital pathology framework that extracts biologically coherent, gland-centric patches from whole slide images for enhanced clinical predictions.
HIT employs a multi-stage pipeline including semantic segmentation, gland embedding, and attention-based multiple-instance learning to predict outcomes like cancer relapse and genetic alterations.
By reducing instance counts and aligning with pathologist annotations, HIT achieves an absolute AUC gain of up to 0.10 in CNV detection while improving computational efficiency and interpretability.

Histology-Informed Tiling (HIT) is a digital pathology framework designed to improve the predictive accuracy and interpretability of deep learning models operating on whole slide images (WSIs) by using biologically meaningful, gland-centric image patches. Unlike conventional grid-based tiling, which partitions tissue into arbitrary rectangular patches irrespective of underlying tissue architecture, HIT uses semantic segmentation to extract entire glandular structures from WSIs, yielding tissue-conformant input instances for multiple-instance learning (MIL) and phenotyping. The approach was formalized and computationally validated for applications such as predicting cancer relapse and the presence of genetic alterations in prostate cancer (Bonnaffé et al., 13 Nov 2025).

1. Pipeline Overview and Workflow

HIT operates as an end-to-end pipeline that begins with high-resolution, H&E-stained WSIs and culminates in gland-level phenotyping and interpretable slide-level predictions. The principal workflow comprises:

Input: H&E-stained WSIs at 40× magnification.
Semantic Segmentation (Stage 1): Each WSI is divided into overlapping 1 024×1 024 pixel patches. A U-Net+Transformer model ("GlandSeg") predicts pixel-wise class probabilities (stroma/background, epithelium, lumen), which are thresholded at 0.5. Epithelium and lumen classes are merged to define "whole glands".
Patch Extraction: Slide-level gland masks are reconstructed by stitching patch-level predictions; connected components yield individual gland-centric tiles ("HIT patches").
Feature Extraction (Stage 2): Each gland tile is resized, normalized, and embedded via a pre-trained backbone (ResNet-18, ResNet-50, or ViT) into a 512-dimensional feature vector. Optionally, gland embeddings may be refined via triplet contrastive loss:

$L_{\mathrm{triplet}} = \max\bigl(0,\, d(f(x_a),f(x_p)) - d(f(x_a),f(x_n)) + \alpha\bigr)$

with margin $\alpha=0.75$ and Euclidean metric $d(\cdot,\cdot)$.

Unsupervised Clustering (Stage 3): Hierarchical agglomerative clustering (HAC) groups gland embeddings; cluster cut-level choice maximizes the between/within-cluster variance ratio.
Multiple-Instance Learning (Stage 4): For each WSI, up to 1 000 gland embeddings are aggregated via attention-based MIL (CLAM or PAW-MIL), predicting clinical targets (e.g., cancer relapse, copy number variation (CNV)).
Interpretability & Phenotyping (Stage 5): Gland-level contributions and attention weights are visualized on WSIs, supporting cluster-based phenotyping and alignment with pathologist annotations.
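The patch-extraction step above (connected components on the stitched gland mask) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `gland_bounding_boxes`, the use of 4-connectivity, and the bounding-box output format are all assumptions made for the example.

```python
from collections import deque

def gland_bounding_boxes(mask):
    """Label 4-connected components of a binary mask and return one
    bounding box per component.

    mask: list of rows of 0/1 values (1 = gland pixel, i.e. epithelium or lumen).
    Returns a list of (row_min, col_min, row_max, col_max), all inclusive,
    in raster-scan order of each component's first pixel.
    """
    n_rows = len(mask)
    n_cols = len(mask[0]) if n_rows else 0
    seen = [[False] * n_cols for _ in range(n_rows)]
    boxes = []
    for r in range(n_rows):
        for c in range(n_cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited gland pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            r0, c0, r1, c1 = r, c, r, c
            while queue:
                y, x = queue.popleft()
                r0, c0 = min(r0, y), min(c0, x)
                r1, c1 = max(r1, y), max(c1, x)
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    inside = 0 <= ny < n_rows and 0 <= nx < n_cols
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((r0, c0, r1, c1))
    return boxes

# Toy mask with two separate "glands".
toy_mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0],
]
boxes = gland_bounding_boxes(toy_mask)
print(boxes)
```

Each box would then be cropped from the WSI to form one gland-centric tile. A production pipeline would more likely rely on an image-processing library for labeling; the pure-Python flood fill is used here only to keep the sketch self-contained.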

2. Semantic Segmentation Model and Performance

The segmentation network consists of a U-Net encoder-decoder with a Transformer bottleneck. The model inputs 1 024×1 024×3 RGB patches and produces softmax-normalized pixel-level outputs for stroma, epithelium, and lumen. Training is supervised with a pixel-wise cross-entropy loss, with heavy data augmentation (rotations, flips, color jitter) to enforce robustness.

Dice scores are highest for stroma/background and whole glands, and lower for the epithelium and lumen sub-classes:

Tissue Component	Dice Score (mean ± sd)
Stroma/background	0.91 ± 0.12
Whole gland	0.83 ± 0.17
Epithelium	0.68 ± 0.14
Lumen	0.71 ± 0.28

Whole-gland Dice of 0.83 ± 0.17 indicates high gland segmentation fidelity. Extracted glands are subsequently used as atomic units for downstream analysis.
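For readers unfamiliar with the metric, the Dice score and the epithelium-plus-lumen merge can be illustrated with a small sketch. The helper names `merge_whole_gland` and `dice_score` and the toy masks are hypothetical and serve only to show the arithmetic.

```python
def merge_whole_gland(epithelium, lumen):
    """Whole-gland mask = pixel-wise OR of the epithelium and lumen masks."""
    return [1 if (e or l) else 0 for e, l in zip(epithelium, lumen)]

def dice_score(pred, truth):
    """Dice = 2 * |A and B| / (|A| + |B|) for two flat binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Convention chosen here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total

# Toy flattened masks (six pixels each).
pred_epithelium = [1, 1, 0, 0, 0, 0]
pred_lumen      = [0, 0, 1, 0, 0, 0]
truth_gland     = [1, 1, 0, 1, 0, 0]

pred_gland = merge_whole_gland(pred_epithelium, pred_lumen)
score = dice_score(pred_gland, truth_gland)
print(round(score, 3))  # 2 * 2 / (3 + 3) = 0.667
```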

3. Gland Embedding, Contrastive Learning, and Clustering

Gland tiles are embedded with ResNet-18, ResNet-50, or a DINO-pretrained ViT, yielding 512-dimensional representations. Optional fine-tuning with a triplet contrastive loss increases intra-cluster similarity and inter-cluster separability, improving the intra/inter variance ratio by approximately 30%.
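The triplet objective with margin 0.75 and Euclidean distance can be written out directly. The sketch below evaluates the loss on hand-picked 2-D vectors; how anchors, positives, and negatives are actually sampled is not described here, so the example triplets are purely illustrative.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=0.75):
    """max(0, d(anchor, positive) - d(anchor, negative) + margin)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

# Well-separated triplet: positive is close (d = 1), negative is far (d = 5).
easy = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])

# Violating triplet: positive is far (d = 5), negative is close (d = 1).
hard = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0])

print(easy, hard)  # 0.0 and 4.75
```

The loss is zero once the negative is farther from the anchor than the positive by at least the margin, so only violating triplets contribute gradient.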

Hierarchical agglomerative clustering (HAC) is applied with Euclidean distance and Ward linkage. The optimal cluster number $K=15$ is selected by maximizing the ratio of between- to within-cluster variance:

$R(K) = \frac{\mathrm{trace}(S_B)}{\mathrm{trace}(S_W)}$

Clusters correspond to distinct gland morphologies, ranging from background/low-grade glands to high-grade patterns. These clusters show robust correspondence to pathologist-annotated Gleason grades. For example, clusters 12 and 13 overlap with Gleason pattern ≥4+4 regions, while clusters 10, 14, and 15 map to 4+3 regions.
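The selection criterion $R(K)$ can be computed from any cluster assignment. The sketch below implements only the ratio $\mathrm{trace}(S_B)/\mathrm{trace}(S_W)$, not the Ward-linkage clustering itself; the labels are assumed to come from an existing HAC run, and the function name `variance_ratio` is hypothetical.

```python
def variance_ratio(points, labels):
    """Return trace(S_B) / trace(S_W) for a labeled set of points.

    trace(S_B) = sum over clusters of n_k * ||centroid_k - grand_mean||^2
    trace(S_W) = sum over points of ||x_i - centroid_of_its_cluster||^2
    """
    dim = len(points[0])
    n = len(points)
    grand = [sum(p[d] for p in points) / n for d in range(dim)]

    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    between = 0.0
    within = 0.0
    for members in clusters.values():
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        between += len(members) * sum((centroid[d] - grand[d]) ** 2 for d in range(dim))
        within += sum((p[d] - centroid[d]) ** 2 for p in members for d in range(dim))

    return float("inf") if within == 0.0 else between / within

# Four toy embeddings forming two obvious groups along the first axis.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good = variance_ratio(points, [0, 0, 1, 1])  # split along the wide axis
poor = variance_ratio(points, [0, 1, 0, 1])  # split along the narrow axis
print(good, poor)
```

In a full pipeline the ratio would be evaluated at each candidate cut level of the dendrogram and the level with the best score retained.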

4. Multiple-Instance Learning and Aggregator Architectures

For MIL, each WSI is treated as a bag of up to 1 000 gland embeddings. Attention-based aggregator architectures used include CLAM and PAW-MIL:

CLAM: Computes attention as

$a_i = \frac{\exp\left(\mathbf w^T\tanh(\mathbf V h_i)\right)}{\sum_j \exp\left(\mathbf w^T\tanh(\mathbf V h_j)\right)}$

Aggregates $z = \sum_i a_i h_i$, with prediction $p = \sigma(\mathbf w_p^T z + b)$.

PAW-MIL: Decomposes each instance's effect into "importance" and "contribution":

$u_i = \mathbf w_a^T\phi(h_i),\quad v_i = \mathbf w_v^T\psi(h_i)$

with attention weights $\alpha_i = \mathrm{softmax}(u_i)$ and final prediction $\hat y = \sigma\left(\sum_i \alpha_i v_i\right)$.
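Both aggregation rules above reduce to a softmax over per-gland scores followed by a weighted sum. The following sketch transcribes the two formulas as written, using tiny hand-set parameters; it is not the CLAM or PAW-MIL reference code, and the maps $\phi$ and $\psi$ are replaced by precomputed scalar scores as a simplifying assumption.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scalars."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attention_pool(bag, V, w):
    """a_i = softmax_i(w^T tanh(V h_i)); z = sum_i a_i h_i."""
    scores = [dot(w, [math.tanh(dot(row, h)) for row in V]) for h in bag]
    weights = softmax(scores)
    dim = len(bag[0])
    pooled = [sum(a * h[d] for a, h in zip(weights, bag)) for d in range(dim)]
    return weights, pooled

def importance_contribution_predict(importance, contribution):
    """alpha = softmax(u); y_hat = sigmoid(sum_i alpha_i * v_i)."""
    alphas = softmax(importance)
    return alphas, sigmoid(sum(a * v for a, v in zip(alphas, contribution)))

# Three toy gland embeddings; V and w are hand-set, not learned.
bag = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 0.0], [0.0, 1.0]]
w = [1.0, -1.0]
weights, pooled = attention_pool(bag, V, w)

# Two glands with equal importance and opposite contributions cancel out.
alphas, y_hat = importance_contribution_predict([0.0, 0.0], [2.0, -2.0])
print(weights, pooled, alphas, y_hat)
```

Separating importance from contribution means a gland can receive high attention while pushing the prediction in either direction, which is what makes the gland-level maps readable.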

Binary cross-entropy is used for MIL training:

$L_{\mathrm{CE}} = -\sum_{c \in \{0,1\}} y_c \log \hat y_c$
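
As a worked check, assume the one-hot convention $y_1 = y$, $y_0 = 1-y$ and $\hat y_1 = \hat y$, $\hat y_0 = 1-\hat y$ (this convention is an assumption; it is not spelled out above). The sum then expands to the familiar binary form:

$L_{\mathrm{CE}} = -\bigl[\,y \log \hat y + (1-y)\log(1-\hat y)\,\bigr]$

For a positive slide ($y=1$) predicted at $\hat y = 0.8$, this gives $L_{\mathrm{CE}} = -\log 0.8 \approx 0.223$ with the natural logarithm. For a negative slide ($y=0$) with the same prediction, it gives $L_{\mathrm{CE}} = -\log 0.2 \approx 1.609$.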

Training uses grouped stratified five-fold cross-validation with three repetitions per fold and five held-out test splits. Optimal settings for the ResNet-18 + CLAM aggregator are batch size 8, learning rate $1\times10^{-3}$, and 16 epochs.
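A grouped stratified split keeps all slides from one group in the same fold while balancing the label across folds. The sketch below is a deliberately simplified round-robin assignment; it assumes the grouping variable is the patient and that each patient has a single label, neither of which is stated above, and it is not the splitting routine used in the study.

```python
from collections import Counter, defaultdict

def grouped_stratified_folds(patient_labels, n_folds=5):
    """Assign each patient to exactly one fold, balancing labels by round-robin.

    patient_labels: dict mapping patient_id -> 0/1 label.
    Returns a dict mapping patient_id -> fold index in [0, n_folds).
    """
    by_label = defaultdict(list)
    for patient, label in sorted(patient_labels.items()):
        by_label[label].append(patient)

    assignment = {}
    offset = 0  # carry the round-robin position across labels to even out fold sizes
    for label in sorted(by_label):
        for i, patient in enumerate(by_label[label]):
            assignment[patient] = (i + offset) % n_folds
        offset += len(by_label[label])
    return assignment

# Ten hypothetical patients, half of them positive.
patient_labels = {f"P{i:02d}": i % 2 for i in range(10)}
fold_of = grouped_stratified_folds(patient_labels, n_folds=5)

# Slides inherit the fold of their patient, so no patient spans two folds.
slides = [("P00", "slide_a"), ("P00", "slide_b"), ("P01", "slide_c")]
slide_folds = {slide: fold_of[patient] for patient, slide in slides}

per_fold = Counter((fold_of[p], patient_labels[p]) for p in patient_labels)
print(sorted(per_fold.items()))
```

Grouping matters because slides from the same patient share morphology; letting them leak across train and test folds would inflate the reported AUC.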

5. Quantitative Performance and Efficiency

HIT demonstrates marked improvements in learning clinically relevant targets and computational efficiency. Across 760 WSIs, HIT produces approximately 380 000 gland instances versus 7.6 million grid tiles, a roughly 20× reduction. Encoding time is likewise reduced from ≈76 000 s to ≈3 800 s, with a net 5× time gain once segmentation overhead is accounted for.
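The efficiency figures can be cross-checked with simple arithmetic. The last two lines derive an implied segmentation cost under the assumption that the net 5× gain compares grid-tile encoding time against HIT segmentation plus encoding time; that reading is an inference, not a reported measurement.

```python
n_slides = 760
grid_tiles = 7_600_000
hit_glands = 380_000
grid_encode_s = 76_000
hit_encode_s = 3_800
net_gain = 5

instance_ratio = grid_tiles / hit_glands               # fold reduction in instances
reduction_pct = 100 * (1 - hit_glands / grid_tiles)    # percent fewer instances
glands_per_slide = hit_glands / n_slides
tiles_per_slide = grid_tiles / n_slides
encode_ratio = grid_encode_s / hit_encode_s            # fold reduction in encoding time

# Assumption: net gain = grid encoding time / (segmentation + HIT encoding time).
implied_total_s = grid_encode_s / net_gain
implied_segmentation_s = implied_total_s - hit_encode_s

print(instance_ratio, glands_per_slide, tiles_per_slide, encode_ratio)
print(round(reduction_pct, 1), implied_total_s, implied_segmentation_s)
```

On these numbers the instance count and the encoding time both shrink by a factor of about 20, and the average bag holds about 500 glands per slide, below the cap of 1 000 used for MIL.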

Performance metrics (mean AUC ± sd across 5 folds × 3 runs):

Task	Grid-tile (GT) + ResNet-18 (AUC)	HIT + ResNet-18 (AUC)	HIT advantage (absolute AUC)
BCR (ICGC-C)	0.70 ± 0.05	0.72 ± 0.04	+0.02 (with ViT, both ≈0.75 ± 0.03)
EMT-CNV	0.65 ± 0.06	0.75 ± 0.05	+0.10 (p<0.01)
MYC-CNV	0.68 ± 0.05	0.78 ± 0.04	+0.10 (p<0.01)

All improvements in CNV detection are statistically significant at $\alpha=0.05$.

6. Phenotyping, Interpretability, and Clinical Alignment

Downstream analyses reveal that HIT's biologically meaningful clustering aligns with genotype, phenotype, and pathologist annotation:

Gland Clusters and Genetic Alterations: Glands most associated with EMT-gain (clusters 2, 9, 11, 13) and BCR-positivity (clusters 6, 15) are phenotypically distinct (e.g., cluster 13 nuclei density 0.55 vs. overall 0.32).
Gleason Pattern Concordance: High-grade clusters (12, 13) correspond to regions annotated as Gleason ≥4+4. Attention-weighted heatmaps highlight the top 10% most predictive glands, mirroring areas of highest clinical concern as verified by pathologists.
Interpretability: The PAW-MIL aggregator's dual decomposition enables gland-level importance visualization and facilitates human-machine concordance in histopathological review.
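Selecting the top 10% of glands for a heatmap amounts to ranking glands by attention weight. The helper below is a hypothetical sketch of that ranking step only; rendering the overlay on the WSI is out of scope, and the rounding rule for the cutoff is an assumption.

```python
def top_fraction_indices(attention, fraction=0.10):
    """Return the indices of the highest-attention glands, in ascending index order.

    At least one gland is always returned; the count is fraction * n,
    rounded to the nearest integer (an assumed convention).
    """
    n = len(attention)
    if n == 0:
        return []
    k = max(1, int(round(fraction * n)))
    ranked = sorted(range(n), key=lambda i: attention[i], reverse=True)
    return sorted(ranked[:k])

# Twenty glands with near-uniform attention and two clear outliers.
attention = [0.01] * 20
attention[3] = 0.30
attention[17] = 0.20

highlighted = top_fraction_indices(attention, fraction=0.10)
print(highlighted)  # [3, 17]
```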

7. Contributions and Implications

Focusing on entire glands, HIT:

Respects anatomical boundaries, avoiding truncation of morphologically and clinically important units.
Produces lower instance counts, increasing instance-level signal-to-noise and computational tractability.
Supports granular comparison between algorithm-derived clusters and pathologist-determined Gleason patterns.
Provides gland-resolved attention maps, enhancing transparency and facilitating interpretability in downstream task predictions.

Observed improvements include an absolute AUC gain of roughly 0.10 in CNV prediction tasks and a 95% reduction in the number of input instances, with only marginal change in relapse prediction. These properties establish HIT as a scalable, interpretable alternative to grid-based tiling for computational pathology pipelines seeking to incorporate architectural tissue priors (Bonnaffé et al., 13 Nov 2025).


References (1)

Histology-informed tiling of whole tissue sections improves the interpretability and predictability of cancer relapse and genetic alterations (2025)
