
EpiSeg: Epithelium Segmentation in IBD Pathology

Updated 17 December 2025
  • EpiSeg is a patch-level segmentation module designed for digital pathology in inflammatory bowel disease that uses logistic regression on H0-mini features to delineate epithelial compartments.
  • It integrates with the IMILIA framework to compute localized cellular density markers by mapping HistoPLUS cell centroids to segmented epithelium.
  • EpiSeg achieves near-perfect discrimination with an average precision of 0.98 and strong Pearson correlations with epithelial cell counts across multiple IBD datasets.

EpiSeg is a patch-level epithelium segmentation module designed for large-scale, automated analysis of hematoxylin and eosin (H&E) whole-slide images in the context of inflammatory bowel disease (IBD) digital pathology. Developed as a component of the IMILIA (Interpretable Multiple Instance Learning for Inflammation Analysis) framework, EpiSeg enables efficient, coarse-grained delineation of epithelial compartments, facilitating downstream computation of biologically relevant cell density markers for histopathological activity assessment (Baiocco-Rodrigues et al., 15 Dec 2025).

1. Functional Role within IMILIA

EpiSeg operates as a core element of the interpretability module in IMILIA, which is structured around two main stages: (a) a Multiple-Instance Learning (MIL) model for slide-level inflammation prediction (Chowder), and (b) interpretability sub-modules aimed at identifying morphological and cellular correlates of the predictions. EpiSeg segments epithelial regions at the patch level and its output is combined with cell centroids derived from HistoPLUS, a companion instance segmentation algorithm, to compute compartment-localized cellular densities. This integration is pivotal for describing tissue microdomains—such as immune cell infiltration within epithelium—that are indicative of disease activity in IBD. Operationally, Chowder first identifies tiles driving high or low inflammation predictions ("max" and "min" tiles), EpiSeg then segments epithelium within these tiles, and finally HistoPLUS cell predictions are mapped to epithelial compartments for localization-aware quantification.

2. Network Architecture

EpiSeg eschews conventional U-Net or encoder–decoder segmentation architectures in favor of a logistic regression model acting at the patch level atop a fixed, pre-trained H0-mini feature extractor. The workflow is as follows:

  • Each input tile (1022 × 1022 pixels, 0.5 µm/pixel) is partitioned into a grid of 14 × 14 pixel patches, yielding exactly 73 × 73 patches per tile (1022 / 14 = 73).
  • Each patch is embedded into a 768-dimensional vector $x_p \in \mathbb{R}^{768}$ via H0-mini.
  • For each patch, a linear logit is computed:

$$z_p = w^\mathrm{T} x_p + b,$$

where $w \in \mathbb{R}^{768}$ and $b \in \mathbb{R}$ are the only trainable parameters.

  • The probability of epithelial presence per patch is given by the sigmoidal transformation:

$$p_p = \sigma(z_p) = \frac{1}{1 + e^{-z_p}}.$$

  • After training, a coarse segmentation mask (73 × 73) is produced by thresholding $p_p$ at 0.5 or using the raw probabilities for further scoring.

No nonlinearity, skip connections, or decoder is used beyond this logistic regression, resulting in a highly streamlined architecture.
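
The forward pass described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names (`grid_size`, `patch_probability`, `coarse_mask`) are hypothetical, random vectors stand in for H0-mini embeddings, and a 4-dimensional embedding replaces the 768-dimensional one for brevity.

```python
import math
import random

TILE_SIZE = 1022   # pixels per tile side (from the text)
PATCH_SIZE = 14    # pixels per patch side (from the text)


def grid_size(tile_size: int = TILE_SIZE, patch_size: int = PATCH_SIZE) -> int:
    """Number of patches along one side of a tile."""
    return tile_size // patch_size


def patch_probability(x, w, b):
    """Logistic regression on one patch embedding: sigma(w^T x + b)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def coarse_mask(embeddings, w, b, threshold=0.5):
    """Threshold per-patch probabilities into a binary grid mask."""
    return [[int(patch_probability(x, w, b) >= threshold) for x in row]
            for row in embeddings]


# Toy usage: random vectors stand in for H0-mini embeddings (assumption).
random.seed(0)
dim = 4  # illustrative only; the real embedding dimension is 768
n = grid_size()
w = [random.gauss(0, 1) for _ in range(dim)]
b = 0.0
embeddings = [[[random.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
              for _ in range(n)]
mask = coarse_mask(embeddings, w, b)
print(n, len(mask), len(mask[0]))
```

Because the only learned quantities are `w` and `b`, inference reduces to one dot product per patch once the embeddings have been extracted.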

3. Loss Function and Training Objective

EpiSeg adopts a standard binary cross-entropy loss, regularized with an L2 penalty on the weights:

  • Each patch is labeled by the fraction of its area annotated as epithelium, $y_p \in [0,1]$.
  • The loss for $N$ patches:

$$L_\text{BCE} = -\frac{1}{N} \sum_{p=1}^N \left[y_p \log p_p + (1 - y_p) \log (1 - p_p)\right].$$

  • Overall objective:

$$L_\text{total}(w, b) = L_\text{BCE} + \lambda \|w\|_2^2,$$

with $\lambda$ as the ridge regularization strength.

  • The inverse regularization strength $C = 1/\lambda$ is tuned by cross-validation.
  • No alternative losses (e.g., Dice, focal loss) or additional regularizers are used.

This minimalist approach leverages the representational power of the H0-mini embeddings, with all learning confined to the linear classifier.
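
For concreteness, the objective can be written out directly. The sketch below is illustrative only: `soft_label`, `bce_loss`, and `total_loss` are hypothetical names, and the clipping constant `eps` is an assumption added for numerical safety, not something the source specifies.

```python
import math


def soft_label(mask_patch):
    """Fraction of a patch's pixels annotated as epithelium (y_p in [0, 1])."""
    pixels = [v for row in mask_patch for v in row]
    return sum(pixels) / len(pixels)


def bce_loss(y, p, eps=1e-12):
    """Mean binary cross-entropy with soft targets y and predictions p."""
    total = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1.0 - eps)  # clip to avoid log(0) (assumption)
        total += yi * math.log(pi) + (1.0 - yi) * math.log(1.0 - pi)
    return -total / len(y)


def total_loss(y, p, w, lam):
    """L_BCE + lambda * ||w||_2^2; the bias b is not penalized."""
    return bce_loss(y, p) + lam * sum(wi * wi for wi in w)


# Toy usage: a 2 x 2 patch with one epithelial pixel has y_p = 0.25.
y = [soft_label([[1, 0], [0, 0]]), 1.0, 0.0]
p = [0.25, 0.9, 0.1]
print(total_loss(y, p, w=[0.5, -0.5], lam=0.01))
```

With fractional targets the cross-entropy does not reach zero even for a perfect prediction; its minimum is the entropy of the labels themselves.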

4. Training Protocol

Training is conducted exclusively on the IBDColEpi dataset, which comprises 140 H&E-stained whole-slide images (WSI) with pixel-level epithelium annotations. After removing 8 small-tissue slides, 132 remain, with splits provided by the dataset authors. Preprocessing consists of:

  • Rescaling WSIs to 0.5 µm/pixel.
  • Extracting 1022 × 1022 pixel tiles centered on regions of interest.
  • Dividing each tile into 14 × 14 patches for H0-mini embedding extraction.

The feature extractor is kept frozen, and the single-layer logistic regression model is fit using a standard solver (e.g., liblinear or lbfgs) over all patch–label pairs with no mini-batching, data augmentation, dropout, or learning-rate schedule. The regularization parameter $C$ is tuned via 3-fold cross-validation. Convergence is typically achieved in fewer than 10 iterations, reflecting the tractability of the convex optimization for logistic regression.
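
A minimal sketch of this fitting procedure follows, under several stated assumptions: plain full-batch gradient descent replaces the liblinear/lbfgs solvers named above, the data are synthetic, the folds are interleaved by index, and the helper names (`fit`, `cv_select_C`), the learning rate, the step count, and the candidate grid for C are all hypothetical. It only illustrates fitting a linear classifier on frozen features and selecting C = 1/λ by 3-fold cross-validation.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(X, w, b):
    return [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) for x in X]


def bce(y, p, eps=1e-12):
    return -sum(yi * math.log(max(pi, eps)) + (1 - yi) * math.log(max(1 - pi, eps))
                for yi, pi in zip(y, p)) / len(y)


def fit(X, y, C, lr=0.2, steps=300):
    """Full-batch gradient descent on BCE + (1/C) * ||w||^2 (solver is an assumption)."""
    lam, d, n = 1.0 / C, len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        p = predict(X, w, b)
        err = [pi - yi for pi, yi in zip(p, y)]  # dL_BCE/dz per sample
        grad_w = [sum(e * x[j] for e, x in zip(err, X)) / n + 2 * lam * w[j]
                  for j in range(d)]
        grad_b = sum(err) / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def cv_select_C(X, y, candidates, k=3):
    """Return the C with the lowest mean held-out BCE over k interleaved folds."""
    scores = {}
    for C in candidates:
        fold_losses = []
        for f in range(k):
            train = [i for i in range(len(X)) if i % k != f]
            test = [i for i in range(len(X)) if i % k == f]
            w, b = fit([X[i] for i in train], [y[i] for i in train], C)
            fold_losses.append(bce([y[i] for i in test],
                                   predict([X[i] for i in test], w, b)))
        scores[C] = sum(fold_losses) / k
    return min(scores, key=scores.get), scores


# Synthetic stand-in for (embedding, epithelium-fraction) pairs.
random.seed(0)
X = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(90)]
y = [sigmoid(3 * x[0] - 2 * x[1]) for x in X]  # soft labels in [0, 1]
best_C, scores = cv_select_C(X, y, candidates=[1.0, 10.0, 100.0])
print(best_C)
```

Since the objective is convex, any reasonable solver should reach the same optimum; a second-order solver such as lbfgs would typically need far fewer iterations than the gradient descent used here.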

5. Quantitative Evaluation

Performance on the held-out IBDColEpi test set is characterized by:

  • Average precision (area under the patch-level precision–recall curve): 0.98.
  • The full precision–recall curve is reported in the appendix of the source paper.
  • Orthogonal validation via Pearson correlation between epithelium area estimated by EpiSeg and epithelial cell counts from HistoPLUS:
    • SPARC IBD: $r = 0.85$ ($p < 10^{-8}$)
    • FINBB: $r = 0.74$ ($p < 10^{-8}$)
    • IBDColEpi: $r = 0.83$ ($p < 10^{-8}$)

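No Dice or intersection-over-union (IoU) metrics are reported, as evaluation is conducted at patch rather than pixel resolution. The average precision of 0.98 indicates near-perfect patch-level discrimination on the test set, and the correlations with epithelial cell counts ($r$ between 0.74 and 0.85 across the three datasets) provide independent support for the estimated epithelium area.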
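
Both metrics are straightforward to compute. The sketch below is a plain-Python illustration, not the evaluation code used in the paper: the function names are hypothetical, the numbers are made up, and binary patch labels are assumed for the average-precision computation (the source does not state how fractional labels are binarized for evaluation).

```python
import math


def average_precision(y_true, scores):
    """Non-interpolated AP: mean of the precision at the rank of each positive."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_pos = sum(y_true)
    hits, ap = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i]:
            hits += 1
            ap += hits / rank
    return ap / n_pos


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


# Toy data (made up for illustration; not from the paper).
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.95, 0.80, 0.70, 0.60, 0.30, 0.10]
print(average_precision(y_true, scores))

area_um2 = [1000.0, 2500.0, 4000.0, 5500.0]   # epithelium area per tile
cell_counts = [12, 30, 41, 60]                # epithelial cells per tile
print(pearson_r(area_um2, cell_counts))
```

In the toy ranking the positives sit at ranks 1, 2, and 4, so the average precision is the mean of 1, 1, and 3/4.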

6. Qualitative Characteristics and Error Modes

Analysis of representative tiles (the ground-truth versus prediction figure in the source paper) indicates that EpiSeg reliably identifies epithelial regions, including bands, crypt edges, and glandular structures. Notable failure modes:

  • Boundary artefacts: Errors occur if epithelium traverses patch corners and the dominant patch embedding derives from stroma, resulting in underprediction.
  • Small epithelial islands: Isolated glands occupying only a partial patch area may be missed, as patch-level labeling is based on proportional overlap.

These misclassifications have negligible impact on downstream cellular density calculations because most epithelial compartments extend across many patches, so a few missed boundary or island patches change the total epithelial area only slightly.

7. Downstream Integration and Interpretability in IMILIA

EpiSeg's 73 × 73 patch-level segmentation mask is upsampled and cropped to align with the original 224 × 224 Chowder tile region, producing a binary epithelial mask per tile. Cellular centroids predicted by HistoPLUS are spatially intersected with the epithelium mask to compute compartmentalized cell densities:

$$\rho^c = \frac{\sum_{k=1}^{N^c} \mathbf{1}(c_k \in \text{epithelium})}{\sum_{(x,y)} E(x, y) \times \text{mpp}_x \times \text{mpp}_y}$$

where $\rho^c$ is the density of cell type $c$, $N^c$ is the number of detected cells of type $c$, $c_k$ is the centroid of the $k$-th such cell, $E(x, y)$ is the binarized epithelium mask, and $\text{mpp}_x$, $\text{mpp}_y$ are the microns-per-pixel scaling factors. This enables computation of biologically grounded tissue markers, such as "neutrophils per µm² of epithelium," instrumental for linking machine-learning predictions to pathophysiological processes in IBD. For example, neutrophil density is markedly higher in regions of high predicted inflammation ("max" tiles), consistent with established histopathological criteria. The integration of EpiSeg within IMILIA enables the identification of interpretable, localized cellular phenomena that drive the MIL model's predictions (Baiocco-Rodrigues et al., 15 Dec 2025).
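
The density formula can be illustrated with a short sketch. It assumes nearest-neighbour upsampling of the patch mask and centroids given as (x, y) pixel coordinates; the function names are hypothetical, the toy values are invented, and the cropping step to the Chowder tile region is omitted because the source does not describe it in detail.

```python
import math


def upsample_nearest(mask, factor):
    """Nearest-neighbour upsampling of a 2-D binary mask by an integer factor."""
    return [[v for v in row for _ in range(factor)]
            for row in mask for _ in range(factor)]


def epithelial_density(centroids, mask, mpp_x=0.5, mpp_y=0.5):
    """Cells of one type inside the mask, divided by the mask area in um^2."""
    h, w = len(mask), len(mask[0])
    inside = 0
    for x, y in centroids:
        col, row = math.floor(x), math.floor(y)
        if 0 <= row < h and 0 <= col < w and mask[row][col]:
            inside += 1
    area_um2 = sum(sum(r) for r in mask) * mpp_x * mpp_y
    return inside / area_um2 if area_um2 > 0 else float("nan")


# Toy usage: a 2 x 2 patch mask with one epithelial patch, 14 pixels per patch.
patch_mask = [[1, 0],
              [0, 0]]
pixel_mask = upsample_nearest(patch_mask, 14)      # 28 x 28 pixels
neutrophils = [(3.0, 4.0), (10.5, 12.2), (20.0, 5.0)]  # (x, y) pixel centroids
density = epithelial_density(neutrophils, pixel_mask)
print(density)  # cells per um^2 of epithelium
```

Here two of the three centroids fall inside the 14 × 14 pixel epithelial region, whose area at 0.5 µm/pixel is 196 × 0.25 = 49 µm², giving a density of 2/49 cells per µm².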
