
Pathology Foundation Models

Updated 22 November 2025
  • Pathology foundation models are large neural networks, often based on Vision Transformers, trained on diverse histopathology images using self-supervised or weakly supervised paradigms.
  • They generate robust, stain-agnostic, and organ-agnostic embeddings that efficiently adapt to tasks such as cancer classification, tissue segmentation, and biomarker prediction.
  • These models utilize parameter-efficient fine-tuning and modular computational pipelines to address domain shifts and improve performance on various digital pathology benchmarks.

A pathology foundation model is a large-scale neural network—most commonly Vision Transformer (ViT)-based—trained on massive and heterogeneous corpora of histopathology images using predominantly self-supervised or weakly supervised paradigms. These models are explicitly designed to produce transferable feature representations that can be efficiently adapted, without significant retraining, to a wide range of downstream digital pathology tasks such as disease and cancer classification, tissue segmentation, biomarker prediction, immunohistochemical scoring, prognosis, and report generation (Ochi et al., 31 Jul 2024, Xiong et al., 5 Apr 2025, Campanella et al., 9 Jul 2024). The distinguishing features of pathology foundation models versus traditional task-specific approaches are their parameter scale (hundreds of millions to over a billion), breadth and heterogeneity of pretraining data, and their use of universal, annotation-efficient learning objectives that yield robust, stain-agnostic, and organ-agnostic embeddings.

1. Architectural Principles and Pretraining Protocols

Modern pathology foundation models ("PFMs"—Editor's term) are dominated by ViT-style architectures, with variants from ViT-Base (~86M parameters) up to ViT-Gigantic (~1.1B parameters). Architectures include pure ViTs (UNI, RudolfV, GigaPath, Atlas, PLUTO-4G), hybrid models combining CNNs and Transformers (CTransPath, PathOrchestra), and, in some instances, multimodal extensions that integrate text or molecular data (CONCH, THREADS, GPFM) (Dippel et al., 8 Jan 2024, Alber et al., 9 Jan 2025, Vaidya et al., 28 Jan 2025, Padigela et al., 4 Nov 2025).
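
As a rough sanity check on these parameter scales, the sketch below estimates parameter counts for conventional ViT configurations from width and depth alone. The configurations and the closed-form per-block formula are standard assumptions (ignoring SwiGLU feed-forwards and other variants used by some PFMs), not taken from any specific model release.

```python
def vit_param_estimate(dim: int, depth: int, patch: int = 16, img: int = 224) -> int:
    """Approximate parameter count of a plain ViT encoder (classification head excluded)."""
    block = 12 * dim**2 + 13 * dim            # QKV/proj (4d^2) + 4x-MLP (8d^2) + biases/norms
    tokens = (img // patch) ** 2 + 1          # image patches + [CLS] token
    patch_embed = 3 * patch**2 * dim + dim    # linear patch projection
    pos_cls = tokens * dim + dim              # positional + class embeddings
    return depth * block + patch_embed + pos_cls + 2 * dim  # + final LayerNorm

# Conventional configurations (assumed here for illustration only)
for name, dim, depth in [("ViT-Base", 768, 12),
                         ("ViT-Large", 1024, 24),
                         ("ViT-gigantic (DINOv2-scale)", 1536, 40)]:
    print(f"{name}: ~{vit_param_estimate(dim, depth) / 1e6:.0f}M parameters")
# Prints roughly 86M, 303M, and ~1135M (about 1.1B), matching the scales cited above.
```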

Pretraining strategies are almost exclusively self-supervised, most commonly DINOv2-style self-distillation, masked image modeling (MIM/iBOT), and contrastive objectives (e.g., MoCoV3), with some models adding multi-expert knowledge distillation (GPFM) or paired image–text/molecular objectives (CONCH, THREADS).

Typical pretraining datasets encompass hundreds of thousands to several million whole-slide images (WSIs), covering tens to hundreds of tissue types, multiple institutions, diverse staining protocols (H&E, IHC, special stains), and spanning a wide array of scanners and magnifications (Alber et al., 9 Jan 2025, Padigela et al., 4 Nov 2025).
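
To make the dominant self-distillation objective concrete, here is a minimal sketch of a DINO-style loss between a student network and an exponential-moving-average (EMA) teacher operating on two augmented views of the same patch. Temperatures, centering, and the EMA momentum follow the general DINO recipe rather than any specific PFM's released training code.

```python
import torch
import torch.nn.functional as F

def dino_loss(student_logits, teacher_logits, center, t_s=0.1, t_t=0.04):
    """Cross-entropy between teacher and student distributions over prototypes.

    Teacher outputs are centered and sharpened (low temperature) and used as
    soft targets; gradients flow only through the student.
    """
    teacher_probs = F.softmax((teacher_logits - center) / t_t, dim=-1).detach()
    student_logp = F.log_softmax(student_logits / t_s, dim=-1)
    return -(teacher_probs * student_logp).sum(dim=-1).mean()

@torch.no_grad()
def ema_update(teacher, student, momentum=0.996):
    """Teacher weights track an exponential moving average of the student."""
    for p_t, p_s in zip(teacher.parameters(), student.parameters()):
        p_t.mul_(momentum).add_(p_s, alpha=1.0 - momentum)

# Toy usage with random "prototype logits" for two augmented views of a patch
B, K = 8, 4096                                   # batch size, number of prototypes
center = torch.zeros(K)                          # running center of teacher outputs
s1, s2 = torch.randn(B, K), torch.randn(B, K)    # student logits for views 1 and 2
t1, t2 = torch.randn(B, K), torch.randn(B, K)    # teacher logits for views 1 and 2
loss = 0.5 * (dino_loss(s1, t2, center) + dino_loss(s2, t1, center))
center = 0.9 * center + 0.1 * torch.cat([t1, t2]).mean(dim=0)  # center update
```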

Table: Prominent Foundation Models

| Model | Param Count | Pretraining Data | SSL/Loss | Distinguishing Features |
|---|---|---|---|---|
| UNI | 303M–1.5B | 100K–200M patches | DINOv2 + MIM | ViT-L, large scale, pure vision |
| Virchow2 | 632M | 1.5M WSIs | DINOv2, contrastive | Multicenter, mixed magnification |
| Atlas | 632M | 1.2M WSIs | RudolfV/DINOv2 | Multi-stain/magnification |
| PLUTO-4G/S | 1.1B/22M | 551K WSIs | DINOv2, FlexiViT | Multi-scale, 2D-RoPE |
| GPFM | 307M | 190M tiles | Multi-expert distillation | Knowledge distillation, 34 sites |
| PathOrchestra | 350M | 300K WSIs | DINOv2 + iBOT | 112 tasks, report generation |
| ELF | -- | 53K WSIs | Ensemble, MoCoV3 | Fusion of 5 encoders (GigaPath, CONCH, UNI, Virchow2, H-Optimus0) |

2. Computational Pipelines, Feature Extraction, and Adaptation

PFMs universally process gigapixel WSIs via high-magnification tiling (typically 224–512 px patches) and a multi-stage pipeline: tissue detection and tiling, patch-level embedding extraction with the frozen encoder, and aggregation of patch embeddings into slide-level representations (e.g., pooling or learned slide-level encoders) that feed downstream task heads.
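
A minimal sketch of that pipeline is shown below, assuming an OpenSlide-readable WSI, a hypothetical load_pretrained_encoder() that returns a frozen patch encoder mapping a 224×224 tile to an embedding, and simple mean pooling as the slide-level aggregator. Production pipelines add proper tissue detection, batching, and learned MIL aggregators.

```python
import numpy as np
import openslide                      # assumes the OpenSlide bindings are installed
import torch
from torchvision import transforms

def extract_slide_embedding(wsi_path, encoder, tile=224, stride=224, level=0):
    """Tile a WSI, embed each tissue tile with a frozen encoder, then mean-pool."""
    slide = openslide.OpenSlide(wsi_path)
    width, height = slide.level_dimensions[level]   # coordinates below assume level 0
    to_tensor = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],   # ImageNet stats (assumed)
                             std=[0.229, 0.224, 0.225]),
    ])
    feats = []
    encoder.eval()
    with torch.no_grad():
        for y in range(0, height - tile + 1, stride):
            for x in range(0, width - tile + 1, stride):
                patch = slide.read_region((x, y), level, (tile, tile)).convert("RGB")
                if np.asarray(patch).mean() > 230:   # crude background filter (assumption)
                    continue
                emb = encoder(to_tensor(patch).unsqueeze(0))   # [1, D] patch embedding
                feats.append(emb.squeeze(0))
    return torch.stack(feats).mean(dim=0)            # [D] slide-level embedding

# encoder = load_pretrained_encoder()                # hypothetical frozen PFM backbone
# slide_vec = extract_slide_embedding("case_001.svs", encoder)
```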

For unsupervised segmentation, factorization approaches such as F-SEG perform non-negative matrix factorization (NMF) or clustering (e.g., k-means, fixed NMF using cluster centers from global pooled features) on the patch feature maps, yielding semantic segmentation masks with no retraining (Gildenblat et al., 9 Sep 2024).
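
The sketch below illustrates this idea with k-means on a grid of patch embeddings; the grid shape, feature dimensionality, and number of tissue classes are illustrative, and F-SEG's fixed-NMF variant would replace the clustering step.

```python
import numpy as np
from sklearn.cluster import KMeans

def unsupervised_tissue_map(patch_features, n_classes=4, seed=0):
    """Cluster an [H, W, D] grid of patch embeddings into a coarse tissue mask.

    Each spatial cell holds the foundation-model embedding of one patch;
    clustering assigns every patch a pseudo-label, yielding a segmentation
    mask at patch resolution without any retraining.
    """
    H, W, D = patch_features.shape
    flat = patch_features.reshape(-1, D)
    labels = KMeans(n_clusters=n_classes, n_init=10, random_state=seed).fit_predict(flat)
    return labels.reshape(H, W)

# Toy example: a 32x32 grid of 768-dim patch embeddings (random values here)
features = np.random.rand(32, 32, 768).astype(np.float32)
mask = unsupervised_tissue_map(features, n_classes=4)
print(mask.shape)   # (32, 32) patch-level segmentation mask
```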

Adaptation and Probing Strategies

Empirically, PEFT achieves the highest accuracy for moderate (~100+) data regimes, while linear probing or KNN is optimal for few-shot tasks (<5 labels/class) (Lee et al., 21 Oct 2024).
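
A hedged sketch of the two lightweight adaptation routes on frozen embeddings is shown below, using scikit-learn stand-ins for the probes; the embedding dimensionality and label counts are illustrative, not drawn from the cited benchmark.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
D = 1024                                   # embedding dimension (illustrative)
X_train = rng.normal(size=(40, D))         # frozen PFM embeddings, few labels
y_train = rng.integers(0, 2, size=40)
X_test = rng.normal(size=(100, D))

# Linear probing: a single linear classifier on top of frozen features
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
probe_pred = probe.predict(X_test)

# KNN probing: no training at all, a nearest-neighbour vote in embedding space
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
knn_pred = knn.predict(X_test)
```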

3. Application Domains, Quantitative Performance, and Benchmarks

PFMs have demonstrated strong performance across an unusually broad spectrum of tasks, including disease and cancer classification, tissue segmentation, biomarker prediction, immunohistochemical scoring, prognosis, and report generation, with recent models evaluated on benchmarks spanning from dozens to over one hundred distinct tasks.

4. Robustness, Security, and Limitations

Despite their success, PFMs face critical challenges:

  • Domain shift/generalization: Color, scanner, and protocol variability across institutions can degrade accuracy. Color normalization and multi-site pretraining only partially mitigate this (Ochi et al., 31 Jul 2024, Xiong et al., 5 Apr 2025).
  • Adversarial vulnerability: Even imperceptible perturbations to 0.1% of a WSI’s patches (ε=4/255, FGSM) can cause accuracy drops of 20–50% (“local perturbation, global impact”), and universal, transferable attacks (UTAP) compound the risk (Liu et al., 30 May 2025, Wang et al., 18 Oct 2025); a minimal FGSM sketch follows this list.
  • Compute and sustainability: PFMs are up to 35x more energy-intensive than parameter-matched task-specific networks in clinical deployment (6.74–22.09 Wh/biopsy for FMs vs. 0.63 Wh/biopsy for a task-specific model) (Mulliqi et al., 28 Feb 2025, Tizhoosh, 27 Oct 2025).
  • Interpretability: Most PFMs remain black boxes; explainability advances lag adoption. Errors, including hallucinations in generative tasks, remain a risk (Ochi et al., 31 Jul 2024, Tizhoosh, 27 Oct 2025).
  • Patch-size sensitivity and biological context: Naïve patching (e.g., 224×224 px) poorly encodes meso- and macro-architectural cues, and transformer spatial encoding limits geometric robustness (rotation, scale, magnification) (Tizhoosh, 27 Oct 2025).
  • Continual adaptation: Emerging stains, scanner types, and morphologies require rapid model update/migration strategies; federated learning remains an unsolved problem at scale (Xiong et al., 5 Apr 2025).
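
The “local perturbation, global impact” finding can be illustrated with a single FGSM step applied to a small subset of a slide’s patches. The slide-level classifier, tensor shapes, and the 0.1% patch selection below are illustrative assumptions, not the exact attack setup of the cited papers.

```python
import torch
import torch.nn.functional as F

def fgsm_on_patch_subset(patches, label, slide_model, eps=4/255, frac=0.001):
    """One FGSM step applied to only a random `frac` of a slide's patches.

    `patches` is an [N, 3, H, W] tensor of tiles from one WSI in [0, 1];
    `slide_model` maps the full bag of tiles to slide-level logits [1, C]
    (e.g., via MIL pooling), and `label` is a length-1 class-index tensor.
    """
    n = max(1, int(frac * patches.size(0)))
    idx = torch.randperm(patches.size(0))[:n]        # tiles chosen for perturbation

    bag = patches.clone().requires_grad_(True)
    loss = F.cross_entropy(slide_model(bag), label)
    loss.backward()

    adv = patches.clone()
    # Move only the selected patches in the direction that increases the loss
    adv[idx] = (patches[idx] + eps * bag.grad[idx].sign()).clamp(0, 1)
    return adv

# adv_bag = fgsm_on_patch_subset(bag_of_tiles, torch.tensor([1]), slide_classifier)
```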

5. Model Comparison and Notable Advances

Recent models highlight important trends:

| Model | Unique Innovations | Performance Highlight |
|---|---|---|
| GPFM | Multi-expert knowledge distillation | Top-1 average rank on 39-task benchmark |
| Atlas | ViT-H/14, robust self-distillation | Leading molecular+morphological average |
| PathOrchestra | Validation on 112 tasks, structured reports | 47 tasks with >0.95 acc/AUC |
| ELF | Ensemble of 5 FMs, slide-level encoding | Outperforms all base FMs, largest task range |
| Threads | Paired image–molecular contrastive loss | +6.3% AUC over best baseline, excels at rare-event prediction |
| PLUTO-4G/S | ViT-G/14, FlexiViT, 2D-RoPE | State-of-the-art on segmentation and diagnosis |
| Virchow2 | Mixed-magnification training, 3.1M slides | High slide-level robustness, cross-task performance |

Ensembling (ELF), multi-expert distillation (GPFM), and scale/multimodal pretraining (Threads, CONCH, PLUTO-4G) all improve generalization and data efficiency. Parameter-efficient adaptation (LoRA/PEFT) and modular “slide-level” architectures are key for clinical settings (Luo et al., 22 Aug 2025, Lee et al., 21 Oct 2024).
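
As an illustration of the parameter-efficient route, here is a minimal LoRA-style wrapper around a frozen linear layer; the rank, scaling, and choice of which PFM layers to wrap are assumptions for the sketch rather than settings from any cited model.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: W x + (alpha/r) * B A x."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():          # freeze the pretrained weights
            p.requires_grad = False
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: no change at start
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

# Wrapping, e.g., a projection layer: only A and B (a tiny fraction of the
# backbone's parameters) receive gradients during fine-tuning.
layer = LoRALinear(nn.Linear(1024, 1024), rank=8)
out = layer(torch.randn(2, 1024))
```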

6. Critical Perspectives and Future Directions

Recent critical analyses identify foundational misalignments:

  • Overgeneralization: "Myth of the universal model"—organ- and task-specific fine-tuning regularly outperforms zero-shot generalization; macro-F1 rarely exceeds 0.42 for broad pan-organ classification (Tizhoosh, 27 Oct 2025).
  • Architectural inertia: Blindly transferring non-medical ViT architectures, without pathologist-driven or multi-scale inductive biases, limits clinical translation (Tizhoosh, 27 Oct 2025, Dippel et al., 8 Jan 2024).
  • Fundamental data limitations: Even large pathology data lakes fall short of the scale available in vision/language; this slows scaling-law-driven improvements and hurts rare phenotype coverage (Tizhoosh, 27 Oct 2025, Xiong et al., 5 Apr 2025).
  • Robustness and security: Systematic adversarial evaluations (UTAP, butterfly effect) reveal FMs are not yet robust enough for unsupervised, high-consequence deployments (Liu et al., 30 May 2025, Wang et al., 18 Oct 2025).

Proposed responses include building pathology-specific, multi-scale inductive biases into model architectures, pairing foundation models with organ- and task-specific fine-tuning rather than relying on zero-shot use, systematic adversarial and domain-shift evaluation before deployment, and more parameter- and energy-efficient adaptation strategies.

7. Clinical Impact and Translational Outlook

PFMs have established themselves as state-of-the-art feature extractors for nearly all computational pathology tasks, delivering performance gains on detection, segmentation, biomarker prediction, and clinical endpoint forecasting. Their adoption has enabled unsupervised and weakly supervised workflows, dramatically reducing annotation costs and accelerating research deployment. Nonetheless, for high-stakes clinical scenarios with abundant labeled data, well-optimized task-specific models often outperform FMs, emphasizing an integration strategy: exploit foundation models for rapid prototyping and data-scarce settings, transitioning to task-optimized architectures for mature deployment (Mulliqi et al., 28 Feb 2025).

Translational success demands ongoing validation on multi-institutional data, robust evaluation under adversarial and domain shift conditions, explicit explainability, and efficient mechanisms for adaptation as pathology knowledge and practice evolve (Ochi et al., 31 Jul 2024, Campanella et al., 9 Jul 2024, Xiong et al., 5 Apr 2025).


References:

  • Ochi et al., 31 Jul 2024
  • Gildenblat et al., 9 Sep 2024
  • ai et al., 24 Mar 2024
  • Campanella et al., 9 Jul 2024
  • Alber et al., 9 Jan 2025
  • Vaidya et al., 28 Jan 2025
  • Xiong et al., 5 Apr 2025
  • Mulliqi et al., 28 Feb 2025
  • Lee et al., 21 Oct 2024
  • Padigela et al., 4 Nov 2025
  • Yan et al., 31 Mar 2025
  • Luo et al., 22 Aug 2025
  • Wang et al., 18 Oct 2025
  • Lv et al., 18 Jul 2025
  • Tizhoosh, 27 Oct 2025
  • Liu et al., 30 May 2025
  • Sun et al., 9 Jul 2025
  • Dippel et al., 8 Jan 2024
