Virtual Cell Foundation Models Overview
- Virtual cell foundation models are data-driven AI systems that integrate multimodal biological data to simulate cellular states, functions, and responses.
- They employ advanced deep learning architectures like vision transformers, multi-omics transformers, and agent-based frameworks for robust, generalizable analysis.
- These models enable high-throughput cell annotation, precise cancer prediction, and virtual immunohistochemistry, advancing mechanistic discovery and precision intervention.
A virtual cell foundation model is a data-driven, generalizable AI system designed to represent, simulate, and predict cellular states, functions, and phenotypes across biological modalities, scales, and perturbations. Unifying developments in deep learning, single-cell and spatial omics, and high-throughput experimentation, these models aim to provide executable, robust, and interpretable in silico representations of the cell—enabling downstream tasks from annotation to mechanistic discovery and precision intervention. The field now encompasses vision foundation models for cytopathology and spatial analysis, sequence and multi-omics transformers for single-cell genomics, agent-based and self-constructing architectures, and multimodal frameworks integrating imaging, transcriptional, and chromatin data.
1. Conceptual Overview and Core Paradigm
Virtual cell foundation models are AI systems constructed to learn universal representations for cellular entities and simulate the impact of perturbations (genetic, chemical, or environmental) on molecular, morphological, and phenotypic states. These models are distinguished by their:
- Domain-general transferability: Ability to generalize across tissues, conditions, and experimental modalities.
- Multimodal/multiscale integration: Joint learning from single-cell transcriptomics, chromatin accessibility (scATAC-seq), pathology images, and spatial omics, each mapped into a shared representation space.
- Executable, decision-relevant state-space: Latent encodings serve as substrates for simulating interventions, analyzing causal relationships, and supporting downstream tasks such as annotation, cell typing, and response prediction.
Virtual cell models differ from task-specific or dataset-specific models in architectural scope and pretraining scale. They leverage self-supervised and contrastive objectives, transformer or hybrid backbone architectures, and, increasingly, agentic systems capable of automated model generation and analysis.
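The "executable state-space" idea above can be illustrated with a minimal sketch: encode a cell's molecular profile into a latent state, apply a perturbation as a learned latent shift, and decode a simulated response. The linear encoder/decoder and the shift vector here are toy stand-ins, not any specific model's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: 50 genes, 8-dimensional latent cell state.
n_genes, d_latent = 50, 8

# Hypothetical linear encoder/decoder standing in for a trained foundation model.
W_enc = rng.normal(size=(n_genes, d_latent)) / np.sqrt(n_genes)
W_dec = W_enc.T  # tied weights, purely for the sketch

def encode(x):
    return x @ W_enc

def decode(z):
    return z @ W_dec

# A perturbation is modeled as a learned shift in latent space.
delta_perturb = rng.normal(size=d_latent) * 0.5

x_control = rng.normal(size=(4, n_genes))   # 4 control cell profiles
z = encode(x_control)                       # latent cell states
x_virtual = decode(z + delta_perturb)       # simulated perturbed profiles

print(x_virtual.shape)  # (4, 50)
```

In a real virtual cell model the encoder, decoder, and perturbation operator are learned jointly at scale; the point here is only the interface: perturbations act on latent states, not raw measurements.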
2. Methodological Foundations and Model Architectures
Virtual cell foundation models are instantiated with a variety of deep learning methodologies, each tuned for domain, data type, and scientific objective.
Image-Based Virtual Cell Foundation Models
- Cytology and Histopathology: CytoFM (Ivezić et al., 18 Apr 2025) demonstrates the power of domain-specific foundation modeling in cytology. Using a ViT-Base backbone with iBOT self-supervision (masked image modeling and self-distillation), CytoFM is pretrained on a diverse cytology dataset, then deployed as a frozen feature extractor and evaluated in attention-based multiple instance learning frameworks for cancer prediction and cell type classification. The core pretraining objective combines iBOT's masked-patch self-distillation loss with a [CLS]-token self-distillation loss between student and teacher networks.
- Cell Segmentation: Foundation models such as CellSAM (Israel et al., 2023) extend vision transformer architectures (e.g., SAM) using prompt-based mask generation, integrating object detection (CellFinder, Anchor DETR) with mask decoders. CellVTA (Yang et al., 1 Apr 2025) addresses ViT spatial resolution loss in cell instance segmentation by processing high-resolution features via a CNN-based adapter and cross-attention mechanism, preserving ViT transferability and boosting segmentation accuracy on multi-organ datasets. CellViT++ (Hörst et al., 9 Jan 2025) employs foundation model encoders (e.g., Virchow/SAM/UNI) and token-based classification for energy-efficient, broad-spectrum cell segmentation/classification.
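The attention-based multiple instance learning used to aggregate frozen patch embeddings into a slide-level prediction (as in the CytoFM evaluation above) can be sketched as follows. The attention parameterization follows the common Ilse et al.-style gating; the specific dimensions and weights here are illustrative, not CytoFM's.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Toy bag: 6 cell-patch embeddings of dimension 16 from a frozen encoder.
rng = np.random.default_rng(1)
H = rng.normal(size=(6, 16))

# Hypothetical learned attention parameters.
V = rng.normal(size=(16, 8))
w = rng.normal(size=8)

scores = np.tanh(H @ V) @ w     # one relevance score per instance
alpha = softmax(scores)         # attention weights sum to 1
bag_embedding = alpha @ H       # weighted average -> slide-level feature

print(bag_embedding.shape)      # (16,)
```

The attention weights `alpha` double as a per-cell relevance map, which is one reason MIL pooling is popular for weakly supervised cancer prediction.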
Sequence and Multi-Omics Foundation Models
- Single-Cell Transcriptomics: Models such as TEDDY (Chevalier et al., 5 Mar 2025) and DeepSeq (Dajani et al., 14 Jun 2025) exemplify scaling in both data (over 100M cells) and parameters (up to 400M). TEDDY combines self-supervised objectives over gene sequences with supervision from biological annotations, achieving improved generalization for disease classification on held-out donors and diseases. DeepSeq introduces LLM-based agentic annotators (e.g., GPT-4o) augmented with real-time web search for scRNA-seq, surpassing manual curation accuracy via gene markers, ontology lookup, and prompt-driven workflows.
- Chromatin Accessibility: ChromFound (Jiao et al., 19 May 2025) presents a universal foundation model for scATAC-seq using genome-aware tokenization (incorporating chromosome, start/end position, and accessibility) and a hybrid global-local encoder (Mamba state-space for long-range dependencies, windowed attention for local context), trained on 1.97M cells across diverse tissues and demonstrating zero-shot clustering, cross-omics prediction (ATAC→RNA), and enhancer-gene mapping.
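Genome-aware tokenization of the kind described above can be sketched with a simple binning scheme: each accessible region becomes a position token built from its chromosome and coarse genomic coordinate, paired with its accessibility value. The bin width and token format here are assumptions for illustration; ChromFound's actual vocabulary is not reproduced.

```python
# Hypothetical genomic bin width for the sketch.
BIN_SIZE = 100_000

def tokenize_peak(chrom, start, end, accessibility):
    """Map one accessible region to a (position token, value) pair."""
    token = f"{chrom}:{start // BIN_SIZE}"   # chromosome + coarse position
    return token, float(accessibility)

peaks = [("chr1", 1_250_000, 1_250_500, 3),
         ("chr1", 1_310_000, 1_310_400, 1),
         ("chrX", 48_000_000, 48_000_600, 2)]

tokens = [tokenize_peak(*p) for p in peaks]
print(tokens)
# [('chr1:12', 3.0), ('chr1:13', 1.0), ('chrX:480', 2.0)]
```

Because tokens carry genomic position rather than dataset-specific peak indices, cells profiled with different peak sets map into a shared vocabulary, which is what enables cross-dataset and zero-shot transfer.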
Multimodal and Cross-Scale Models
- Histopathology + Transcriptomics: The PAST model (Yang et al., 8 Jul 2025) uses dual encoders (ViT-large for images, Transformer+GAT for transcriptomics+spatial) and CLIP-style contrastive pretraining to learn a unified cell representation. The architecture enables accurate gene expression prediction from images, virtual immunohistochemistry, and survival analysis.
- Unified Operator Grammars: The AIVC framework (Hu et al., 14 Oct 2025) formalizes model training and evaluation via a cell-state latent (CSL) built from modular operators—measurement, lift/project (cross-scale mapping), and intervention (dosing/scheduling)—and mathematical objectives that integrate within-scale, cross-modal, and perturbation-fitting losses.
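The CLIP-style contrastive pretraining used by PAST can be sketched as a symmetric InfoNCE loss over paired image and transcriptomic embeddings: matched pairs are pulled together along the diagonal of the similarity matrix, mismatched pairs pushed apart. The dimensions and temperature below are illustrative.

```python
import numpy as np

def info_nce(img_emb, txn_emb, temperature=0.07):
    """Symmetric CLIP-style contrastive loss over paired embeddings."""
    a = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    b = txn_emb / np.linalg.norm(txn_emb, axis=1, keepdims=True)
    logits = a @ b.T / temperature      # pairwise cosine similarities
    labels = np.arange(len(a))          # matched pairs sit on the diagonal

    def xent(l):
        # cross-entropy of each row against its diagonal entry
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # average of image->expression and expression->image directions
    return 0.5 * (xent(logits) + xent(logits.T))

rng = np.random.default_rng(2)
z = rng.normal(size=(8, 32))
loss_aligned = info_nce(z, z)                       # identical pairs -> low loss
loss_random = info_nce(z, rng.normal(size=(8, 32))) # unrelated pairs -> high loss
print(loss_aligned < loss_random)                   # True
```

After such pretraining, either encoder can be used alone, which is what allows gene expression prediction and virtual IHC from images without transcriptomic input at inference time.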
Agentic and Self-constructing Virtual Cell Models
- Agent-Based Model Synthesis: CellForge (Tang et al., 4 Aug 2025) is an agentic system wherein multiple LLM-based agents (specialists in data analysis, modeling, training, validation) iteratively analyze data, survey literature, propose model plans, and generate code for virtual cell modeling. This architecture supports generalization across omics modalities and tasks, yielding performance gains (up to 40% MSE reduction vs. SOTA in perturbation tasks) and complete reproducibility of model design/evidence chains.
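The iterative propose-validate-refine loop described above can be sketched with plain functions standing in for the LLM specialists; CellForge's actual prompts, literature-survey tools, and code-generation steps are not reproduced, and the agents and thresholds below are hypothetical.

```python
def analyst(data):
    """Data-analysis agent: summarize the task."""
    return {"n_features": len(data[0]), "task": "perturbation prediction"}

def architect(summary, feedback=None):
    """Modeling agent: propose a plan, widening the model if validation pushed back."""
    width = 16 if feedback is None else feedback["suggest_width"]
    return {"model": "mlp", "width": width}

def validator(plan):
    """Validation agent: accept plans above a hypothetical capacity threshold."""
    ok = plan["width"] >= 32
    return ok, {"suggest_width": plan["width"] * 2}

def synthesize(data, max_rounds=5):
    summary, feedback, trace = analyst(data), None, []
    for _ in range(max_rounds):
        plan = architect(summary, feedback)
        ok, feedback = validator(plan)
        trace.append(plan)      # audit trail: every proposed plan is recorded
        if ok:
            return plan, trace
    return plan, trace

plan, trace = synthesize([[0.1, 0.2, 0.3]])
print(plan, len(trace))   # accepted plan after 2 rounds
```

The recorded `trace` corresponds to the reproducible design/evidence chain the text highlights: every intermediate proposal and the feedback that revised it can be inspected after the fact.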
3. Benchmarking, Generalization, and Evaluation
Robust benchmarking is a cornerstone of virtual cell foundation model development. Evaluation frameworks emphasize:
- Generalization across contexts: Validating on held-out donors, tissues, or perturbations (e.g., TEDDY's held-out donors/diseases (Chevalier et al., 5 Mar 2025), CellFlux's OOD perturbation prediction (Zhang et al., 13 Feb 2025)), and transportability across sites/platforms (Hu et al., 14 Oct 2025, Bhardwaj et al., 22 Sep 2025).
- Functional & clinically relevant metrics: Area under the ROC curve (AUROC), panoptic quality (PQ), multi-class segmentation/classification (mPQ), gene expression Pearson correlation, and survival concordance index.
- Operator-aware data design: Partitioning to prevent data leakage (e.g., by donor/site), incorporating stress-testing on unseen distributions, and reporting calibration and bias as recommended in best practices (Hu et al., 14 Oct 2025).
- Domain/failure analysis: Evaluation pipelines frequently include pathologist-in-the-loop rating (qualitative assessment of segmentation outcomes (Guo et al., 9 Aug 2024, Wang et al., 1 Oct 2025)), curated hard-case sets, and ensemble/fusion approaches that combine models to resolve challenging cases.
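The leakage-resistant, donor-level partitioning recommended above can be sketched as a group-wise split: all cells from a given donor land entirely in train or entirely in test, so no donor-specific signal leaks across the boundary. The data layout here is a toy assumption.

```python
import random

def split_by_donor(cells, test_fraction=0.3, seed=0):
    """Group-wise split: every donor's cells go wholly to one side."""
    donors = sorted({c["donor"] for c in cells})
    rng = random.Random(seed)
    rng.shuffle(donors)
    n_test = max(1, int(len(donors) * test_fraction))
    test_donors = set(donors[:n_test])
    train = [c for c in cells if c["donor"] not in test_donors]
    test = [c for c in cells if c["donor"] in test_donors]
    return train, test

# 50 toy cells from 5 donors.
cells = [{"donor": f"D{i % 5}", "x": i} for i in range(50)]
train, test = split_by_donor(cells)

train_donors = {c["donor"] for c in train}
test_donors = {c["donor"] for c in test}
print(train_donors & test_donors)   # set() -- no donor appears in both splits
```

The same pattern applies to splitting by site, platform, or perturbation; a random per-cell split would instead let the model memorize donor-specific effects and overstate generalization.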
4. Practical Applications and Scientific Impact
Virtual cell foundation models already display wide applicability, including:
- High-throughput annotation and automation: DeepSeq's agentic AI system automates scRNA-seq annotation that previously required manual curation, achieving 82.5% accuracy (Dajani et al., 14 Jun 2025).
- Robust cell segmentation and spatial analytics: CellViT++ shows energy-efficient, adaptable segmentation/classification of diverse cell types with near-zero-shot capability, reducing manual annotation effort and cost (Hörst et al., 9 Jan 2025).
- Perturbation and phenotype prediction: CellFlux's flow matching approach provides continuous, invertible mappings between control and perturbed cell image distributions, distinguishing true biological effects from artifacts and enabling simulation of morphodynamic trajectories (Zhang et al., 13 Feb 2025).
- Multimodal digital pathology: PAST enables virtual molecular staining (virtual IHC), cell-resolved gene inference, and enhances survival stratification from standard slides, facilitating precision oncology workflows (Yang et al., 8 Jul 2025).
- Agentic discovery and model construction: CellForge's multi-agent system not only builds custom, task-specific models for single-cell perturbation prediction but also provides transparent audit trails for reproducibility and scientific scrutiny (Tang et al., 4 Aug 2025).
- Drug discovery and lab-in-the-loop cycles: The predict-explain-discover (P-E-D) paradigm anchors virtual cell models as actionable simulators that prioritize and suggest laboratory experiments to iteratively refine biological predictions (Noutahi et al., 20 May 2025).
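The flow-matching idea behind CellFlux's control-to-perturbed mapping can be sketched with the standard linear conditional flow-matching objective: sample a point on the straight path between a control sample and a perturbed sample, and regress a velocity model onto the displacement. The 2-D Gaussian data and constant "trained" model below are illustrative assumptions, not CellFlux's architecture.

```python
import numpy as np

rng = np.random.default_rng(3)

x0 = rng.normal(loc=0.0, size=(256, 2))   # control cell features
x1 = rng.normal(loc=2.0, size=(256, 2))   # perturbed cell features

t = rng.uniform(size=(256, 1))
x_t = (1 - t) * x0 + t * x1               # point on the straight path
u_target = x1 - x0                        # target velocity along that path

def velocity_model(x, t):
    # Hypothetical "trained" model: a constant field equal to the mean shift.
    return np.full_like(x, 2.0)

# Flow-matching regression loss: mean squared error to the target velocity.
loss = np.mean((velocity_model(x_t, t) - u_target) ** 2)
print(loss < np.mean(u_target ** 2))      # beats the zero-velocity baseline
```

Integrating the learned velocity field from t=0 to t=1 then transports control samples to predicted perturbed samples, and the mapping is invertible by integrating backwards, which is the property the text highlights.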
5. Current Limitations and Open Challenges
Despite significant advancements, several challenges remain:
- Domain specificity and generalization: Many foundation models, particularly those pretrained on domain-general datasets, underperform in highly specialized or morphologically complex scenarios without further finetuning (e.g., limitations in kidney pathology segmentation (Guo et al., 9 Aug 2024, Wang et al., 1 Oct 2025), and cases where, counter to expectation, general-purpose CNNs outperform ViT-based foundation models (Vadori et al., 4 Feb 2025)).
- Batch effects and experimental artifact disentanglement: Robust modeling requires explicit conditioning on batch or control distributions to avoid confounding technical and biological sources of variation (as formulated in CellFlux (Zhang et al., 13 Feb 2025)).
- Interpretability and explanation: While foundational models achieve strong prediction, mechanistic interpretability (e.g., attribution to explicit molecular or pathway events) remains inadequate. The integration of causal or network structure—e.g., via gene network-guided attention (Cheng et al., 10 Nov 2024)—offers a partial remedy, but interpretability is not uniformly solved.
- Data quality and standardization: Heterogeneous, noisy, or incompletely annotated datasets, batch effects, and lack of unified feature schemas (e.g., for scATAC-seq) continue to complicate robust representation learning and benchmarking (Jiao et al., 19 May 2025, Bhardwaj et al., 22 Sep 2025).
- Computational and energy constraints: High-parameter and data-hungry architectures challenge scalability. Innovations such as CellViT++'s efficient classifier adaptation and CellForge's lightweight model synthesis mitigate, but do not eliminate, resource demands.
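The gene network-guided attention mentioned above as a partial interpretability remedy can be sketched by masking attention scores with a prior interaction network, so genes only attend to known neighbors. The 4-gene adjacency matrix here is a made-up prior for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# 4 genes; adjacency from a hypothetical prior network (1 = known interaction).
adj = np.array([[1, 1, 0, 0],
                [1, 1, 1, 0],
                [0, 1, 1, 0],
                [0, 0, 0, 1]], dtype=float)

rng = np.random.default_rng(4)
q = rng.normal(size=(4, 8))   # query vectors, one per gene
k = rng.normal(size=(4, 8))   # key vectors, one per gene

scores = q @ k.T / np.sqrt(8)
scores = np.where(adj > 0, scores, -1e9)   # forbid attention outside the network
attn = softmax(scores, axis=-1)

print(np.allclose(attn * (adj == 0), 0))   # True: masked pairs get ~zero weight
```

Because nonzero attention is confined to edges of the prior network, attention weights can be read as evidence over known gene-gene interactions rather than arbitrary dense mixing, which is the interpretability gain such masking buys.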
6. Future Directions and Grand Challenges
Key trajectories and open priorities for the field include:
- Unified, multimodal, and multiscale modeling frameworks: Integrating molecular, cellular, and tissue-level representations, with explicit operator grammars for measurement, cross-scale projection, and intervention (Hu et al., 14 Oct 2025, Bunne et al., 18 Sep 2024).
- Lab-in-the-loop scientific discovery: Realizing AI-driven, closed-loop experimentation, where virtual cell models suggest, prioritize, and interpret experiments, iteratively updating models and maximizing knowledge yield (Noutahi et al., 20 May 2025).
- Self-improving, living digital twins: Moving toward AI-powered digital twins that continually update their representations as new multimodal or clinical data become available (Bhardwaj et al., 22 Sep 2025).
- Transparent, FAIR-compliant, and community-driven benchmarking: Adoption of operator-aware benchmarks, rigorous leakage-resistant partitioning, and open challenge leaderboards for cross-institutional and cross-domain comparison (Bhardwaj et al., 22 Sep 2025, Hu et al., 14 Oct 2025).
- Ethical, diverse, and collaborative development: Emphasizing diversity in data, responsible and equitable model use, privacy, and open infrastructure (Bunne et al., 18 Sep 2024).
7. Summary Table of Representative Virtual Cell Foundation Models
| Model/System | Domain/Modality | Key Innovation | Benchmark/Performance |
|---|---|---|---|
| CytoFM (Ivezić et al., 18 Apr 2025) | Cytopathology imaging | Self-supervised ViT+iBOT, cytology-specific | Outperforms UNI/ImageNet FM in cell type classification |
| CellSAM (Israel et al., 2023) | Universal cell segmentation | Prompt-driven SAM with CellFinder DETR | SOTA F1/mAP across 9 platforms, strong zero-shot ability |
| CellViT++ (Hörst et al., 9 Jan 2025) | Multi-organ pathology segmentation | Token-based ViT, energy-efficient classifier | 81.6% "Good" on diverse patches; fast adaptation |
| TEDDY (Chevalier et al., 5 Mar 2025) | Single-cell transcriptomics | 400M-param Transformer; 116M cells, annotation-driven pretraining | Highest F1 in held-out donor/disease tasks |
| ChromFound (Jiao et al., 19 May 2025) | scATAC-seq (chromatin accessibility) | Hybrid Mamba-attention, genome-aware tokenization | Best zero-shot, clustering, enhancer-gene mapping |
| CellForge (Tang et al., 4 Aug 2025) | Agentic, cross-omics modeling | Multi-agent LLMs synthesize and code models | Outperforms SOTA on 6 perturbation datasets |
| PAST (Yang et al., 8 Jul 2025) | H&E images + transcriptomics | CLIP-style dual encoder; spatial + gene context | Best gene prediction, virtual IHC, survival metrics |
References
All claims, performance results, and technical frameworks are taken directly and exclusively from the following papers: (Ivezić et al., 18 Apr 2025, Israel et al., 2023, Zhang et al., 13 Feb 2025, Guo et al., 9 Aug 2024, Yang et al., 1 Apr 2025, Dajani et al., 14 Jun 2025, Yang et al., 8 Jul 2025, Vadori et al., 4 Feb 2025, Tang et al., 4 Aug 2025, Jiao et al., 19 May 2025, Bunne et al., 18 Sep 2024, Cheng et al., 10 Nov 2024, Noutahi et al., 20 May 2025, Chevalier et al., 5 Mar 2025, Hörst et al., 9 Jan 2025, Wang et al., 1 Oct 2025, Hu et al., 14 Oct 2025, Li et al., 9 Oct 2025, Bhardwaj et al., 22 Sep 2025).
Virtual cell foundation models are thus emerging as the central computational infrastructure for data-driven, mechanistic, and scalable cellular biology. The synthesis of methodological innovation, domain integration, open benchmarking, and automated AI agents delineates a path toward predictive, interpretable, and actionable in silico biology at scale.