CellForge: Autonomous Virtual Cell Modeling

Updated 5 August 2025
  • CellForge is an autonomous system that builds computational models of virtual cells from raw multi-omics data and high-level task descriptions.
  • It employs a modular, multi-agent framework where specialized agents collaboratively design, assess, and optimize model architectures.
  • Empirical evaluations show up to a 40% reduction in prediction error and improved correlation metrics compared to traditional methods.

CellForge refers to an autonomous, agentic system that constructs and deploys computational models of virtual cells directly from raw single-cell multi-omics data and high-level task descriptions. Its foundational aim is to enable quantitative prediction of cellular responses—such as changes in gene expression profiles—following diverse perturbations (e.g., CRISPR gene knockouts, drug treatments), by transforming complex biological inputs into an optimized model architecture and executable code for training and inference. CellForge operationalizes cross-disciplinary expertise within a modular, multi-agent framework, unifying processes such as task analysis, model design, and experiment execution through collaborative agentic reasoning. The codebase is publicly accessible at https://github.com/gersteinlab/CellForge (Tang et al., 4 Aug 2025).

1. Conceptual Architecture and Agentic Paradigm

CellForge is architected as a multi-agent system in which specialized agents emulate the roles of domain experts in computational biology, machine learning, and systems modeling. The agent ensemble includes roles focused on distinct steps such as data parsing, methodological assessment, architectural design, and model training. Agents engage in an iterative, graph-based discussion moderated by a central critic agent, where each agent proposes candidate solutions, critiques others, and updates a weighted confidence in candidate strategies. The consensus mechanism integrates these assessments through a weighted update of the form

$$c_t^{(i)} = 0.3\, c_{t-1}^{(i)} + 0.4\, \text{CriticAgentScore}\big(m_t^{(i)}, S\big) + 0.3\,\frac{1}{k-1}\sum_{j\neq i} \text{PeerScore}\big(m_t^{(i)}, E^{(j)}\big),$$

where $c_t^{(i)}$ denotes the confidence in the $i$-th agent's proposal $m_t^{(i)}$ at round $t$. This iterative procedure ensures that final modeling solutions incorporate multiple disciplinary perspectives and are robustly tailored to the presented perturbation and data context.
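The weighted update above can be sketched directly in Python. This is a minimal illustration of the 0.3/0.4/0.3 blending rule only; the critic and peer scores are hypothetical stand-ins for whatever scoring functions the agents actually implement.

```python
# Sketch of the per-round confidence update. Only the 0.3 / 0.4 / 0.3
# weighting mirrors the published formula; the score values are toy inputs.

def update_confidence(prev_conf, critic_score, peer_scores):
    """Blend previous confidence, critic score, and mean peer score."""
    peer_mean = sum(peer_scores) / len(peer_scores) if peer_scores else 0.0
    return 0.3 * prev_conf + 0.4 * critic_score + 0.3 * peer_mean

# Example: one update round for a single agent's proposal.
c = update_confidence(prev_conf=0.5, critic_score=0.9, peer_scores=[0.6, 0.8])
print(round(c, 3))  # → 0.72
```

Because the weights sum to 1, confidence stays bounded in [0, 1] whenever the individual scores do, which keeps repeated rounds numerically stable.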

2. System Modules: Task Analysis, Method Design, and Experiment Execution

CellForge is structured around three core interlinked modules:

2.1 Task Analysis

A Data Parser agent extracts key dataset metadata (perturbation types, cell population labels, assayed features). Parallel agents—including the Dataset Analyst, Problem Investigator, and Baseline Assessor—scrutinize dataset properties, typical quality issues (such as batch effects or data sparsity), and recommend preprocessing steps. Literature and methodological context are retrieved from a static annotated corpus as well as dynamically via PubMed and GitHub APIs, using Sentence-BERT embeddings for semantic similarity. The end product is a structured summary (JSON) and a concise, task-adapted analysis report.
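The retrieval step described above can be approximated as cosine-similarity ranking over embedding vectors. The sketch below assumes the Sentence-BERT embeddings have already been computed (the toy vectors and corpus here are hypothetical, not from the system).

```python
import numpy as np

# Minimal sketch of semantic retrieval over a literature corpus.
# CellForge is described as using Sentence-BERT embeddings plus PubMed/GitHub
# APIs; here the embeddings are toy 3-d vectors standing in for real ones.

def rank_by_similarity(query_vec, corpus_vecs, top_k=2):
    """Return indices of the top_k most cosine-similar corpus entries."""
    q = query_vec / np.linalg.norm(query_vec)
    C = corpus_vecs / np.linalg.norm(corpus_vecs, axis=1, keepdims=True)
    sims = C @ q                      # cosine similarity of each entry to query
    return np.argsort(-sims)[:top_k]  # highest similarity first

corpus = np.array([[1.0, 0.0, 0.1],   # e.g. a batch-correction paper
                   [0.9, 0.1, 0.0],   # e.g. a preprocessing method
                   [0.0, 1.0, 0.9]])  # e.g. an unrelated topic
query = np.array([1.0, 0.05, 0.05])
print(rank_by_similarity(query, corpus))
```

In the real pipeline the ranked hits would feed the structured JSON summary and the task-adapted analysis report.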

2.2 Method Design

During method design, a panel of expert agents proposes and critiques model architectures via message passing on a discussion graph—incorporating, for example, variational autoencoders for latent structure, deep transformers for long-range feature integration, and graph neural networks (GNNs) for leveraging gene regulatory or pathway knowledge, when appropriate. The design is refined through cycles of agentic consensus, culminating in a finalized research plan that specifies model components, training objectives, and evaluation protocols.
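One round of this discussion can be sketched as peer scoring over candidate architectures. The agent names, proposals, and scores below are illustrative assumptions, not values from the paper; the point is only the selection-by-consensus step.

```python
# Toy sketch of one consensus round on the discussion graph.
# All names and scores are hypothetical; in CellForge this repeats over
# multiple critique cycles before the research plan is finalized.

proposals = {
    "vae_expert": "variational autoencoder",
    "transformer_expert": "deep transformer",
    "gnn_expert": "graph neural network",
}

# Peer scores each proposal received from the other agents (illustrative).
peer_scores = {
    "variational autoencoder": [0.7, 0.6],
    "deep transformer": [0.8, 0.9],
    "graph neural network": [0.75, 0.7],
}

def consensus_choice(scores):
    """Pick the proposal with the highest mean peer score."""
    return max(scores, key=lambda p: sum(scores[p]) / len(scores[p]))

print(consensus_choice(peer_scores))  # → deep transformer
```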

2.3 Experiment Execution

The consensus research plan is programmatically translated into full, production-grade Python code for data processing, model construction, optimization, validation, and result generation. The code generation and execution pipeline incorporates error recovery and self-debugging. Training routines use adaptive optimizers (such as AdamW), learning rate scheduling (OneCycle), and periodic metric evaluation. Standard quantitative metrics include Mean Squared Error (MSE), Pearson's Correlation Coefficient (PCC), and coefficient of determination ($R^2$), supplemented by domain-specific benchmarks for differentially expressed genes.
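As a rough illustration of the OneCycle-style scheduling mentioned above, the sketch below implements a warm-up-then-anneal learning-rate curve by hand. This is an analogue of PyTorch's `OneCycleLR`, not the generated code itself, and the `max_lr` / `total_steps` values are arbitrary.

```python
import math

# Hand-rolled one-cycle learning-rate schedule: linear warm-up to max_lr,
# then cosine annealing toward ~0. Illustrative stand-in for OneCycleLR.

def one_cycle_lr(step, total_steps, max_lr=1e-3, pct_start=0.3):
    """Return the learning rate for a given optimizer step."""
    warmup = int(total_steps * pct_start)
    if step < warmup:
        return max_lr * step / warmup                          # linear warm-up
    progress = (step - warmup) / (total_steps - warmup)
    return max_lr * 0.5 * (1 + math.cos(math.pi * progress))   # cosine decay

lrs = [one_cycle_lr(s, total_steps=100) for s in range(100)]
print(max(lrs))  # peaks at max_lr right after warm-up
```

In a real training loop each `lrs[step]` would be assigned to the optimizer (e.g. AdamW) before the corresponding gradient update.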

3. Technical Objective and Modeling Formalism

Mathematically, CellForge learns a mapping

$$f_\theta: \mathbb{R}^d \times \mathcal{P} \rightarrow \mathbb{R}^{d'}$$

where $x \in \mathbb{R}^d$ represents the pre-perturbation single-cell profile, $p \in \mathcal{P}$ is a structured perturbation label (encoding, for example, knockout identity or drug identity/dose), and $y \in \mathbb{R}^{d'}$ is the desired post-perturbation output profile. An encoder $g_\phi$ may be used to embed the high-dimensional $x$. Model evaluation relies on the metrics:

  • $\mathrm{MSE} = \dfrac{1}{n d'} \sum_{i=1}^n \| y_i - \hat{y}_i \|^2$
  • $\mathrm{PCC} = \dfrac{\sum_{i=1}^n \langle y_i - \bar{y},\, \hat{y}_i - \bar{\hat{y}} \rangle}{\sqrt{\sum_i \| y_i - \bar{y} \|^2}\, \sqrt{\sum_i \| \hat{y}_i - \bar{\hat{y}} \|^2}}$
  • $R^2 = 1 - \dfrac{\sum_i \| y_i - \hat{y}_i \|^2}{\sum_i \| y_i - \bar{y} \|^2}$,

where $n$ is the number of cells, $d'$ the output feature dimension, and bars denote sample means.
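The three metrics can be implemented directly from these definitions. The sketch below follows the formulas above, centering by the per-feature mean vector across cells; the example arrays are toy data for illustration.

```python
import numpy as np

# Direct implementations of the evaluation metrics defined above, for
# n cells x d' output features. Toy arrays below stand in for real profiles.

def mse(y, y_hat):
    """Mean squared error over all n*d' entries."""
    return np.mean((y - y_hat) ** 2)

def pcc(y, y_hat):
    """Pearson correlation with per-feature mean vectors as the bars."""
    yc = y - y.mean(axis=0)
    hc = y_hat - y_hat.mean(axis=0)
    return np.sum(yc * hc) / np.sqrt(np.sum(yc**2) * np.sum(hc**2))

def r2(y, y_hat):
    """Coefficient of determination against the per-feature mean baseline."""
    return 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean(axis=0)) ** 2)

y = np.array([[1.0, 2.0], [3.0, 4.0]])        # true post-perturbation profiles
y_hat = np.array([[1.1, 1.9], [2.8, 4.2]])    # model predictions
print(mse(y, y_hat), pcc(y, y_hat), r2(y, y_hat))
```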

This formalism is task-agnostic, enabling model architectures to be adapted by the agents in response to varied data modalities (e.g., scRNA-seq, CITE-seq, scATAC-seq) and perturbation representations.

4. Empirical Evaluation and Dataset Coverage

CellForge was validated across six diverse single-cell datasets representing a variety of perturbational contexts: gene knockouts (Adamson et al., Norman et al.), drug perturbations (Srivatsan et al.), cytokine stimulations (Schiebinger et al.), and multimodal single-cell modalities (CITE-seq, scATAC-seq). In these benchmarks, CellForge delivered:

  • Up to 40% reduction in prediction error (MSE) over task-specific state-of-the-art methods,
  • Approximately 20% higher correlation-based metrics (PCC, $R^2$),
  • Superior performance across diverse perturbation types and modalities.

The documented advantage is attributed to the agentic system’s ability to internalize biological priors, select appropriate inductive biases, and custom-tailor model design and preprocessing for each input dataset.

5. Integration with the Computational Biology Ecosystem

CellForge’s code generation outputs production-quality Python, with complete coverage of the analysis pipeline from raw data to trained virtual cell models. Automation extends to code execution and iterative debugging. The system interfaces with common computational biology data standards, supports typical optimization libraries, and produces outputs compatible with further analysis or downstream modeling. Dynamic agentic retrieval ensures that solutions are informed by both up-to-date and canonical methodological literature, enabling rapid adaptation to new biological tasks, evolving experimental designs, or novel data modalities.

6. Comparative Perspective and Impact

Relative to manual, expert-driven workflows, CellForge fundamentally reduces the labor and expertise requirements for virtual cell modeling. Its multi-agent paradigm enables reasoning analogous to multidisciplinary team-based research, but at machine timescales and with reproducibility guarantees. Whereas previous approaches might optimize architectures manually or by exhaustively searching hyperparameters, CellForge instead leverages structured dialogue, literature grounding, and consensus-driven solution generation, integrating both statistical machine learning and domain-specific biological knowledge.

CellForge’s contributions to the field include:

  • Enabling end-to-end, automated model design and implementation for single-cell perturbation response prediction,
  • Delivering substantial empirical gains on multiple biological benchmarks,
  • Providing a publicly available, extensible codebase,
  • Demonstrating that agentic consensus systems can outperform direct, “single-shot” LLM-based modeling or static pipelines for complex scientific modeling tasks (Tang et al., 4 Aug 2025).

7. Broader Implications and Future Directions

CellForge exemplifies a paradigm shift in computational systems biology, where collaborative AI agents decompose, debate, and solve complex, interdisciplinary modeling challenges. A plausible implication is acceleration in the pace of model prototyping and hypothesis testing in experimental single-cell research, especially for labs lacking computational expertise. The agentic design frameworks underlying CellForge may inspire future systems in other data-intensive life science domains (e.g., multimodal integration, virtual tissue modeling).

As the system’s modularity allows incorporation of new agents (for example, with expertise in causal inference or mechanistic modeling), and as publicly available code ensures community-driven extension, CellForge provides a foundation for increasingly autonomous, domain-adaptive scientific discovery across computational biology.
