Agentic Virtual Cell Models in Computational Biology

Updated 25 March 2026

Agentic virtual cell models are autonomous systems that integrate LLMs, dynamic memory, and reward-based planning to execute complex biological tasks.
They combine planner modules, specialized executors, and retrieval mechanisms to perform tasks such as perturbation prediction and gene regulation inference.
Implementations such as CellForge and scAgents apply agentic methodologies to single-cell analysis and multi-omics integration, with CellForge reported to outperform monolithic baselines on single-cell perturbation prediction.

Agentic virtual cell models are computational systems in which LLMs or other artificial intelligence components transcend passive prediction to autonomously plan, execute, and optimize multi-step workflows in computational biology. These models actively organize end-to-end biological tasks—such as literature retrieval, data processing, model development, and hypothesis generation—by orchestrating tool calls, integrating multimodal knowledge, iteratively updating internal representations, and optimizing actions according to reward-driven objectives. They constitute a paradigm shift, casting the virtual cell as an active scientific "agent" rather than a static simulator or generative oracle, and have become central to modern approaches to cellular modeling, single-cell analysis, and multi-omics integration (Li et al., 9 Oct 2025, Hu et al., 14 Oct 2025, Tang et al., 4 Aug 2025, Dip et al., 9 Oct 2025, Wei et al., 29 Nov 2025).

1. Conceptual Foundations and Definitions

Agentic virtual cell models embed agency into computational cellular biology by enabling AI systems to plan, act, and reason across complex scientific workflows. Such systems maintain an explicit internal state (e.g., experimental objectives, processed data, intermediate results), select from a defined action space (tool invocations, literature searches, experimental design, or model selection), observe the outcomes of these actions, update memory and context, and repeat this loop until a task-specific scientific goal is reached (Li et al., 9 Oct 2025). Agency is instantiated through properties including persistent context, dynamic memory, adaptive tool use, and reward-based learning.

The agentic paradigm is rooted in formal frameworks where policies are optimized according to expected reward within a Markov Decision Process (MDP):

State space ( $\mathcal{S}$ ): Configurations encoding task context, prior actions, retrieved data, and memory.
Action space ( $\mathcal{A}$ ): Available operations including data queries, tool execution, or subagent communication.
Transition function ( $P(s_{t+1}\mid s_t,a_t)$ ): Encapsulates how the state changes in response to actions and tool outputs.
Reward function ( $R(s_t,a_t)$ ): Quantifies progress towards the task objective, using scientific metrics, correctness, completeness bonuses, or other signals.
Policy ( $\pi_\theta(a\mid s)$ ): Learned mechanism that selects optimal actions given state (Li et al., 9 Oct 2025).
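As a minimal sketch of how the five components above fit together, the following toy loop wires a state, an action space, a transition, a reward, and a policy into one episode. Everything here is an illustrative assumption: the action names, the hand-written `policy` (a stand-in for a learned $\pi_\theta$), the deterministic `transition`, and the reward values are hypothetical and are not taken from any cited system.

```python
from dataclasses import dataclass, field

# Hypothetical action space A; real systems expose many more operations.
ACTIONS = ("search_literature", "run_tool", "report")


@dataclass
class State:
    """State s_t: task goal plus a memory of prior actions and observations."""
    goal: str
    history: list = field(default_factory=list)
    done: bool = False


def policy(state):
    """Stand-in for pi_theta(a | s): gather evidence twice, then report."""
    step = len(state.history)
    return ACTIONS[step] if step < 2 else "report"


def transition(state, action):
    """Toy deterministic transition to s_{t+1}; the observation is a placeholder."""
    observation = f"output of {action}"
    return State(state.goal, state.history + [(action, observation)],
                 done=(action == "report"))


def reward(state, action):
    """Toy reward R(s_t, a_t): a step cost, plus a bonus for reporting with evidence."""
    if action == "report":
        return 1.0 if len(state.history) >= 2 else -1.0
    return -0.25


def run_episode(state, max_steps=10):
    """Repeat select, act, observe, update until the goal is reached or steps run out."""
    total, trajectory = 0.0, []
    for _ in range(max_steps):
        action = policy(state)
        r = reward(state, action)
        state = transition(state, action)
        total += r
        trajectory.append((action, r))
        if state.done:
            break
    return total, trajectory


total, trajectory = run_episode(State(goal="predict knockout response"))
```

In an actual agentic system the policy would be an LLM or a learned model and the transition would be determined by real tool outputs; the loop structure is the part this sketch is meant to show.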

Agentic models may be organized as single agents employing role-switching via system prompts, or as multi-agent communities where subagents (e.g., Hypothesis Generator, Data Curator, Model Builder) coordinate using message passing or shared blackboard architectures (Tang et al., 4 Aug 2025, Li et al., 9 Oct 2025).

2. Architectures and Algorithmic Design

Agentic virtual cell frameworks are typically composed of three architectural layers:

Planner/Policy Module
- Often based on a GPT-style or decoder-only LLM, this component encodes the current state and selects actions via learned affordance scores or chain-of-thought (CoT) prompting.
- Actions are scored as $\pi_\theta(a\mid s_t) = \mathrm{softmax}(g_\theta(h_t))_a$ , where $g_\theta$ maps the representation $h_t$ , derived from the LLM's hidden layers, to one score per action.
- Tool embeddings and memory representations are incorporated for context-aware decision making (Li et al., 9 Oct 2025, Tang et al., 4 Aug 2025).
Tool/Executor Modules
- Provide domain-specialized routines for key cellular biology tasks (e.g., clustering, perturbation modeling, CRISPR guide design, gene network inference).
- Accessed through APIs, tools receive structured input (typically JSON) and yield observations $o_{t+1}$ for planner ingestion.
- Tool use is often mediated by a library or registry maintained within the agentic system (Tang et al., 4 Aug 2025).
Memory and Retrieval Modules
- Store histories of interactions, retrieved textual evidence (often via Retrieval-Augmented Generation or RAG), datasets, and learned tool embeddings.
- Index both agent-centric state and external knowledge (e.g., PubMed, CELLxGENE), providing relevant context on demand (Li et al., 9 Oct 2025, Wei et al., 29 Nov 2025).
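The three layers above can be illustrated with a small sketch: a softmax over per-action scores for the planner, a registry of tools that accept JSON requests for the executor layer, and an append-only list for memory. The tool name `cluster_cells`, its arguments, and the registry layout are hypothetical choices made for illustration, not the API of any framework cited here.

```python
import json
import math

# Executor layer: a registry mapping tool names to callables (hypothetical design).
TOOL_REGISTRY = {}


def register_tool(name):
    """Decorator that adds a function to the tool registry under `name`."""
    def decorator(fn):
        TOOL_REGISTRY[name] = fn
        return fn
    return decorator


@register_tool("cluster_cells")
def cluster_cells(args):
    """Hypothetical tool: split cells into two groups by thresholding a score."""
    labels = [int(score > args["threshold"]) for score in args["scores"]]
    return {"labels": labels}


def softmax(scores):
    """Planner layer: turn per-action scores into a probability distribution."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def call_tool(request_json, memory):
    """Dispatch a JSON request and store the observation o_{t+1} in memory."""
    request = json.loads(request_json)
    observation = TOOL_REGISTRY[request["tool"]](request["args"])
    memory.append({"request": request, "observation": observation})
    return observation


memory = []
request = json.dumps({"tool": "cluster_cells",
                      "args": {"scores": [0.1, 0.9, 0.7], "threshold": 0.5}})
observation = call_tool(request, memory)
action_probs = softmax([2.0, 0.0, 0.0])  # scores would come from g_theta(h_t)
```

Keeping tools behind a registry with structured input means the planner only needs to emit a tool name and arguments, and every call leaves a record that the memory module can later retrieve.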

In multi-agent settings, subagents may specialize in task decomposition, literature analysis, data characterization, model selection, or experimental optimization, with a central "critic" or moderator enforcing consensus and guiding collaboration (Tang et al., 4 Aug 2025).

The planner's policy is optimized to maximize the expected discounted return over trajectories:

$J(\theta) = \mathbb{E}_{\tau\sim\pi_\theta}\left[\sum_{t=0}^{T}\gamma^t R(s_t, a_t)\right]$

where $\tau=(s_0,a_0,\dots,s_T,a_T)$ is a trajectory, and $\gamma$ is a discount factor (Li et al., 9 Oct 2025).
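To make the objective concrete, the sketch below computes the discounted return of a single reward sequence and a Monte Carlo estimate of $J(\theta)$ as the average return over sampled trajectories. The function names and the example rewards are illustrative assumptions.

```python
def discounted_return(rewards, gamma):
    """Compute sum_t gamma^t * r_t by accumulating from the last step backwards."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def estimate_objective(reward_sequences, gamma):
    """Monte Carlo estimate of J(theta): mean return over sampled trajectories."""
    returns = [discounted_return(seq, gamma) for seq in reward_sequences]
    return sum(returns) / len(returns)


# Two hypothetical trajectories, each given as its per-step rewards.
single = discounted_return([1.0, 1.0, 1.0], gamma=0.5)   # 1 + 0.5 + 0.25 = 1.75
estimate = estimate_objective([[1.0, 1.0, 1.0], [0.0, 0.0, 2.0]], gamma=0.5)
```

With $\gamma = 0.5$, the second sequence contributes $0.25 \times 2 = 0.5$, so the estimate is $(1.75 + 0.5)/2 = 1.125$. A smaller $\gamma$ favors policies that reach useful results in fewer steps.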

3. Methodologies, Benchmarks, and Task Suites

Agentic virtual cell systems are evaluated along a canonical set of biological tasks, using domain-specific datasets and metrics. Three major scientific tasks are emphasized (Li et al., 9 Oct 2025, Tang et al., 4 Aug 2025):

Cellular Representation: The construction of universal vector encodings for cell state based on multi-omic (e.g., scRNA-seq, ATAC-seq, CITE-seq, spatial) measurements. Metrics include Normalized Mutual Information (NMI), Accuracy, Precision, Recall, and Macro-F1 (Li et al., 9 Oct 2025).
Perturbation Prediction: Modeling cellular response to interventions such as gene knockouts, drug dosing, or CRISPR perturbations. Metrics include Mean Squared Error (MSE), Root Mean Squared Error (RMSE), False Discovery Proportion, and Pearson and Spearman correlations (Tang et al., 4 Aug 2025).
Gene Regulation Inference: Discovery and inference of regulatory interactions using transcriptional and network benchmarks (e.g., BEELINE, geneRNIB, CausalBench). Metrics include Area Under PRC, Early Precision Ratio, and network enrichment (Li et al., 9 Oct 2025).
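Several of the metrics named above have short closed forms. The following sketch implements MSE, RMSE, Pearson correlation, and Macro-F1 in plain Python so the definitions are explicit; in practice an established library would normally be used, and the example values are made up for illustration.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(mse(y_true, y_pred))


def pearson(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, using F1 = 2TP / (2TP + FP + FN)."""
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Made-up expression changes (perturbation prediction) and cell-type labels.
error = mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
corr = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
f1 = macro_f1([0, 0, 1, 1], [0, 1, 1, 1])
```

Macro-F1 averages over classes without weighting by class size, which matters for single-cell data where rare cell types would otherwise have little influence on the score.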

Agentic workflows often implement closed-loop experimental design, whereby the AI selects the next optimal intervention, evaluates outcomes, and iteratively refines its strategy (Bunne et al., 2024, Hu et al., 14 Oct 2025).
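The closed loop described above can be sketched as a select, test, update cycle. This is a deliberately simple greedy strategy under stated assumptions: each candidate perturbation has a known pathway annotation, the surrogate predicts an untested candidate's effect as the mean observed effect within its pathway, and `run_experiment` is a hypothetical stand-in for a wet-lab or simulated assay. It is not the selection rule of any cited system.

```python
def predict(candidate, observed, pathway_of):
    """Surrogate model: mean observed effect in the candidate's pathway.

    Pathways with no data yet get an optimistic score so they are explored first.
    """
    effects = [effect for tested, effect in observed.items()
               if pathway_of[tested] == pathway_of[candidate]]
    return sum(effects) / len(effects) if effects else float("inf")


def closed_loop(pathway_of, run_experiment, budget):
    """Pick the most promising untested candidate, test it, record the outcome."""
    observed = {}
    for _ in range(budget):
        untested = [c for c in pathway_of if c not in observed]
        if not untested:
            break
        choice = max(untested, key=lambda c: predict(c, observed, pathway_of))
        observed[choice] = run_experiment(choice)  # outcome refines later predictions
    return observed


# Hypothetical candidates with pathway labels and hidden true effects.
pathway_of = {"A": "p1", "B": "p1", "C": "p2", "D": "p2"}
true_effect = {"A": 0.2, "B": 0.3, "C": 0.9, "D": 0.8}
observed = closed_loop(pathway_of, true_effect.get, budget=3)
best = max(observed, key=observed.get)
```

With a budget of three experiments the loop tests A, then C (its pathway is still unexplored), then D (pathway p2 now looks stronger than p1), which shows how each outcome changes the next selection.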

Extensive benchmark suites and pre-training corpora are used (e.g., CELLxGENE, NCBI GEO, ENA, ImmPort, Protein Data Bank, Gene Ontology) (Li et al., 9 Oct 2025, Tang et al., 4 Aug 2025).

4. Exemplar Implementations and Use Cases

Several agentic virtual cell models highlight the diversity of architectural patterns and application domains (Li et al., 9 Oct 2025, Dip et al., 9 Oct 2025, Wei et al., 29 Nov 2025):

CellForge: A multi-agent framework in which specialized LLM agents iteratively analyze raw single-cell data and research objectives, propose modeling strategies, collaboratively refine hypotheses and architectures, and output executable code for model training and inference. CellForge employs role-specialized agents (Data Analyst, Model Architect, Pathway Expert, Critic) operating within a coordinated message-passing protocol, consistently outperforming monolithic baselines on single-cell perturbation prediction tasks (Tang et al., 4 Aug 2025).
scAgents: A multi-agent system organizing end-to-end single-cell perturbation workflows into Planner, Processor, and Reporter subagents (Li et al., 9 Oct 2025).
OmniCellAgent: Generates multi-omic analysis and clinical recommendations, using RAG to ground outputs in literature and external datasets (Li et al., 9 Oct 2025).
GeneAgent: Implements self-verifying pipelines by coupling LLM action selection with hard constraints from biology (e.g., restricting gene set outputs to experimentally supported terms) (Li et al., 9 Oct 2025).
VCWorld: An agentic, white-box simulator that integrates structured biological knowledge graphs and uses LLM-driven chain-of-thought reasoning to predict and explain perturbation-induced signaling cascades. VCWorld maintains full traceability of mechanistic hypotheses, produces high-accuracy predictions, and achieves state-of-the-art benchmark performance while supporting explicit explanation and human evaluation (Wei et al., 29 Nov 2025).
LLM4Cell survey: Categorizes a wide array of agentic single-cell models into annotation agents, reasoning agents, and multi-agent orchestration systems. It reports annotation accuracy above 0.90, Pearson correlation of approximately 0.8 for perturbation modeling, and F1 of 0.82–0.88 for ontology mapping (Dip et al., 9 Oct 2025).

5. Integration Across Scales, Modalities, and Knowledge

Agentic virtual cell models increasingly unify multi-scale and multimodal data, leveraging universal representations at molecular ( $z_m$ ), cellular ( $z_c$ ), and tissue ( $z_T$ ) levels (Bunne et al., 2024). Graph neural networks (GNNs), cross-modal Transformers, and ODE-driven latent state propagation architectures are employed to enforce consistency and enable inference across molecular, cellular, and tissue organization (Bunne et al., 2024, Hu et al., 14 Oct 2025).

Mechanisms for integrating external biological knowledge include:

Curation and use of structured databases (e.g., Gene Ontology, Reactome, STRING, DrugBank) for tool grounding and mechanistic constraint.
Retrieval-augmented generation (RAG) to couple literature mining and data-driven prediction (Wei et al., 29 Nov 2025, Li et al., 9 Oct 2025).
Operator grammars in latent space (Measurement, Lift/Project, Intervention) to structure cross-scale reasoning and intervention modeling (Hu et al., 14 Oct 2025).
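As an illustration of the retrieval-augmented generation mechanism listed above, the sketch below ranks a small set of evidence snippets by word overlap with a query and assembles them into a prompt. Word overlap is a simplifying assumption chosen to keep the example self-contained; production systems typically rank with dense embeddings, and the snippets and function names here are invented for the example.

```python
def tokenize(text):
    """Lowercase whitespace tokenization; a crude stand-in for a real tokenizer."""
    return set(text.lower().split())


def retrieve(query, documents, k=2):
    """Return the k documents sharing the most words with the query."""
    query_tokens = tokenize(query)
    ranked = sorted(documents,
                    key=lambda doc: len(query_tokens & tokenize(doc)),
                    reverse=True)
    return ranked[:k]


def build_prompt(query, documents, k=2):
    """Prepend retrieved evidence to the question so the model can ground its answer."""
    evidence = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return f"Evidence:\n{evidence}\n\nQuestion: {query}"


# Invented snippets standing in for literature or database records.
documents = [
    "TP53 knockout reduces apoptosis in stressed cells",
    "STRING catalogs protein interaction networks",
    "MYC overexpression drives proliferation",
]
prompt = build_prompt("effect of TP53 knockout", documents, k=1)
```

The prompt built this way contains only the top-ranked snippet, so the downstream model's answer can be traced back to the specific evidence it was given.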

Modularity is a guiding principle—tool libraries, universal representations, and decision operators are designed for extensibility and cross-compatibility (Hu et al., 14 Oct 2025).

6. Limitations, Challenges, and Recommendations

Open challenges for agentic virtual cell models include:

Scalability: Difficulty handling ultra-long genomic or multi-modal contexts, and high tool orchestration latency (Li et al., 9 Oct 2025).
Generalization: Poor transfer to unseen cell types, perturbations, or experimental conditions; need for robust, biologically meaningful evaluation environments (Li et al., 9 Oct 2025, Hu et al., 14 Oct 2025).
Reliability and Interpretability: Requirement for explicit uncertainty quantification, transparent attribution, and complete traceability of agent decisions beyond black-box prediction (Wei et al., 29 Nov 2025, Hu et al., 14 Oct 2025, Dip et al., 9 Oct 2025).
Reward Design and Safety: Reward functions often fail to capture biological utility, risking metric-gaming; autonomous planning in experimental contexts poses significant safety and ethical concerns, requiring reliable oversight (Li et al., 9 Oct 2025, Hu et al., 14 Oct 2025).
Flexibility and Reproducibility: Most frameworks lack composable, formal building blocks; there is a critical need for modular agent specification languages, reusable primitives, and closed-loop model/data integration (Pleyer, 15 Nov 2025).
Standardization: Lack of universal benchmarks, cross-platform compatibility, and community standards for model specification, provenance, and reporting (Dip et al., 9 Oct 2025).

Recommendations include the adoption of building-block frameworks for agent, environment, and interaction kernel specification; community-driven open science standards; and explicit support for transparency and modularity in tooling (Pleyer, 15 Nov 2025, Bunne et al., 2024).

7. Outlook and Future Directions

Agentic virtual cell models unify methodologies from artificial intelligence (LLMs, reinforcement learning, GNNs), computational cell biology, and knowledge representation. Overcoming outstanding challenges in scalability, robustness to distributional shift, interpretability, and flexibility will drive the next generation of systems. Key priorities involve:

Active learning and causal inference integration within agentic closed loops (Hu et al., 14 Oct 2025).
Automated hypothesis generation and wet-lab feedback cycles (Tang et al., 4 Aug 2025).
Development of agentic benchmarks emphasizing multi-step planning, tool selection fidelity, and explanation quality (Dip et al., 9 Oct 2025).
Model-card, datasheet, and audit log protocols to safeguard ethical and reproducible deployment (Bunne et al., 2024, Dip et al., 9 Oct 2025).

Agentic virtual cell models constitute the leading edge of computational biology’s evolution toward fully autonomous scientific agents operating at the intersection of data, knowledge, and reasoning (Li et al., 9 Oct 2025, Bunne et al., 2024).