GenCellAgent: Agentic Image Segmentation
- GenCellAgent is a training-free cellular image segmentation framework that uses a trio of agents for planning, execution, and evaluation.
- It dynamically routes images to specialist and generalist models through style similarity and persistent memory to handle heterogeneous modalities.
- Its text-guided, self-evolving approach improves segmentation accuracy while reducing manual annotation effort in quantitative biology.
GenCellAgent is a generalizable, training-free cellular image segmentation framework introduced as a multi-agent system that coordinates specialist segmenters and generalist vision-LLMs through an iterative planner–executor–evaluator loop with integrated long-term memory. Its architecture and workflow resolve challenges in segmentation across heterogeneous modalities and shifting imaging conditions, provide personalized workflows, and enable annotation-free adaptation and text-guided segmentation of novel organelles. Experimental results show substantial accuracy improvements compared to both supervised and specialist models, and the framework’s agentic paradigm affords practical advantages for robust quantitative biology.
1. Architectural Foundation and Agent Loop
GenCellAgent organizes its pipeline into three interacting agents: Planner, Executor, and Evaluator. The system receives a target image ($I$), a user query ($Q$), available tool descriptions ($T$), and a database of prior runs ($M$). The planner ($\mathcal{A}_{\text{plan}}$) designs and updates a segmentation plan ($P_t$) as follows:

$P_t = \mathcal{A}_{\text{plan}}(I, Q, T, M, F_{t-1})$

with $F_{t-1}$ being intermediate feedback from the preceding iteration.
The execution agent ($\mathcal{A}_{\text{exec}}$) selects and runs the segmentation tool, either a specialist model (CellPose, μSAM, ERNet, MitoNet) or a generalist vision–LLM (LISA, SegGPT), depending on image modality, style similarity, and task requirements, producing an output mask ($O$). The evaluation agent ($\mathcal{A}_{\text{eval}}$) applies criteria to assess segmentation outputs, providing an evaluation score ($s$) and summary ($e$) via

$(s, e) = \mathcal{A}_{\text{eval}}(O, C, p_{\text{eval}})$

where $C$ is the criterion set and $p_{\text{eval}}$ is the evaluation prompt.
This agentic feedback loop orchestrates tool routing, iterative refinement, and memory updates.
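The loop can be sketched in code. The functions below are hypothetical stand-ins for the LLM-driven agents; the hard-coded tool names and scores are invented purely for illustration:

```python
def planner(query, tools, memory, feedback):
    # A_plan: choose a tool; after "low score" feedback, fall back to the generalist.
    if feedback == "low score":
        return {"tool": "SegGPT"}
    return {"tool": tools[0]}

def executor(image, plan):
    # A_exec: stand-in for running the chosen segmentation tool on the image.
    return {"mask": [[1, 0], [0, 1]], "tool": plan["tool"]}

def evaluator(output):
    # A_eval: stand-in evaluator; assume the generalist scores higher here.
    score = 0.95 if output["tool"] == "SegGPT" else 0.50
    summary = "ok" if score >= 0.9 else "low score"
    return score, summary

def segment_with_agents(image, query, tools, memory, max_iters=3, target=0.9):
    feedback, best_out, best_score = None, None, -1.0
    for _ in range(max_iters):
        plan = planner(query, tools, memory, feedback)
        out = executor(image, plan)
        score, feedback = evaluator(out)   # evaluation feeds back into planning
        if score > best_score:
            best_out, best_score = out, score
        if score >= target:
            break
    # Archive the run for future routing and personalization.
    memory.append({"query": query, "tool": best_out["tool"], "score": best_score})
    return best_out, best_score

memory = []
result, score = segment_with_agents("img.tif", "segment cells",
                                    ["CellPose", "SegGPT"], memory)
```

Note how the evaluator's summary is routed back to the planner, so a poor first attempt with the default specialist triggers a re-planned run with the generalist fallback.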
2. Dynamic Tool Routing and Segmentation Strategy
GenCellAgent automatically routes input images to the most suitable segmentation model by computing style similarity—a form of nearest-neighbor clustering—among available references in the tool database. If the default specialist underperforms due to domain mismatch (for example, out-of-distribution image style or modality), the system retrieves style-similar references from long-term memory ($M$) as in-context exemplars for generalist models (SegGPT, LISA), enabling retraining-free adaptation. This dynamic reference-guided mechanism allows segmentation tools to generalize beyond their supervised training distributions and handle visual heterogeneity robustly.
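A minimal sketch of nearest-neighbor style routing, assuming style embeddings come from a pretrained image encoder (the vectors and reference database below are invented for illustration):

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Hypothetical reference database: (style embedding, tool name).
REFERENCES = [
    ((1.0, 0.1, 0.0), "CellPose"),   # e.g. phase-contrast style
    ((0.0, 1.0, 0.2), "MitoNet"),    # e.g. EM style
    ((0.1, 0.2, 1.0), "SegGPT"),     # out-of-distribution fallback
]

def route(style_embedding, references=REFERENCES):
    # Nearest-neighbor routing: pick the tool whose reference is most similar.
    return max(references, key=lambda ref: cosine(style_embedding, ref[0]))[1]

tool = route((0.9, 0.2, 0.1))
```

The same similarity lookup doubles as exemplar retrieval: the nearest references can be passed to SegGPT or LISA as in-context examples when no specialist matches.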
3. Long-term Memory, Personalization, and Self-evolution
All segmentation runs are archived in a long-term memory structure ($M$), which records the inputs, tool choices, outputs, evaluation scores, and any expert edits or manual corrections:

$M \leftarrow M \cup \{(I, P, O, s, e)\}$

The planner aggregates the current and historical run states ($s_t$, $M$) to infer the preferred human-in-the-loop mode. Through this persistent memory, GenCellAgent self-evolves: expert corrections are committed for future reference, enabling iterative refinement and customization of annotation workflows. Over successive uses, this process supports fully automatic, reference-guided, and interactive annotation modes, adjusting the automation level to match user preferences and reducing annotation effort.
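One way to sketch such a memory record and the mode-preference inference (the field names and inference rule here are assumptions, not the paper's schema):

```python
from dataclasses import dataclass, field
from collections import Counter

@dataclass
class RunRecord:
    image_id: str
    tool: str
    score: float
    mode: str                 # "automatic", "reference-guided", or "interactive"
    expert_edits: int = 0     # number of manual corrections committed

@dataclass
class LongTermMemory:
    runs: list = field(default_factory=list)

    def commit(self, record: RunRecord):
        # Archive a run (including any expert edits) for future reference.
        self.runs.append(record)

    def preferred_mode(self, default="automatic"):
        # Infer the human-in-the-loop mode the user has favored historically.
        if not self.runs:
            return default
        return Counter(r.mode for r in self.runs).most_common(1)[0][0]

memory = LongTermMemory()
memory.commit(RunRecord("img1", "CellPose", 0.82, "interactive", expert_edits=3))
memory.commit(RunRecord("img2", "CellPose", 0.91, "interactive", expert_edits=1))
memory.commit(RunRecord("img3", "SegGPT", 0.88, "automatic"))
mode = memory.preferred_mode()
```

A cold-start user falls back to the default mode; as records accumulate, the planner can tilt toward whatever automation level the user has historically chosen.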
4. Training-free Adaptation and Text-guided Segmentation
GenCellAgent is capable of segmenting novel subcellular structures for which no specialist models exist, by leveraging generalist vision–LLMs in a text-guided iterative refinement loop.
- The execution agent prompts a generalist model (LISA) with a literature-derived textual description (e.g., “stacked membranes” for the Golgi apparatus).
- The evaluation agent scores the output mask and generates feedback.
- The system refines the segmentation prompt using an update function ($f_{\text{update}}$):

$q_{t+1} = f_{\text{update}}(q_t, s_t, e_t)$

and selects the candidate prompt yielding the highest evaluation score for the next iteration.
- Human correction can be optionally introduced to steer the prompt-update, facilitating segmentation of previously unsupported or undefined cellular targets.
This iterative, prompt-driven process enables the agent to adapt to novel tasks without retraining specialist networks or requiring large annotated datasets.
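The refinement loop above can be sketched as follows. Here `score_prompt` stands in for executing the vision-LLM plus evaluator on one candidate prompt, and the scoring table of candidate Golgi descriptions is invented for illustration:

```python
# Invented candidate prompts with illustrative evaluation scores.
CANDIDATE_SCORES = {
    "stacked membranes": 0.40,
    "stacked membranes near the nucleus": 0.65,
    "flattened stacked membrane cisternae near the nucleus": 0.90,
}

def score_prompt(prompt):
    # Stand-in for: run the vision-LLM on the prompt, evaluate the mask.
    return CANDIDATE_SCORES.get(prompt, 0.0)

def refine_prompt(prompt, feedback):
    # Stand-in update function: propose prompt variants, optionally steered
    # by evaluator feedback or a human correction.
    return list(CANDIDATE_SCORES)

def text_guided_segment(initial_prompt, max_iters=3, target=0.85):
    prompt, best_score = initial_prompt, score_prompt(initial_prompt)
    for _ in range(max_iters):
        if best_score >= target:
            break
        candidates = refine_prompt(prompt, feedback="mask misses perinuclear region")
        # Keep the candidate prompt with the highest evaluation score.
        prompt = max(candidates, key=score_prompt)
        best_score = score_prompt(prompt)
    return prompt, best_score

best_prompt, best_score = text_guided_segment("stacked membranes")
```

The key design point is that only the text prompt evolves between iterations; the underlying models are never retrained.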
5. Quantitative Benchmarks and Performance Gains
Empirical evaluation across four segmentation benchmarks (LiveCell, TissueNet, PlantSeg, Lizard) demonstrates the effectiveness of GenCellAgent’s agentic routing and adaptation strategies:
- Mean accuracy gain of 15.7% over state-of-the-art supervised or specialist models, achieved as the agent clusters images by style and routes them to the optimal segmentation tool (e.g., CellPose for LiveCell, μSAM for TissueNet).
- For organelle segmentation on endoplasmic reticulum and mitochondria in new, out-of-distribution datasets, average Intersection over Union (IoU) is improved by 37.6% versus dedicated specialist models. Figures in the primary publication substantiate both qualitative mask recovery and quantitative accuracy when initial tool performance falters.
- For novel targets such as the Golgi apparatus, iterative text-guided refinement and expert correction yield robust segmentation without the need for custom model training.
This suggests that GenCellAgent’s multi-agent approach provides a substantial advantage in heterogeneous and low-annotation settings and supports generalization far beyond the supervised training domain of existing tools.
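For reference, the IoU metric underlying these comparisons can be computed on binary masks as follows (a pure-Python sketch; production pipelines would use array libraries):

```python
def iou(mask_a, mask_b):
    # Intersection over Union of two same-sized binary (0/1) masks.
    inter = union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            inter += a & b
            union += a | b
    return inter / union if union else 1.0

pred = [[1, 1, 0],
        [0, 1, 0]]
gt   = [[1, 1, 0],
        [0, 1, 1]]
score = iou(pred, gt)  # intersection = 3 pixels, union = 4 pixels
```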
6. Implications for Workflow Efficiency and Quantitative Biology
GenCellAgent provides practical benefits in quantitative biology:
- Annotation and retraining burden are reduced by deploying reference-guided routing and adaptation; retraining is no longer required when imaging conditions change or new modalities arise.
- Long-term memory and expert edit integration enable self-evolving and personalized segmentation workflows, matching the user’s historical preferences and annotation patterns.
- The agentic design allows for incremental improvement of segmentation quality—even for novel structures—with only light human input.
A plausible implication is that such systems may establish new standards for segmentation reliability and flexibility in biomedical research, especially in scenarios requiring rapid adaptation or expert-driven customization.
7. Limitations and Future Directions
While GenCellAgent achieves robust routing, adaptation, and text-guided generalization, several limitations remain:
- Performance for targets with ambiguous or poorly defined textual features may still require substantial human guidance.
- The reliance on long-term memory for expert corrections suggests that cold-start scenarios could benefit from further augmentation of reference databases or integration of unsupervised learning.
- Integration with downstream analysis workflows (e.g., morphometric quantification, spatial transcriptomics) is plausible but not present in the current version.
Continued research may address these aspects by expanding the agentic memory paradigm, developing more sophisticated prompt-refinement algorithms, and coupling segmentation outputs with automated downstream interpretation.
GenCellAgent represents a generalizable, training-free cellular image segmentation framework that fuses planner–executor–evaluator agentic reasoning, dynamic specialist-generalist tool routing, in-context adaptation, and persistent memory for self-evolution and personalized annotation workflows. Its demonstrated quantitative gains and flexibility illustrate a robust solution for cellular image analysis across diverse conditions and tasks, with implications for efficiency and reliability in quantitative biology (Yu et al., 14 Oct 2025).