CellMamba: Linear SSMs for Biomedical Imaging
- CellMamba is a scalable neural framework that combines structured state-space models with attention mechanisms to efficiently analyze cell-level data.
- It enhances performance through ensemble methods, multi-scale fusion, and adaptive gating, optimizing both classification and dense cell detection.
- Validated on datasets like Chula-WBC-8 and CoNSeP, CellMamba achieves high accuracy and efficiency, underscoring its potential in biomedical imaging.
CellMamba refers to a set of scalable, state-space-model–based neural architectures that leverage Mamba blocks—structured state-space models (SSMs) with linear complexity—integrated with self-attention or attention-like mechanisms to achieve accurate, efficient cell-level analysis in biomedical images. Across recent literature, "CellMamba" denotes frameworks targeting either classification (including radiomics-driven tumor or white blood cell analysis) or dense object detection, united by the core principle of replacing quadratic-cost attention with linear-cost SSMs while incorporating additional mechanisms for local detail, multi-scale fusion, and interpretability (Clifton et al., 15 Apr 2025, Chen et al., 21 Nov 2025, Liu et al., 25 Dec 2025).
1. Foundational Principles
CellMamba architectures are built upon structured state-space models (S4) that generalize recurrent neural networks to efficiently model long-range dependencies in sequence- or image-like data. The canonical Mamba block maintains a latent state $h(t)$ evolving as

$$h'(t) = A\,h(t) + B\,x(t), \qquad y(t) = C\,h(t),$$

with $A$, $B$, $C$, and the discretization step $\Delta$ learnable. In discrete-time form, the entire computation is equivalent to a causal convolution whose kernel is parametrized via HiPPO transforms:

$$\bar{K} = \big(C\bar{B},\ C\bar{A}\bar{B},\ \ldots,\ C\bar{A}^{L-1}\bar{B}\big), \qquad y = x * \bar{K},$$

where $\bar{A}$ and $\bar{B}$ are the zero-order-hold discretizations of $A$ and $B$ (Clifton et al., 15 Apr 2025).
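As a concrete illustration, the discrete recurrence can be sketched in a few lines of PyTorch. This is a minimal, non-selective sketch under assumed shapes (function and tensor names are illustrative): real Mamba blocks make $\bar{A}$, $\bar{B}$ input-dependent and use a hardware-aware parallel scan.

```python
import torch

def ssm_scan(x, A_bar, B_bar, C):
    """Minimal discrete SSM: h_t = A_bar @ h_{t-1} + B_bar * x_t, y_t = C @ h_t.

    x: (L,) scalar input sequence; A_bar: (N, N); B_bar, C: (N,).
    Sequential form of the scan; equivalent to convolving x with the
    kernel K_bar = (C B_bar, C A_bar B_bar, ..., C A_bar^{L-1} B_bar).
    """
    h = torch.zeros(A_bar.shape[0])
    ys = []
    for x_t in x:
        h = A_bar @ h + B_bar * x_t   # state update
        ys.append(C @ h)              # readout
    return torch.stack(ys)
```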
To capture local context and selection, the SSM output is further processed by a 1D causal convolution and added to a normalized activation:

$$y = \mathrm{Conv1D}\big(\mathrm{SSM}(x)\big) + \sigma\big(\mathrm{Norm}(x)\big),$$

where $\sigma$ denotes a gating nonlinearity (SiLU in standard Mamba blocks).
Training typically uses weighted cross-entropy (to handle class imbalance) and $\ell_2$-regularization, with optimization via AdamW. Per-block computational complexity is $\mathcal{O}(L \cdot d)$ for input sequence length $L$ and hidden dimension $d$, enabling high-resolution scaling on modest hardware (Clifton et al., 15 Apr 2025, Chen et al., 21 Nov 2025, Liu et al., 25 Dec 2025).
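A minimal sketch of this training setup; the class counts, model head, and hyperparameters are hypothetical stand-ins, not values from the papers.

```python
import torch
import torch.nn as nn

# Hypothetical counts for an imbalanced 8-class problem (illustrative only).
counts = torch.tensor([2100., 1500., 900., 600., 400., 300., 250., 165.])
weights = counts.sum() / (len(counts) * counts)   # inverse-frequency class weights

criterion = nn.CrossEntropyLoss(weight=weights)   # weighted cross-entropy
model = nn.Linear(768, 8)                         # stand-in for a Mamba classifier head
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-2)
# AdamW's weight_decay supplies the l2-style regularization noted above.
```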
2. CellMamba for Classification: Architectures and Methodologies
Two distinct lines have developed under the "CellMamba" umbrella:
2.1. Ensemble Mamba for White Blood Cell (WBC) Classification
CellMamba applies an ensemble (stacked generalization) of five core Mamba variants (ViM, VMamba, MambaVision, LocalMamba, MedMamba), each producing per-sample softmax outputs. The meta-learner is a two-layer MLP trained via stacking on held-out samples:

$$\hat{y} = \mathrm{MLP}\big([\,p^{(1)}(x);\ \ldots;\ p^{(5)}(x)\,]\big),$$

where $p^{(k)}(x)$ is the softmax output of the $k$-th base model and $[\cdot\,;\cdot]$ denotes concatenation.
This architecture enables robust classification with improved generalization and more effective handling of data imbalance. Weighted cross-entropy loss is deployed with inverse-frequency class weights (Clifton et al., 15 Apr 2025).
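The stacking step can be sketched as follows; the five-model, eight-class dimensions follow the setup above, while the hidden width is an assumption.

```python
import torch
import torch.nn as nn

class StackingMetaLearner(nn.Module):
    """Two-layer MLP over concatenated base-model softmax outputs (sketch)."""
    def __init__(self, n_models=5, n_classes=8, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(n_models * n_classes, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, base_probs):      # base_probs: (B, n_models, n_classes)
        return self.mlp(base_probs.flatten(1))   # meta-level logits

# Fit on held-out predictions so the meta-learner does not inherit
# base-model overfitting.
```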
2.2. Unified Attention-Mamba (UAM) Backbone for Cell Radiomics
To address cell-level radiomics (shape, intensity, and texture features), the UAM backbone fuses Mamba and attention in each layer. The Amamba encoder employs cross-attention whose values are derived from a lightweight Mamba branch:

$$\mathrm{Amamba}(X) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d}}\right)\mathrm{Mamba}(X),$$

with $Q$ and $K$ computed as linear projections of $X$.
The Amamba-MoE layer concatenates self-attention and Mamba outputs, passing through a sparsely-gated mixture-of-experts. The full UAM model stacks both encoder types and, in its multimodal form, integrates radiomics with image embeddings for joint classification and segmentation (Chen et al., 21 Nov 2025).
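A minimal sketch of the Amamba pattern — values from a linear-time recurrent branch feeding standard cross-attention. An `nn.GRU` stands in for the Mamba layer so the example is self-contained; in the actual backbone the branch is a Mamba SSM.

```python
import torch
import torch.nn as nn

class AmambaSketch(nn.Module):
    """Cross-attention whose values come from a linear-time recurrent branch."""
    def __init__(self, d=256, heads=4):
        super().__init__()
        self.recurrent = nn.GRU(d, d, batch_first=True)  # stand-in for Mamba
        self.attn = nn.MultiheadAttention(d, heads, batch_first=True)

    def forward(self, x):               # x: (B, L, d)
        v, _ = self.recurrent(x)        # V derived from the SSM-like branch
        out, _ = self.attn(x, x, v)     # Q, K from x; V from the branch
        return out

y = AmambaSketch()(torch.randn(2, 196, 256))  # (2, 196, 256)
```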
3. CellMamba for Cell Detection: Adaptive and Lightweight Architectures
CellMamba for detection uses a VSSD backbone where standard convolutions or attention are replaced by NC-Mamba blocks for efficient long-range spatial modeling. In each stage:
- A CellMamba Block couples sequence modeling (NC-Mamba in early stages, MSA in later) with the Triple-Mapping Adaptive Coupling (TMAC) module.
- TMAC splits channels into two sub-paths, producing idiosyncratic and consensus attention maps per branch. Adaptive gating based on epoch controls the fusion of global and local context.
- An Adaptive Mamba Head aggregates FPN multi-scale features using learned per-level fusion weights, applying a final CellMamba block for robust detection across varied cell sizes (Liu et al., 25 Dec 2025); a minimal sketch of this fusion follows the list.
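A hedged sketch of the learned-weight FPN fusion, assuming all pyramid levels have been resized to a common resolution; the refinement convolution is a stand-in for the final CellMamba block, and all names are illustrative.

```python
import torch
import torch.nn as nn

class AdaptiveFusionHead(nn.Module):
    """Softmax-normalized learned weights over FPN levels (sketch)."""
    def __init__(self, n_levels=4, channels=256):
        super().__init__()
        self.level_logits = nn.Parameter(torch.zeros(n_levels))
        self.refine = nn.Conv2d(channels, channels, 3, padding=1)  # stand-in block

    def forward(self, feats):           # feats: list of (B, C, H, W), same shape
        w = torch.softmax(self.level_logits, dim=0)
        fused = sum(wi * f for wi, f in zip(w, feats))
        return self.refine(fused)
```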
The model realizes one-stage detection with 14.7M parameters, 1.6 ms inference per 256×256 patch, and sublinear scaling with input size, outperforming prior CNN-, Transformer-, and Mamba-based detectors in both accuracy (25.7% mAP@50 on CoNSeP; 53.3% mAP@50 on CytoDArk0) and efficiency (Liu et al., 25 Dec 2025).
Representative Model Variants in Classification, Segmentation, Detection
| Subfield | Core CellMamba Variant | Distinctive Methodology |
|---|---|---|
| WBC Classification | Ensemble of diverse Mambas + MLP | Stacking, weighted loss |
| Tumor Cell Classification | UAM (Amamba, Amamba-MoE fusions) | Attention-SSM blend, radiomics |
| Cell Detection | VSSD backbone + CellMamba Block + TMAC | TMAC, Adaptive Mamba Head |
4. Datasets and Evaluation Protocols
WBC Classification is benchmarked using Chula-WBC-8: 6,215 images (4,808 original + 1,607 augmented), distributed across 8 WBC subtypes and extensively augmented (rotation, translation, color jitter, etc.). Training uses normalized 224×224-pixel inputs with AdamW, batch size 32, and 50–100 epochs for base models (Clifton et al., 15 Apr 2025).
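A sketch of the described preprocessing pipeline in torchvision; the augmentation magnitudes and normalization statistics are assumptions, not values reported in the paper.

```python
from torchvision import transforms

train_tf = transforms.Compose([
    transforms.RandomRotation(15),                      # rotation (magnitude assumed)
    transforms.RandomAffine(0, translate=(0.1, 0.1)),   # translation
    transforms.ColorJitter(0.2, 0.2, 0.2),              # color jitter
    transforms.Resize((224, 224)),                      # 224x224 inputs
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],    # ImageNet stats (assumed)
                         std=[0.229, 0.224, 0.225]),
])
```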
Tumor Cell Classification and Segmentation employ datasets WSSS4LUAD, IGNITE, and TCGA Normal, comprising over 600,000 labeled cells across training, validation, and test splits. Integrative feature sets leverage 106 radiomics features per cell (Chen et al., 21 Nov 2025).
Cell Detection is evaluated on CoNSeP (colorectal adenocarcinoma, H&E, 128×128 patches) and CytoDArk0 (brain, Nissl, 256×256), with strong augmentation protocols and mAP, precision, recall metrics (Liu et al., 25 Dec 2025).
5. Performance, Efficiency, and Comparative Results
Classification and Segmentation
On BloodMNIST and Chula-WBC-8, the CellMamba ensemble outperforms major baselines:
- CellMamba achieves 99.24% accuracy and F1=0.9925 on BloodMNIST, exceeding Swin ViT and all single Mamba variants.
- On Chula-WBC-8, performance is 93.94% accuracy and F1=0.9397, improving over DI-60 and transformer baselines; off-diagonal confusion matrix errors are reduced by ~30%, particularly in closely related classes (e.g., SNE↔BNE) (Clifton et al., 15 Apr 2025).
- The UAM backbone, using cell radiomics, yields 78% accuracy on IGNITE (vs. 74% for image-only baselines), and achieves up to 92.1% accuracy on WSSS4LUAD, outperforming transformer, Mamba, and hybrid models; segmentation precision rises from 75% (BiomedParse) to 80.8%, with strong cross-dataset generalization (AUC up to 96.1%) (Chen et al., 21 Nov 2025).
Detection
CellMamba sets state-of-the-art results on both CoNSeP and CytoDArk0—25.7% mAP @50 on CoNSeP, 53.3% @50 on CytoDArk0, with substantial improvements over CNN and previous Mamba-based detectors (e.g., Mask R-CNN, Mamba-YOLO-Base). The model achieves 84.4% F1 on CytoDArk0 and runs at 1.6 ms per input patch, outperforming approaches of dramatically larger size and latency (Liu et al., 25 Dec 2025).
6. Scalability, Limitations, and Prospective Directions
CellMamba’s reliance on linear-complexity SSMs enables image sizes and sequence lengths infeasible for classical attention-based models; at 224×224 resolution (L=196, d≈768), the per-layer cost advantage over Swin ViT is ≈196×. Training speed improves by 20–30%, with inference throughput of ~150 FPS on a 12 GB RTX 4070 at batch size 32 (Clifton et al., 15 Apr 2025).
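The ≈196× figure follows directly from the complexity ratio, treating attention as full self-attention over $L$ tokens: $\mathcal{O}(L^2 d)$ per layer versus $\mathcal{O}(L d)$ for the SSM, a factor of exactly $L$.

```python
L, d = 196, 768
attn_cost = L * L * d          # self-attention: O(L^2 d)
ssm_cost = L * d               # linear SSM:     O(L d)
print(attn_cost / ssm_cost)    # 196.0 -> the ~196x per-layer advantage
```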
Advantages include:
- Scalable, linear-time modeling suited to high-resolution, large-volume, and resource-constrained biomedical image analysis.
- Flexible mechanisms (ensembling, TMAC, multimodal heads) optimize for robustness, interpretability, and multi-task capacity (Clifton et al., 15 Apr 2025, Chen et al., 21 Nov 2025, Liu et al., 25 Dec 2025).
Limitations include dependence on the quality of preprocessing (mask generation, feature extraction), fixed gating schedules in TMAC, and challenges in scaling to whole-slide gigapixel images. Research directions focus on dynamic coupling, hierarchical/memory-efficient tiling, lightweight MoE, self-supervised pretraining, and extension to large-scale pretraining or prompt-based, interactive clinical workflows (Chen et al., 21 Nov 2025, Liu et al., 25 Dec 2025).
7. Domain Implications and Outlook
CellMamba architectures demonstrate that integrating linear-complexity SSMs (Mamba), tailored attention mechanisms, and model ensembling delivers top performance in fine-grained cell classification and dense object detection, with significant reductions in computational load. These approaches directly address the specific demands of biomedical images: extreme object density, subtle inter-class boundaries, and the need for population-scale interpretability and efficiency. A plausible implication is that CellMamba-style architectures may establish new paradigms for cellular analytics at scale, with applications extending from hematopathology to large-scale cancer phenotyping and beyond (Clifton et al., 15 Apr 2025, Chen et al., 21 Nov 2025, Liu et al., 25 Dec 2025).