
BioMamba: Biologically Optimized SSMs

Updated 10 December 2025
  • BioMamba is a suite of biologically optimized structured state-space models that leverage content-dependent gating and linear-time complexity to capture long-range dependencies in diverse biological data.
  • It integrates hybrid learning rules combining global error learning with local spike-timing-dependent plasticity, enhancing bioplausibility and energy efficiency.
  • BioMamba spans applications across transcriptomics, protein science, biosignals, and bioacoustics, achieving competitive performance at reduced computational cost.

BioMamba refers to a suite of methodologies and model architectures that build upon the Mamba family of structured state-space models (SSMs), optimized specifically for biological, biomedical, and bioacoustic domains. These models leverage the linear-time complexity, content-dependent gating, and inductive biases of SSMs, while also integrating domain-specific data representations and, in certain instances, bioplausible learning rules such as spike-timing-dependent plasticity (STDP). BioMamba advances state-of-the-art modeling in areas including biomedicine, single-cell omics, protein science, neural signal processing, and bioacoustics.

1. Architectural Foundations

The core architectural innovation in BioMamba derives from selective state-space modeling, notably embodied in the Mamba and Mamba2 frameworks. The standard Mamba block operates via:

$$x_t = A(\theta_t)\,x_{t-1} + B(\theta_t)\,u_t, \qquad y_t = C(\theta_t)\,x_t + D(\theta_t)\,u_t$$

Here, $A(\theta_t)$, $B(\theta_t)$, $C(\theta_t)$, and $D(\theta_t)$ are content-dependent gating matrices/functions, allowing these models to capture long-range dependencies with $O(n)$ time and memory complexity. The architecture is extensible to both unidirectional and bidirectional forms. For example, in the bidirectional setting (as in “Bi-Mamba”), input sequences are processed in both forward and reverse directions and subsequently fused via a learned gating mechanism, yielding enriched representations of biological sequences, gene expression vectors, or temporal signals (Qi et al., 22 Apr 2025).
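
As a concrete illustration, the following minimal Python sketch runs the selective recurrence above over a toy one-dimensional sequence. The diagonal parameterization of $A$, the softplus step-size gate, and the projection vectors are illustrative assumptions, not the exact Mamba kernel:

```python
import numpy as np

def selective_ssm(u, d_state=16, seed=0):
    """Minimal selective SSM scan for one input channel:
        x_t = Abar(u_t) * x_{t-1} + Bbar(u_t) * u_t
        y_t = C . x_t + D * u_t
    Diagonal A and a softplus step-size gate are illustrative choices."""
    rng = np.random.default_rng(seed)
    # Content-dependent projections (hypothetical, randomly initialized)
    w_dt = rng.normal(scale=0.1, size=d_state)   # step-size gate
    w_B  = rng.normal(scale=0.1, size=d_state)
    w_C  = rng.normal(scale=0.1, size=d_state)
    A    = -np.exp(rng.normal(size=d_state))     # stable negative poles
    D    = rng.normal()

    x, ys = np.zeros(d_state), []
    for u_t in u:                                # O(n) sequential scan
        dt    = np.logaddexp(0.0, u_t * w_dt)    # softplus -> positive step size
        A_bar = np.exp(dt * A)                   # discretized, content-dependent decay
        B_bar = dt * w_B                         # input-dependent through dt
        x     = A_bar * x + B_bar * u_t          # selective state update
        ys.append(w_C @ x + D * u_t)             # readout with skip term
    return np.array(ys)

y = selective_ssm(np.sin(np.linspace(0, 10, 200)))   # toy 1-D sequence
```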

A subset of BioMamba models implements a spiking neural front end (e.g., Bio-Inspired Mamba, “BIM”) wherein the hidden state is refined via leaky integrate-and-fire (LIF) neuron dynamics. This enables explicit spike-time coding and aligns model plasticity with neurophysiological rules (Qin, 17 Sep 2024).
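
A spiking front end of this kind can be sketched with standard LIF dynamics. The sketch below uses illustrative constants (membrane time constant, threshold, reset) and is not BIM's exact formulation:

```python
import numpy as np

def lif_spikes(currents, tau=20.0, v_th=1.0, v_reset=0.0, dt=1.0):
    """Leaky integrate-and-fire neuron: the membrane potential v leaks
    toward rest, integrates the input current, and emits a spike (then
    resets) whenever it crosses v_th. Constants are illustrative."""
    v, spikes = 0.0, []
    for i_t in currents:
        v += dt / tau * (-v + i_t)      # leaky integration
        if v >= v_th:                   # threshold crossing -> spike
            spikes.append(1)
            v = v_reset                 # hard reset after firing
        else:
            spikes.append(0)
    return np.array(spikes)

s = lif_spikes(np.full(100, 1.5))       # constant drive -> regular spiking
```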

2. Bioplausible and Hybrid Learning Rules

A distinguishing feature of certain BioMamba variants, especially BIM, is their hybrid synaptic update mechanism that combines Real-Time Recurrent Learning (RTRL) with STDP-like local updates. At each step, the eligibility trace $e_{ij}(t) = \partial x_i(t)/\partial w_{ij}$ is updated online, analogous to RTRL, while pre-/post-synaptic spike pairs drive local weight changes according to:

$$\Omega_{ij} = \sum_{t_i^f}\sum_{t_j^f} W(t_i^f - t_j^f)$$

where $W(\Delta t)$ follows the experimentally observed STDP window, with exponentials for potentiation and depression. The final synaptic update is a convex combination:

$$\Delta w_{ij} = \eta\left[\lambda\,\frac{\partial \mathcal{L}}{\partial w_{ij}} + (1-\lambda)\,\Omega_{ij}\right]$$

This hybrid strategy ensures both global error learning and strictly local, biologically plausible synaptic adjustments, enabling temporal locality and facilitating deployment on neuromorphic hardware (Qin, 17 Sep 2024).
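
The sketch below illustrates this hybrid rule on toy spike trains, using the standard exponential STDP window. The amplitudes, time constants, and the scalar treatment of the gradient term are illustrative assumptions:

```python
import numpy as np

def stdp_window(delta_t, a_plus=1.0, a_minus=1.0, tau_plus=20.0, tau_minus=20.0):
    """Exponential STDP window W(dt): potentiation when the postsynaptic
    spike follows the presynaptic one (dt > 0), depression otherwise.
    Amplitudes and time constants are illustrative."""
    return np.where(delta_t > 0,
                    a_plus * np.exp(-delta_t / tau_plus),
                    -a_minus * np.exp(delta_t / tau_minus))

def hybrid_update(grad_loss, pre_spikes, post_spikes, eta=1e-3, lam=0.7):
    """Convex combination of the global error gradient and the local STDP
    term Omega_ij = sum over spike pairs of W(t_post - t_pre)."""
    dts = post_spikes[:, None] - pre_spikes[None, :]   # all pairwise t_i^f - t_j^f
    omega = stdp_window(dts).sum()                     # local plasticity term
    return eta * (lam * grad_loss + (1.0 - lam) * omega)

# Toy spike trains (times in ms) and a dummy loss gradient
dw = hybrid_update(grad_loss=0.02,
                   pre_spikes=np.array([5.0, 30.0, 55.0]),
                   post_spikes=np.array([8.0, 32.0, 70.0]))
```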

3. Data Modalities and Specialized Input Representations

BioMamba models are adapted to a broad spectrum of biological modalities:

  • Single-Cell Transcriptomics: Gene expression vectors are encoded using rank-based gene selection and biologically informed normalization (see the encoding sketch after this list). Bidirectional state-space updates (Bi-Mamba) enable efficient, scalable context modeling over large gene sets for millions of cells (Qi et al., 22 Apr 2025).
  • Biomedical Signal Processing: Spectro-temporal embedding (STE) strategies combine Fourier-based frequency patching with channel-wise temporal summaries. Bidirectional Mamba blocks and sparse feed-forward layers yield efficient, robust biosignal sequence classifiers (Qian et al., 14 Mar 2025).
  • Protein Sequences: Protein-Mamba tokenizes sequences via a 20-residue vocabulary, employs self-supervised SSM-based pretraining, and fine-tunes on downstream regression/classification tasks directly relevant to protein function (Xu et al., 22 Sep 2024).
  • Bioacoustics: Audio frames extracted via convolutional front ends are fed into stacked Mamba2 layers for robust temporal modeling. Self-supervised masked frame prediction pretraining is utilized, analogous to HuBERT (Tang et al., 3 Dec 2025).
  • Biomedical Text Mining: Biomedical domain pretraining (PubMed, etc.) on the Mamba backbone achieves dramatically improved perplexity and downstream task accuracy via context-aware SSM blocks (Yue et al., 5 Aug 2024).
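
As an illustration of the rank-based encoding mentioned in the transcriptomics item above, the sketch below converts one cell's expression vector into a rank-ordered gene-token sequence. The tokenization details (top-$k$ cutoff, zero filtering) are assumptions about the general recipe rather than GeneMamba's exact preprocessing:

```python
import numpy as np

def rank_encode(expression, gene_ids, top_k=512):
    """Rank-based gene selection: keep the top_k most highly expressed
    genes and emit their ids ordered by descending expression."""
    order = np.argsort(-expression)           # descending expression rank
    keep = order[:top_k]
    keep = keep[expression[keep] > 0]         # drop unexpressed genes
    return gene_ids[keep]                     # rank-ordered token sequence

genes  = np.array(["CD3D", "MS4A1", "LYZ", "NKG7"])
expr   = np.array([4.2, 0.0, 7.9, 1.3])       # normalized counts for one cell
tokens = rank_encode(expr, genes, top_k=3)    # -> ['LYZ', 'CD3D', 'NKG7']
```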

4. Learning Objectives and Optimization

BioMamba instantiates multiple pretraining and downstream objectives:

  • Autoregressive Language Modeling: Applied to both natural language (biomedical literature) and protein sequences; loss function is standard cross-entropy over token predictions.
  • Masked-Frame or Masked-Token Prediction: Used in audio and omics modalities to foster robust unsupervised representation learning (Tang et al., 3 Dec 2025).
  • Contrastive Objectives: Pathway-aware InfoNCE losses drive gene embeddings to separate functional modules in single-cell omics (Qi et al., 22 Apr 2025); a generic form is sketched after this list.
  • Classification and Regression Heads: Adapted for biomedically relevant downstream tasks (disease classification, fitness prediction, medical QA) with suitable metrics (accuracy, AUROC, Spearman's $\rho$).
  • Hybrid RTRL+STDP Online Optimization: As described above, ensuring both energy-efficient and bioplausible learning in spiking variants (Qin, 17 Sep 2024).
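
The contrastive objective mentioned above follows the generic InfoNCE form. The sketch below computes it for a single anchor gene embedding; the pathway-based choice of positive and the temperature value are illustrative assumptions, not the paper's exact loss:

```python
import numpy as np

def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE: pull the anchor toward its positive (e.g., a gene
    from the same pathway) and away from negatives, via a softmax over
    cosine similarities. Temperature is illustrative."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    logits = np.array([cos(anchor, positive)] +
                      [cos(anchor, n) for n in negatives]) / temperature
    logits -= logits.max()                    # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                  # positive sits at index 0

rng = np.random.default_rng(0)
loss = info_nce(rng.normal(size=64), rng.normal(size=64),
                [rng.normal(size=64) for _ in range(7)])
```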

Optimization follows modern conventions, typically leveraging AdamW, mixed precision, learning rate schedules (warmup and decay), and context lengths tailored to the specific application domain.
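
A representative training setup along these lines might look as follows in PyTorch; the model stand-in and all hyperparameter values are placeholders, not values reported in the BioMamba papers:

```python
import math
import torch

model = torch.nn.Linear(256, 256)             # stand-in for a Mamba stack
opt = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.01)

warmup_steps, total_steps = 1_000, 100_000    # placeholder schedule lengths

def lr_lambda(step):
    """Linear warmup followed by cosine decay, a common convention."""
    if step < warmup_steps:
        return step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

sched = torch.optim.lr_scheduler.LambdaLR(opt, lr_lambda)
scaler = torch.cuda.amp.GradScaler()          # mixed-precision loss scaling
```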

5. Empirical Performance and Generality

BioMamba models show competitive or state-of-the-art performance across a range of biological machine learning tasks, as summarized in multiple peer-reviewed experiments:

| Domain | Model Variant | Key Metrics / Results |
|---|---|---|
| Biomedical LM | BioMamba LM | Perplexity 2.93 (vs. BioGPT 4535); 0.359 accuracy on QA (Yue et al., 5 Aug 2024) |
| Single-cell RNA | GeneMamba (Bi-Mamba) | AvgBio 0.8131 (best); superior batch integration (Qi et al., 22 Apr 2025) |
| Protein function | Protein-Mamba | GB1 $\rho$: 0.706 (SOTA); multi-task wins (Xu et al., 22 Sep 2024) |
| Biosignals | BioMamba (STE+Bi-Mamba) | AUROC 93.79% on Alzheimer’s detection (Qian et al., 14 Mar 2025) |
| Bioacoustics | BioMamba2 | 40% lower VRAM, 30% higher throughput vs. AVES (Tang et al., 3 Dec 2025) |

Notably, BioMamba achieves dramatic memory and compute gains relative to Transformer baselines (often $>4\times$), with parameter counts reduced to $0.7$–$1.1$ million in biosignal applications and training/inference scaling linearly in sequence length (Qian et al., 14 Mar 2025, Tang et al., 3 Dec 2025).

6. Hardware Implementation and Energy Efficiency

Strict locality in updates and sparse memory footprints make BioMamba particularly well suited to neuromorphic hardware. Event-driven synaptic storage and pulsed gating architectures allow on-chip deployment with energy consumption per synaptic update as low as $0.5$ nJ, roughly fourfold lower than LSTM equivalents, with further improvements under pruning reaching below $30$ nJ per thousand-neuron layer (Qin, 17 Sep 2024). Memory requirements scale only with the number of active synapses and eligibility traces, permitting sustainable large-scale biological modeling.

7. Impact and Outlook

BioMamba demonstrates that selective SSMs with biologically motivated objectives are broadly effective for high-dimensional, temporally structured biological data. Benefits include:

  • Temporal locality and scale-invariant sequence processing, in contrast to BPTT-dependent architectures.
  • Bioplausibility enabling insights into neurobiology and compatibility with neuromorphic chips.
  • Generality across molecular, cellular, physiological, and ecological data modalities.
  • Interpretability via embedding structure that recovers biological groupings and functional modules.
  • Open-source implementations and reproducible research, facilitating community adoption and extension.

Ongoing directions include multimodal fusion (e.g. sequence plus structural data for proteins), richer contrastive and generative pretraining, dynamic sparsity/pruning methods, real-time streaming deployment, and uncertainty quantification for translational biomedical applications (Xu et al., 22 Sep 2024, Yue et al., 5 Aug 2024).
