
Mamba-based Entropy Model (MEM) Overview

Updated 3 January 2026
  • MEM is an adaptive multimodal fusion model that integrates a continuous-time state-space sequence model (Mamba) and graph neural networks (GNNs) through confidence-weighted, entropy-based adaptive fusion.
  • The architecture employs a dual-branch design where the GNN branch captures local spatial relationships and the Mamba branch models global contextual dependencies, ensuring semantically aligned representations.
  • Entropy-driven fusion dynamically adjusts branch contributions based on predictive uncertainty, enhancing robust representation learning and scalability for large-scale inputs.

The Mamba-based Entropy Model (MEM) designates a class of adaptive multimodal fusion schemes integrating Mamba, a continuous-time state-space sequence model, with complementary architectures—most notably Graph Neural Networks (GNNs)—via confidence-weighted entropy mechanisms. Conceived for representation learning in domains such as digital pathology, MEM achieves robust performance by dynamically allocating predictive weight between branches according to their relative output uncertainty. The paradigm defines an approach in which spatial (local) and contextual (global) cues are jointly exploited, with information flow governed by entropy-based adaptive fusion at multiple processing stages (Khan et al., 25 Sep 2025).

1. Architectural Overview

MEM, exemplified by the SlideMamba framework, comprises two parallel branches processing the same underlying data but emphasizing different dependency structures. The local branch leverages a GNN (GINConv variant) to model short-range spatial relationships, operationalizing node and edge features derived from tiled embeddings of the input (e.g., 224×224 pixel regions of whole-slide images). The global branch employs the Mamba architecture, which processes sequences of tile embeddings and positional encodings as a continuous-time state-space model (SSM), enabling scalable modeling of long-range dependencies. Both branches are configured to accept the same low-level embeddings, ensuring that the fusion mechanism operates on semantically aligned intermediate representations (Khan et al., 25 Sep 2025).
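The following PyTorch sketch illustrates the dual-branch layout at a structural level. To keep it self-contained, a single adjacency-mixing linear layer stands in for the GINConv branch and an nn.GRU stands in for the Mamba SSM; the class name DualBranchBlock and both stand-in layers are illustrative, not the published implementation.

```python
import torch
import torch.nn as nn

class DualBranchBlock(nn.Module):
    """Structural sketch: both branches consume the same tile embeddings,
    one emphasizing local (graph) structure, the other global (sequence) context."""
    def __init__(self, dim: int):
        super().__init__()
        self.local_lin = nn.Linear(dim, dim)                  # stand-in for GINConv
        self.global_rnn = nn.GRU(dim, dim, batch_first=True)  # stand-in for Mamba SSM

    def forward(self, x: torch.Tensor, adj: torch.Tensor):
        # x: (1, N, d) tile embeddings; adj: (N, N) tile adjacency
        local = torch.relu(self.local_lin(adj @ x.squeeze(0))).unsqueeze(0)
        global_, _ = self.global_rnn(x)
        return local, global_  # semantically aligned intermediate representations

x = torch.randn(1, 16, 32)                      # 16 tiles, 32-dim embeddings
adj = (torch.rand(16, 16) > 0.8).float()        # toy tile-adjacency graph
local, global_ = DualBranchBlock(32)(x, adj)
print(local.shape, global_.shape)               # torch.Size([1, 16, 32]) twice
```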

2. Local and Global Dependency Modeling

The GNN branch adopts a SlideGraph+-style approach utilizing Graph Isomorphism Network convolution (GINConv) layers. At layer $k$, node features are updated as

$$h_v^{(k)} = \mathrm{MLP}^{(k)}\Big((1 + \epsilon^{(k)})\, h_v^{(k-1)} + \sum_{u \in \mathcal{N}(v)} h_u^{(k-1)}\Big),$$

where $h_v^{(k-1)} \in \mathbb{R}^d$ is the prior embedding, $\epsilon^{(k)}$ is a learnable scalar, $\mathcal{N}(v)$ is the neighbor set for node $v$, and $\mathrm{MLP}^{(k)}$ is a two-layer feed-forward network. Batch normalization and dropout are applied after each layer for stabilization.
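A minimal dense-adjacency PyTorch sketch of this update; a practical implementation would typically use a sparse GINConv (e.g., from torch_geometric) over tile graphs, and the class name GINLayer here is illustrative.

```python
import torch
import torch.nn as nn

class GINLayer(nn.Module):
    """One GIN update: h_v = MLP((1 + eps) * h_v + sum of neighbor embeddings),
    followed by batch norm and dropout as in the text."""
    def __init__(self, dim: int, dropout: float = 0.1):
        super().__init__()
        self.eps = nn.Parameter(torch.zeros(1))  # learnable epsilon^(k)
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.norm = nn.BatchNorm1d(dim)
        self.dropout = nn.Dropout(dropout)

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # h: (N, d) node embeddings; adj: (N, N) binary adjacency, no self-loops
        neighbor_sum = adj @ h                   # sum over N(v)
        out = self.mlp((1 + self.eps) * h + neighbor_sum)
        return self.dropout(self.norm(out))

h = torch.randn(16, 32)                          # 16 nodes, 32-dim embeddings
adj = (torch.rand(16, 16) > 0.8).float()
print(GINLayer(32)(h, adj).shape)                # torch.Size([16, 32])
```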

The Mamba branch defines the hidden state evolution as a discretized SSM:

$$\begin{aligned} h'(t) &= A\,h(t) + B\,x(t), \\ y(t) &= C\,h(t), \end{aligned}$$

with discretization

$$\overline{A} = \exp(\Delta A), \qquad \overline{B} = (\Delta A)^{-1}\big(\exp(\Delta A) - I\big)\,\Delta B.$$

Crucially, Mamba introduces input-dependent $B_t$, $C_t$, and $\Delta_t$, enabling dynamic, context-aware filtering. Internal computation employs a parallel scan implementation for efficient scaling to high-resolution inputs (Khan et al., 25 Sep 2025).
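The discretization and recurrence can be sketched as follows for a single input channel with a diagonal state matrix (the parameterization Mamba uses, which makes the matrix operations elementwise). Here B, C, and delta are fixed tensors for brevity, whereas Mamba computes them from the input, and the sequential loop is only a reference for the parallel scan.

```python
import torch

def discretize_zoh(A, B, delta):
    """Zero-order hold: Abar = exp(Delta A), Bbar = (Delta A)^{-1}(exp(Delta A) - I) Delta B,
    computed elementwise since A is diagonal."""
    dA = delta[:, None] * A[None, :]                    # (L, d_state): Delta_t * A
    Abar = torch.exp(dA)
    Bbar = (Abar - 1.0) / dA * (delta[:, None] * B[None, :])
    return Abar, Bbar

def ssm_scan(Abar, Bbar, C, x):
    """Reference recurrence h_t = Abar_t * h_{t-1} + Bbar_t * x_t, y_t = C . h_t.
    Mamba replaces this loop with a work-efficient parallel scan."""
    h = torch.zeros(Abar.shape[1])
    ys = []
    for t in range(x.shape[0]):
        h = Abar[t] * h + Bbar[t] * x[t]
        ys.append((C * h).sum())
    return torch.stack(ys)

L, d_state = 8, 4
A = -torch.rand(d_state)                                # stable (negative) diagonal A
B, C = torch.randn(d_state), torch.randn(d_state)
delta = torch.rand(L) * 0.1 + 0.01                      # per-step sizes (input-dependent in Mamba)
x = torch.randn(L)
Abar, Bbar = discretize_zoh(A, B, delta)
print(ssm_scan(Abar, Bbar, C, x).shape)                 # torch.Size([8])
```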

3. Entropy-Based Adaptive Fusion

At each MEM fusion block (SlideMambaBlock), the outputs from both branches yield softmax probability vectors $\hat{y}_{SG}, \hat{y}_{Mamba} \in \Delta^C$ for $C$ classes. The entropy $H(\hat{y})$ is defined as

$$H(\hat{y}) = -\frac{1}{\log C} \sum_{c=1}^{C} \hat{y}_c \log \hat{y}_c,$$

measuring normalized predictive uncertainty. Each branch acquires a confidence weight $w = 1 - H(\hat{y})$. The relative confidence

$$\alpha = \frac{w_{Mamba}}{w_{SG} + w_{Mamba}}$$

serves as the fusion coefficient, yielding the fused representation

$$X_{fused} = (1 - \alpha)\, X_{SG} + \alpha\, X_{Mamba}.$$

This adaptive weighting mechanism prioritizes the branch with the greater predictive confidence and allows block-wise, sample-dependent modulation of information flow (Khan et al., 25 Sep 2025).
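A direct sketch of this fusion rule, assuming each branch exposes per-sample logits alongside its feature map; the function name entropy_fusion and the tensor shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def entropy_fusion(logits_sg, logits_mamba, x_sg, x_mamba, eps=1e-8):
    """Fuse branch features with alpha = w_Mamba / (w_SG + w_Mamba),
    where w = 1 - H(softmax(logits)) and H is normalized by log C."""
    def confidence(logits):
        p = F.softmax(logits, dim=-1)
        C = p.shape[-1]
        H = -(p * (p + eps).log()).sum(dim=-1) / torch.log(torch.tensor(float(C)))
        return 1.0 - H                                  # in [0, 1]; 1 = fully confident
    w_sg, w_mamba = confidence(logits_sg), confidence(logits_mamba)
    alpha = (w_mamba / (w_sg + w_mamba + eps)).unsqueeze(-1)
    return (1 - alpha) * x_sg + alpha * x_mamba

logits_sg = torch.tensor([[2.0, -1.0]])                 # confident local branch
logits_mb = torch.tensor([[0.1, 0.0]])                  # uncertain global branch
x_sg, x_mb = torch.randn(1, 32), torch.randn(1, 32)
fused = entropy_fusion(logits_sg, logits_mb, x_sg, x_mb)
print(fused.shape)                                      # (1, 32), weighted toward x_sg
```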

4. Training Objective and Inference

After stacking $L$ MEM fusion blocks, model outputs are globally mean-pooled, yielding a fixed-dimensional slide embedding $z \in \mathbb{R}^d$. A terminal two-layer MLP followed by softmax generates the final classification probabilities $\hat{y}_{slide}$. The objective is the categorical cross-entropy:

$$\ell(\hat{y}_{slide}, y_{true}) = -\sum_{c=1}^{C} y_{true,c} \log \hat{y}_{slide,c}.$$

Because the fusion weights $\alpha$ are computed from differentiable softmax outputs, gradient signals propagate through both branch-specific and fusion parameters. This results in dynamic optimization in which lower-entropy (higher-confidence) predictions exert greater influence on parameter updates, enabling robust adaptation to the sample-wise informativeness of local versus global cues (Khan et al., 25 Sep 2025).
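A compact sketch of the slide-level head and objective; the embedding dimension, MLP widths, and tile count are illustrative placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

d, C = 256, 2                                       # embedding dim, class count (illustrative)
head = nn.Sequential(nn.Linear(d, d // 2), nn.ReLU(), nn.Linear(d // 2, C))

x_fused = torch.randn(1, 500, d)                    # fused tile features after the stacked blocks
z = x_fused.mean(dim=1)                             # global mean pooling -> slide embedding z
logits = head(z)                                    # two-layer MLP head
loss = F.cross_entropy(logits, torch.tensor([1]))   # categorical cross-entropy over softmax
loss.backward()                                     # gradients reach branch and fusion parameters
```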

5. Empirical Results and Comparative Performance

MEM was evaluated for gene fusion/mutation classification from whole-slide lung-cancer images, with comparative analysis against representative baselines. Performance on 1,114 whole-slide images (five-fold cross-validation) is summarized below:

| Model | PRAUC | ROC AUC | Sensitivity | Specificity |
|---|---|---|---|---|
| MIL | 0.491 ± 0.042 | 0.714 ± 0.036 | 0.442 ± 0.090 | 0.854 ± 0.016 |
| Trans-MIL | 0.390 ± 0.017 | 0.640 ± 0.031 | 0.297 ± 0.019 | 0.853 ± 0.031 |
| SlideGraph+ | 0.748 ± 0.091 | 0.733 ± 0.085 | 0.638 ± 0.092 | 0.750 ± 0.040 |
| GAT-Mamba | 0.703 ± 0.075 | 0.723 ± 0.070 | 0.712 ± 0.055 | 0.762 ± 0.120 |
| Mamba-only | 0.664 ± 0.063 | 0.660 ± 0.060 | 0.475 ± 0.071 | 0.875 ± 0.090 |
| SlideMamba (MEM) | 0.751 ± 0.050 | 0.738 ± 0.055 | 0.6625 ± 0.083 | 0.725 ± 0.094 |

MEM achieved the highest PRAUC and ROC AUC among the compared methods while remaining competitive in sensitivity and specificity, where the single-branch models trade one off against the other. These results indicate that simple additive fusion of GNN and Mamba outputs is suboptimal; adaptive entropy-based fusion confers resilience when the informativeness of local and global contexts varies (Khan et al., 25 Sep 2025).

6. Significance and Broader Implications

The MEM framework establishes a robust mechanism for multi-view representation learning, especially where both local and global dependencies are essential to downstream prediction tasks. Entropy-driven adaptive fusion addresses the limitation of fixed fusion weights, enhancing generalization and reliability in heterogeneous data regimes. The use of the Mamba state-space architecture ensures scalability to extremely large inputs (such as gigapixel slides), a critical property for computational pathology and other high-resolution domains (Khan et al., 25 Sep 2025). A plausible implication is applicability beyond pathology, wherever information exhibits hierarchical or spatially multiscale structure and uncertainty estimation is critical for sample-level adaptivity.

While MEM is demonstrated in the SlideMamba study, similar architectural principles—combining SSM-based (Mamba) and GNN models, mediated by entropy or uncertainty-based fusion—are applicable to other joint modality integration or compression pipelines. In point cloud compression, the MEGA-PCC system references MEM as an entropy modeling component for improving joint geometry-attribute bitrate allocation; however, details on MEM's model structure, loss, and experimental results in the compression domain are not present in the available excerpt (Hsieh et al., 27 Dec 2025). This suggests active cross-domain applicability and ongoing development of such entropy-guided multimodal fusion schemes.
