
HiFi-MambaV2: Hierarchical MoE MRI Reconstruction

Updated 30 November 2025
  • HiFi-MambaV2 is a hierarchical shared-routed Mixture-of-Experts architecture that reconstructs high-fidelity MR images from undersampled k-space data.
  • It integrates a frequency-consistent Laplacian pyramid and SE-guided global context to enhance high-frequency details and maintain anatomical coherence.
  • The system employs a physics-informed unrolled iterative framework with strict data consistency, achieving superior metrics across multiple MRI datasets.

HiFi-MambaV2 is a hierarchical shared-routed Mixture-of-Experts (MoE) deep learning architecture designed for reconstructing high-fidelity MR images from undersampled k-space data. It integrates a frequency-consistent Laplacian pyramid decomposition, content-adaptive computation via MoE, and physics-informed data consistency within an unrolled iterative framework. HiFi-MambaV2 targets the preservation of high-frequency image details and anatomical coherence in accelerated MRI, consistently surpassing CNN, Transformer, and prior Mamba-based baselines on established reconstruction metrics across multiple datasets and acquisition protocols (Fang et al., 23 Nov 2025).

1. Unrolled MRI Reconstruction Framework

HiFi-MambaV2 adopts a physics-informed unrolled architecture in which the MRI inverse problem is written as $y_c = F_u(S_c x) + n_c$, with $x \in \mathbb{C}^{H \times W}$ the unknown image, $y_c$ the undersampled k-space of coil $c$, $S_c$ the coil-sensitivity maps, and $F_u$ the undersampling Fourier operator. Reconstruction employs $T$ unrolled stages called HiFi-Mamba Groups. At each stage $t$, the iteration consists of a data-consistency (DC) update

$$x^{(t)'} = x^{(t)} - \lambda \sum_c S_c^H F_u^H\big(F_u(S_c x^{(t)}) - y_c\big)$$

followed by a learned refinement $D_\theta$, which comprises eight cascaded Groups. Each Group embeds two HiFi-Mamba Units and an explicit DC block (Fang et al., 23 Nov 2025).

Each HiFi-Mamba Unit performs the following operations sequentially: global context enhancement, frequency decomposition, content-adaptive MoE processing, and feature fusion. This hierarchical organization supports progressive refinement of image features and strict regularization via explicit data-consistency enforcement.
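A minimal PyTorch sketch of this unrolled structure is given below, assuming complex-valued image tensors and a plain convolutional stand-in for the learned refinement; the names `dc_update` and `UnrolledHiFiMamba` are illustrative and not taken from the paper.

```python
import torch
import torch.nn as nn

def dc_update(x, y, sens, mask, lam):
    """Data-consistency step: x <- x - lam * sum_c S_c^H F_u^H (F_u(S_c x) - y_c).

    x:    (B, H, W) complex image estimate
    y:    (B, C, H, W) complex undersampled k-space
    sens: (B, C, H, W) complex coil-sensitivity maps S_c
    mask: (B, 1, H, W) binary sampling mask (the undersampling part of F_u)
    """
    coil_imgs = sens * x.unsqueeze(1)                           # S_c x
    kspace = mask * torch.fft.fft2(coil_imgs, norm="ortho")     # F_u(S_c x)
    resid = torch.fft.ifft2(mask * (kspace - y), norm="ortho")  # F_u^H(F_u(S_c x) - y_c)
    return x - lam * (sens.conj() * resid).sum(dim=1)           # coil-combine with S_c^H


class UnrolledHiFiMamba(nn.Module):
    """Unrolled scheme: each stage alternates a DC update with learned refinement."""

    def __init__(self, num_stages=8, width=48):
        super().__init__()
        # Plain conv blocks stand in for the HiFi-Mamba Groups (two Units + a DC block each).
        self.refiners = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(2, width, 3, padding=1), nn.ReLU(),
                nn.Conv2d(width, 2, 3, padding=1),
            )
            for _ in range(num_stages)
        ])
        self.lam = nn.Parameter(torch.tensor(0.5))  # learnable step size λ

    def forward(self, x, y, sens, mask):
        for refine in self.refiners:
            x = dc_update(x, y, sens, mask, self.lam)          # physics-informed DC
            xr = torch.view_as_real(x).permute(0, 3, 1, 2)     # complex -> 2-channel real
            xr = xr + refine(xr)                               # learned residual refinement
            x = torch.view_as_complex(xr.permute(0, 2, 3, 1).contiguous())
        return x
```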

2. Separable Frequency-Consistent Laplacian Pyramid (SF-Lap)

The SF-Lap module decomposes features $x \in \mathbb{R}^{H \times W \times C}$ into low- and high-frequency bands using a depthwise-separable, 5-tap binomial kernel $k = [1, 4, 6, 4, 1]/16$, minimizing aliasing and checkerboard artifacts. The process involves:

  • Reflective padding $P_{\mathrm{ref}}(x)$.
  • Horizontal and vertical depthwise convolutions, with stride $s = 2$ on the vertical pass for decimation:

$$L = \mathrm{Conv}_v^{(s=2)}\big(\mathrm{Conv}_h^{(s=1)}(P_{\mathrm{ref}}(x), k),\, k\big)$$

  • Bilinear upsampling $U_{\times 2}$ to return to the input resolution.
  • High-frequency residual computation:

$$\tilde{L} = U_{\times 2}(L), \qquad H = x - \tilde{L}$$

This decomposition yields $L$ (low-frequency) and $H$ (high-frequency) streams that are recombined to preserve signal energy and mitigate alias-related instabilities during learning (Fang et al., 23 Nov 2025).
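The sketch below, assuming real-valued PyTorch feature maps, illustrates one way to realize this split with the 5-tap binomial kernel, reflective padding, and vertical-stride decimation; the function name `sf_lap` and the exact stride/upsampling handling are assumptions rather than the paper's reference implementation.

```python
import torch
import torch.nn.functional as F

def sf_lap(x):
    """Separable Laplacian split of x (B, C, H, W) into low/high bands at input resolution."""
    B, C, H, W = x.shape
    k = torch.tensor([1., 4., 6., 4., 1.], device=x.device) / 16.0  # 5-tap binomial kernel
    kh = k.view(1, 1, 1, 5).repeat(C, 1, 1, 1)   # horizontal depthwise kernel
    kv = k.view(1, 1, 5, 1).repeat(C, 1, 1, 1)   # vertical depthwise kernel

    xp = F.pad(x, (2, 2, 2, 2), mode="reflect")               # reflective padding P_ref
    low = F.conv2d(xp, kh, stride=1, groups=C)                # horizontal pass, stride 1
    low = F.conv2d(low, kv, stride=(2, 1), groups=C)          # vertical pass, stride 2 (decimation)

    low_up = F.interpolate(low, size=(H, W), mode="bilinear", align_corners=False)  # L~ = U_x2(L)
    high = x - low_up                                          # H = x - L~
    return low_up, high                                        # low band (upsampled) and residual
```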

3. SE-Guided Global Context Enhancement

Parallel to SF-Lap, HiFi-MambaV2 employs a lightweight Squeeze-and-Excitation (SE)-guided path for channel-wise global context aggregation. Global average pooling yields $z \in \mathbb{R}^{C}$, followed by a two-layer gating network $s = \sigma(W_2\, \mathrm{ReLU}(W_1 z))$ with learnable parameters $W_1 \in \mathbb{R}^{C \times (C/r)}$, $W_2 \in \mathbb{R}^{(C/r) \times C}$, and reduction ratio $r = 16$. The channel attention weights $s$ modulate the input as $x' = x \odot s + x$, introducing global anatomical context into the feature maps prior to spatially localized operations (Fang et al., 23 Nov 2025).
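A compact sketch of this SE-guided path in PyTorch, using the residual channel-gating form described above (the class name `SEContext` is illustrative):

```python
import torch
import torch.nn as nn

class SEContext(nn.Module):
    """SE-guided global context: s = sigmoid(W2 ReLU(W1 z)), applied as x' = x ⊙ s + x."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc1 = nn.Linear(channels, channels // reduction)   # W1: C -> C/r
        self.fc2 = nn.Linear(channels // reduction, channels)   # W2: C/r -> C

    def forward(self, x):                        # x: (B, C, H, W)
        z = x.mean(dim=(2, 3))                   # global average pooling -> (B, C)
        s = torch.sigmoid(self.fc2(torch.relu(self.fc1(z))))    # channel gates in (0, 1)
        return x * s.view(x.shape[0], -1, 1, 1) + x             # residual channel modulation
```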

4. Hierarchical Shared-Routed Mixture-of-Experts Mechanism

HiFi-MambaV2 concatenates the global-context features $x'$, the SF-Lap low-frequency component $L$, and the high-frequency component $H$ along the channel dimension, forming $X \in \mathbb{R}^{B \times H \times W \times C}$. This input is processed through a MoE structure comprising:

  • Shared experts $\{E_i^{(\mathrm{sh})}\}_{i=1}^{N_s}$, applied universally across the spatial domain.
  • Routed experts $\{E_e^{(\mathrm{rt})}\}_{e=1}^{N_r}$, assigned per pixel by a lightweight router.

The router computes gating weights $G = \mathrm{Softmax}(W_r X)$ and applies top-1 sparsity: a single routed expert is selected per spatial location. The composite MoE output is $Y = Y_{\mathrm{sh}} + Y_{\mathrm{rt}}$, where $Y_{\mathrm{sh}}$ is the sum of the shared-expert outputs and $Y_{\mathrm{rt}}$ is the selected routed expert's response. A load-balancing regularization term $L_{\mathrm{bal}} = N_r \sum_{e=1}^{N_r} p_e^2$ (with $p_e$ the average routing probability of expert $e$) stabilizes expert utilization (Fang et al., 23 Nov 2025).
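A simplified sketch of the shared-routed MoE with top-1 routing and the load-balancing penalty follows; it computes the routed branch densely for clarity (a real implementation would gather only the selected pixels), and the names `SharedRoutedMoE` and `make_expert` are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedRoutedMoE(nn.Module):
    """Shared experts applied everywhere plus one routed expert per pixel (top-1)."""

    def __init__(self, channels, num_shared=2, num_routed=4):
        super().__init__()

        def make_expert():  # small convolutional expert (illustrative choice)
            return nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.GELU())

        self.shared = nn.ModuleList([make_expert() for _ in range(num_shared)])
        self.routed = nn.ModuleList([make_expert() for _ in range(num_routed)])
        self.router = nn.Conv2d(channels, num_routed, kernel_size=1)  # lightweight router W_r

    def forward(self, x):                                  # x: (B, C, H, W) fused features
        y_shared = sum(e(x) for e in self.shared)          # Y_sh: sum of shared experts

        probs = F.softmax(self.router(x), dim=1)           # routing probabilities G
        top1 = probs.argmax(dim=1, keepdim=True)           # top-1 expert index per pixel
        y_routed = torch.zeros_like(x)
        for e, expert in enumerate(self.routed):           # dense compute, masked per pixel
            y_routed = y_routed + (top1 == e).float() * expert(x)

        # Load-balancing term L_bal = N_r * sum_e p_e^2, with p_e the mean routing probability.
        p = probs.mean(dim=(0, 2, 3))
        l_bal = len(self.routed) * (p ** 2).sum()
        return y_shared + y_routed, l_bal
```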

5. Data-Consistency Regularization and Optimization

HiFi-MambaV2 imposes strict data fidelity at every unrolled stage by solving

$$\min_x\; \|F_u(Sx) - y\|_2^2 + \frac{1}{2\lambda} \|x - x^{(t)}\|_2^2$$

with the closed-form proximal update

$$x \leftarrow x + \lambda \sum_c S_c^H F_u^H\big(y_c - F_u(S_c x)\big).$$

This ensures that model outputs remain consistent with the acquired k-space samples at each iteration (Fang et al., 23 Nov 2025).

The model is trained using AdamW (initial learning rate $8 \times 10^{-4}$) with cosine annealing, a 5-epoch warm-up, and 100 total epochs on NVIDIA H100 GPUs. MoE hyperparameters are $N_s = 2$ shared and $N_r = 4$ routed experts with top-1 sparsity, and the SE reduction ratio is $r = 16$. Evaluated settings cover 4× and 8× acceleration with equispaced Cartesian, random Cartesian, and golden-angle radial k-space masks (Fang et al., 23 Nov 2025).
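A minimal sketch of this optimization setup in PyTorch, assuming a placeholder model and a warm-up start factor that the paper does not specify:

```python
import torch

model = torch.nn.Conv2d(2, 2, 3, padding=1)   # placeholder standing in for the full network

optimizer = torch.optim.AdamW(model.parameters(), lr=8e-4)       # initial LR 8e-4

warmup_epochs, total_epochs = 5, 100
warmup = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=0.1, total_iters=warmup_epochs)      # start factor is an assumption
cosine = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=total_epochs - warmup_epochs)                # cosine annealing after warm-up
scheduler = torch.optim.lr_scheduler.SequentialLR(
    optimizer, schedulers=[warmup, cosine], milestones=[warmup_epochs])

for epoch in range(total_epochs):
    # ... one epoch over undersampled k-space batches; the training loss would combine
    # the reconstruction objective with the load-balancing term L_bal ...
    scheduler.step()
```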

6. Experimental Validation and Ablations

HiFi-MambaV2 demonstrates superior quantitative performance in PSNR (dB), SSIM, and NMSE across multiple datasets and sampling scenarios:

| Dataset (Coil Type, Mask) | 4× PSNR / SSIM / NMSE | 8× PSNR / SSIM / NMSE |
|---|---|---|
| fastMRI (single-coil) | 34.86 / 0.855 / 0.010 | 31.89 / 0.764 / 0.019 |
| CC359 (single-coil) | 37.21 / 0.957 / 0.003 | 29.05 / 0.849 / 0.020 |
| Prostate158 (random) | 29.04 / 0.837 / 0.012 | 23.91 / 0.662 / 0.039 |
| ACDC (radial, single-coil) | 33.72 / 0.931 / 0.005 | 28.05 / 0.820 / 0.017 |
| M4Raw (multi-coil) | 31.82 / 0.794 / 0.016 | 29.91 / 0.747 / 0.024 |

Ablation studies on CC359 at 8× acceleration (PSNR / SSIM / NMSE) show that each module contributes complementary performance gains:

  • Baseline HiFi-Mamba: 28.08 / 0.802 / 0.027
  • + SF-Lap: 28.31 / 0.830 / 0.024
  • + SF-Lap + LSGP: 28.43 / 0.835 / 0.023
  • + SF-Lap + LSGP + MoE-balanced: 28.50 / 0.837 / 0.023
  • Full (SF-Lap + LSGP + MoE): 28.68 / 0.841 / 0.022

This demonstrates that frequency decomposition, global reasoning, and adaptive specialization each improve high-frequency detail recovery and overall fidelity (Fang et al., 23 Nov 2025).

7. Limitations and Future Directions

Although HiFi-MambaV2 increases theoretical FLOPs due to deeper cascades and MoE specialization, sparse routing and depthwise-separable convolutions yield efficient inference, with wall-clock latencies comparable to or better than Transformer alternatives. Open challenges include minimizing computation and memory overhead for real-time clinical deployment, extending the routing mechanism to non-Cartesian k-space and calibrationless settings, and incorporating advanced physics or diffusion-based priors to address extreme undersampling or motion. Potential directions also include cross-contrast and multi-modal expert sharing for improved generalization across anatomical domains (Fang et al., 23 Nov 2025).
