HiFi-MambaV2: Hierarchical MoE MRI Reconstruction
- HiFi-MambaV2 is a hierarchical shared-routed Mixture-of-Experts architecture that reconstructs high-fidelity MR images from undersampled k-space data.
- It integrates a frequency-consistent Laplacian pyramid and SE-guided global context to enhance high-frequency details and maintain anatomical coherence.
- The system employs a physics-informed unrolled iterative framework with strict data consistency, achieving superior metrics across multiple MRI datasets.
HiFi-MambaV2 is a hierarchical shared-routed Mixture-of-Experts (MoE) deep learning architecture designed for reconstructing high-fidelity MR images from undersampled k-space data. It integrates a frequency-consistent Laplacian pyramid decomposition, content-adaptive computation via MoE, and physics-informed data consistency within an unrolled iterative framework. HiFi-MambaV2 targets the preservation of high-frequency image details and anatomical coherence in accelerated MRI, consistently surpassing CNN, Transformer, and prior Mamba-based baselines on established reconstruction metrics across multiple datasets and acquisition protocols (Fang et al., 23 Nov 2025).
1. Unrolled MRI Reconstruction Framework
HiFi-MambaV2 adopts a physics-informed unrolled architecture, where the MRI inverse problem is posed as $y = A x + \varepsilon$ with $A = M F S$, where $x$ is the unknown image, $y$ the undersampled k-space data, $S$ the coil-sensitivity maps, $F$ the Fourier transform, and $M$ the undersampling mask. Reconstruction unrolls the iteration into cascaded stages called HiFi-Mamba Groups; the full network comprises eight cascaded Groups. At each stage $t$, a data-consistency (DC) update $x_t^{\mathrm{dc}} = \mathrm{DC}(x_{t-1}; y)$ is followed by a learned refinement $x_t = \mathcal{R}_{\theta_t}(x_t^{\mathrm{dc}})$. Each Group embeds two HiFi-Mamba Units and an explicit DC block (Fang et al., 23 Nov 2025).
Each HiFi-Mamba Unit performs the following operations sequentially: global context enhancement, frequency decomposition, content-adaptive MoE processing, and feature fusion. This hierarchical organization supports progressive refinement of image features and strict regularization via explicit data-consistency enforcement.
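A minimal PyTorch sketch of this unrolled structure, assuming a single-coil Cartesian setting with a soft data-consistency weight; the names (`UnrolledRecon`, `make_group`, `lam`) are illustrative placeholders rather than the authors' implementation, and the refinement stand-in would in practice operate on real/imaginary feature channels:

```python
import torch
import torch.nn as nn


class DataConsistency(nn.Module):
    """Closed-form DC step for a single-coil Cartesian mask (sketch)."""

    def __init__(self, lam: float = 1.0):
        super().__init__()
        self.lam = nn.Parameter(torch.tensor(lam))  # fidelity weight, learnable here

    def forward(self, x, y, mask):
        # x: image estimate, y: measured k-space, mask: binary sampling mask
        k = torch.fft.fft2(x, norm="ortho")
        # Blend prediction with measurements on sampled locations; keep prediction elsewhere.
        k_dc = torch.where(mask.bool(), (k + self.lam * y) / (1.0 + self.lam), k)
        return torch.fft.ifft2(k_dc, norm="ortho")


class UnrolledRecon(nn.Module):
    """Alternate explicit DC updates with learned refinement across cascaded Groups."""

    def __init__(self, make_group, num_groups: int = 8):
        super().__init__()
        self.groups = nn.ModuleList([make_group() for _ in range(num_groups)])
        self.dcs = nn.ModuleList([DataConsistency() for _ in range(num_groups)])

    def forward(self, x0, y, mask):
        x = x0  # e.g. the zero-filled reconstruction
        for group, dc in zip(self.groups, self.dcs):
            x = dc(x, y, mask)   # enforce fidelity to acquired k-space samples
            x = x + group(x)     # residual refinement (stand-in for a HiFi-Mamba Group)
        return x
```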
2. Separable Frequency-Consistent Laplacian Pyramid (SF-Lap)
The SF-Lap module decomposes features into low- and high-frequency bands using a depthwise-separable, 5-tap binomial kernel $k = \tfrac{1}{16}[1,\,4,\,6,\,4,\,1]$, minimizing aliasing and checkerboard artifacts. The process involves:
- Reflective padding of the input by two pixels per side, matching the 5-tap support.
- Horizontal and vertical depthwise convolutions with $k$, applying stride 2 on the vertical pass for decimation: $x_{\downarrow} = \mathrm{DWConv}^{\,s=2}_{v}\!\big(\mathrm{DWConv}_{h}(x_{\mathrm{pad}})\big)$.
- Bilinear upsampling back to the input resolution: $x_{\mathrm{low}} = \mathrm{Up}_{\times 2}(x_{\downarrow})$.
- High-frequency residual computation: $x_{\mathrm{high}} = x - x_{\mathrm{low}}$.
This decomposition yields a low-frequency stream $x_{\mathrm{low}}$ and a high-frequency stream $x_{\mathrm{high}}$ that are recombined to preserve signal energy and mitigate alias-related instabilities during learning (Fang et al., 23 Nov 2025).
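The decomposition can be sketched in PyTorch as follows; the padding width and stride placement follow the steps above, while treating the decimation as isotropic (applied to both axes on the vertical pass) is an assumption of this sketch rather than a detail confirmed by the paper:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SFLap(nn.Module):
    """Separable frequency-consistent Laplacian split with a fixed 5-tap binomial kernel."""

    def __init__(self, channels: int):
        super().__init__()
        k = torch.tensor([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0  # binomial low-pass kernel
        self.register_buffer("kh", k.view(1, 1, 1, 5).repeat(channels, 1, 1, 1))
        self.register_buffer("kv", k.view(1, 1, 5, 1).repeat(channels, 1, 1, 1))
        self.channels = channels

    def forward(self, x):
        c = self.channels
        # Reflective padding matching the 5-tap support (2 pixels per side).
        xp = F.pad(x, (2, 2, 2, 2), mode="reflect")
        # Depthwise separable blur: horizontal pass, then vertical pass with stride-2 decimation.
        low = F.conv2d(xp, self.kh, stride=1, groups=c)
        low = F.conv2d(low, self.kv, stride=2, groups=c)
        # Bilinear upsampling back to the input resolution, then the high-frequency residual.
        low = F.interpolate(low, size=x.shape[-2:], mode="bilinear", align_corners=False)
        high = x - low
        return low, high
```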
3. SE-Guided Global Context Enhancement
Parallel to SF-Lap, HiFi-MambaV2 employs a lightweight Squeeze-and-Excitation (SE)-guided path for channel-wise global context aggregation. Global average pooling yields $z \in \mathbb{R}^{C}$, followed by a two-layer gating network $s = \sigma\!\big(W_2\,\delta(W_1 z)\big)$ with learnable parameters $W_1 \in \mathbb{R}^{(C/r)\times C}$, $W_2 \in \mathbb{R}^{C\times(C/r)}$, ReLU nonlinearity $\delta$, sigmoid $\sigma$, and reduction ratio $r$. The channel attention weights modulate the input as $x_{\mathrm{se}} = s \odot x$, introducing global anatomical context to feature maps prior to spatially localized operations (Fang et al., 23 Nov 2025).
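A compact sketch of this SE-guided gating; the class name and the default reduction ratio are illustrative choices, not values taken from the paper:

```python
import torch
import torch.nn as nn


class SEGlobalContext(nn.Module):
    """Squeeze-and-Excitation channel gating: global pooling -> bottleneck MLP -> sigmoid scale."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(channels, channels // reduction),  # W1: squeeze to C/r
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),  # W2: expand back to C
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        z = x.mean(dim=(2, 3))             # global average pooling -> (B, C)
        s = self.gate(z).view(b, c, 1, 1)  # channel attention weights in (0, 1)
        return x * s                       # broadcast modulation of the feature map
```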
4. Hierarchical Shared-Routed Mixture-of-Experts Mechanism
HiFi-MambaV2 concatenates the global-context features $x_{\mathrm{se}}$, the SF-Lap low-frequency stream $x_{\mathrm{low}}$, and the high-frequency stream $x_{\mathrm{high}}$ along channels, forming $x_{\mathrm{cat}} = [\,x_{\mathrm{se}};\, x_{\mathrm{low}};\, x_{\mathrm{high}}\,]$. This input is processed through a MoE structure comprising:
- Shared Experts $\{E_s^{(i)}\}_{i=1}^{N_s}$, applied universally across the spatial domain.
- Routed Experts $\{E_r^{(j)}\}_{j=1}^{N_r}$, assigned per pixel by a lightweight router.
The router computes per-pixel logits $g = W_g\, x_{\mathrm{cat}}$ and applies top-1 sparsity: a single routed expert $j^{*} = \arg\max_j g_j$ is selected per spatial location. The composite MoE output is $y_{\mathrm{MoE}} = \sum_{i=1}^{N_s} E_s^{(i)}(x_{\mathrm{cat}}) + E_r^{(j^{*})}(x_{\mathrm{cat}})$, where the first term is the sum of shared-expert responses and the second is the routed-expert response. A load-balancing regularization term based on the average routing probability $\bar{p}_j$ of each routed expert is used to stabilize expert utilization (Fang et al., 23 Nov 2025).
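The shared-routed MoE logic can be sketched as below; the expert internals, expert counts, and the Switch-style load-balancing statistic are assumptions of this illustration, not the paper's exact design, and a deployed kernel would dispatch sparsely instead of evaluating every routed expert densely:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def make_expert(channels: int) -> nn.Module:
    # Placeholder expert: a small depthwise-separable conv block (not the paper's design).
    return nn.Sequential(
        nn.Conv2d(channels, channels, 3, padding=1, groups=channels),
        nn.Conv2d(channels, channels, 1),
        nn.GELU(),
    )


class SharedRoutedMoE(nn.Module):
    """Shared experts applied everywhere + one routed expert selected per pixel (top-1)."""

    def __init__(self, channels: int, n_shared: int = 1, n_routed: int = 4):
        super().__init__()
        self.shared = nn.ModuleList([make_expert(channels) for _ in range(n_shared)])
        self.routed = nn.ModuleList([make_expert(channels) for _ in range(n_routed)])
        self.router = nn.Conv2d(channels, n_routed, kernel_size=1)  # per-pixel logits

    def forward(self, x):
        # Shared path: summed contributions of all shared experts.
        out = sum(e(x) for e in self.shared)

        # Routing: per-pixel probabilities and hard top-1 expert assignment.
        logits = self.router(x)                     # (B, E, H, W)
        probs = logits.softmax(dim=1)
        assign = probs.argmax(dim=1)                # (B, H, W)

        # Dense evaluation of routed experts, then per-pixel selection.
        routed = torch.stack([e(x) for e in self.routed], dim=1)              # (B, E, C, H, W)
        one_hot = F.one_hot(assign, num_classes=len(self.routed))             # (B, H, W, E)
        one_hot = one_hot.permute(0, 3, 1, 2).unsqueeze(2).to(routed.dtype)   # (B, E, 1, H, W)
        out = out + (routed * one_hot).sum(dim=1)

        # Switch-style load-balancing statistic: fraction routed x mean routing probability.
        frac = one_hot.mean(dim=(0, 3, 4)).squeeze(1)   # (E,)
        mean_prob = probs.mean(dim=(0, 2, 3))           # (E,)
        aux_loss = len(self.routed) * (frac * mean_prob).sum()
        return out, aux_loss
```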
5. Data-Consistency Regularization and Optimization
HiFi-MambaV2 imposes strict data fidelity at every unrolled stage by solving $\hat{x}_t = \arg\min_x \tfrac{1}{2}\|A x - y\|_2^2 + \tfrac{\lambda}{2}\|x - \tilde{x}_t\|_2^2$, with the closed-form proximal update $\hat{x}_t = \big(A^{H}A + \lambda I\big)^{-1}\big(A^{H} y + \lambda \tilde{x}_t\big)$, which for Cartesian sampling reduces to an element-wise combination in k-space. This ensures model outputs are consistent with acquired k-space samples at each iteration (Fang et al., 23 Nov 2025).
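A small numerical check of this closed form for a single-coil Cartesian mask (coil sensitivities taken as identity); the $\lambda$ value and problem size are arbitrary, chosen only to verify that the element-wise k-space update satisfies the normal equations:

```python
import torch

torch.manual_seed(0)
H, W, lam = 8, 8, 0.5
x_tilde = torch.randn(H, W, dtype=torch.complex64)   # network output before DC
m = torch.rand(H, W) < 0.3                           # sampled k-space locations (bool)
mf = m.to(torch.complex64)
y = mf * torch.fft.fft2(torch.randn(H, W, dtype=torch.complex64), norm="ortho")

# argmin_x 0.5||MFx - y||^2 + (lam/2)||x - x_tilde||^2 has a per-frequency solution:
k_pred = torch.fft.fft2(x_tilde, norm="ortho")
k_dc = torch.where(m, (y + lam * k_pred) / (1.0 + lam), k_pred)
x_dc = torch.fft.ifft2(k_dc, norm="ortho")

# Sanity check: the solution satisfies (A^H A + lam I) x = A^H y + lam x_tilde.
lhs = torch.fft.ifft2(mf * torch.fft.fft2(x_dc, norm="ortho"), norm="ortho") + lam * x_dc
rhs = torch.fft.ifft2(mf * y, norm="ortho") + lam * x_tilde
print(torch.allclose(lhs, rhs, atol=1e-5))  # True
```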
The model is trained with the AdamW optimizer under a cosine-annealing learning-rate schedule, a 5-epoch warm-up, and 100 total epochs, on NVIDIA H100 GPUs. MoE hyperparameters comprise the numbers of shared and routed experts, top-1 routing sparsity, and the SE reduction ratio $r$. Evaluation covers 4× and 8× acceleration with equispaced Cartesian, random Cartesian, and golden-angle radial k-space masks (Fang et al., 23 Nov 2025).
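A configuration sketch of the described schedule (AdamW, 5-epoch warm-up, cosine annealing over 100 epochs); the base learning rate and weight decay below are placeholders, since the paper's exact values are not reproduced here:

```python
import math

import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR


def build_optimizer(model: torch.nn.Module, base_lr: float = 1e-4,
                    warmup_epochs: int = 5, total_epochs: int = 100):
    """AdamW with linear warm-up followed by cosine annealing (per-epoch stepping)."""
    opt = AdamW(model.parameters(), lr=base_lr, weight_decay=1e-2)

    def lr_lambda(epoch: int) -> float:
        if epoch < warmup_epochs:
            return (epoch + 1) / warmup_epochs                    # linear warm-up
        progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
        return 0.5 * (1.0 + math.cos(math.pi * progress))         # cosine decay to 0

    return opt, LambdaLR(opt, lr_lambda)
```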
6. Experimental Validation and Ablations
HiFi-MambaV2 demonstrates superior quantitative performance in PSNR (dB), SSIM, and NMSE across multiple datasets and sampling scenarios:
| Dataset (Coil Type, Mask) | 4× PSNR / SSIM / NMSE | 8× PSNR / SSIM / NMSE |
|---|---|---|
| fastMRI (single-coil) | 34.86 / 0.855 / 0.010 | 31.89 / 0.764 / 0.019 |
| CC359 (single-coil) | 37.21 / 0.957 / 0.003 | 29.05 / 0.849 / 0.020 |
| Prostate158 (random) | 29.04 / 0.837 / 0.012 | 23.91 / 0.662 / 0.039 |
| ACDC (radial, single-coil) | 33.72 / 0.931 / 0.005 | 28.05 / 0.820 / 0.017 |
| M4Raw (multi-coil) | 31.82 / 0.794 / 0.016 | 29.91 / 0.747 / 0.024 |
Ablation studies on CC359 at 8× acceleration (reported as PSNR / SSIM / NMSE, where LSGP refers to the SE-guided global-context path) show that each module contributes complementary performance gains:
- Baseline HiFi-Mamba: 28.08 / 0.802 / 0.027
- SF-Lap: 28.31 / 0.830 / 0.024
- SF-Lap + LSGP: 28.43 / 0.835 / 0.023
- SF-Lap + LSGP + MoE-balanced: 28.50 / 0.837 / 0.023
- Full (SF-Lap + LSGP + MoE): 28.68 / 0.841 / 0.022
This demonstrates that frequency decomposition, global reasoning, and adaptive specialization each improve high-frequency detail recovery and overall fidelity (Fang et al., 23 Nov 2025).
7. Limitations and Future Directions
Although HiFi-MambaV2 increases theoretical FLOPs due to deeper cascades and MoE specialization, sparse routing and depthwise-separable convolutions yield efficient inference, with wall-clock latencies comparable to or better than Transformer alternatives. Open challenges include minimizing computation and memory overhead for real-time clinical deployment, extending the routing mechanism to non-Cartesian k-space and calibrationless settings, and incorporating advanced physics or diffusion-based priors to address extreme undersampling or motion. Potential directions also include cross-contrast and multi-modal expert sharing for improved generalization across anatomical domains (Fang et al., 23 Nov 2025).