Soft Masked Mamba Diffusion Model for CT to MRI Conversion

Published 22 Jun 2024 in cs.CV | (2406.15910v1)

Abstract: Magnetic Resonance Imaging (MRI) and Computed Tomography (CT) are the predominant modalities utilized in the field of medical imaging. Although MRI captures the complexity of anatomical structures in greater detail than CT, it entails higher financial cost and requires longer image acquisition times. In this study, we aim to train a latent diffusion model for CT to MRI conversion, replacing the commonly used U-Net or Transformer backbone with a State-Space Model (SSM) called Mamba that operates on latent patches. First, we note critical oversights in the scan scheme of most Mamba-based vision methods, including inadequate attention to the spatial continuity of patch tokens and the lack of consideration for their varying importance to the target task. Second, extending from this insight, we introduce Diffusion Mamba (DiffMa), employing a soft mask to integrate Cross-Sequence Attention into Mamba and conducting selective scan in a spiral manner. Lastly, extensive experiments demonstrate impressive performance by DiffMa in medical image generation tasks, with notable advantages in input scaling efficiency over existing benchmark models. The code and models are available at https://github.com/wongzbb/DiffMa-Diffusion-Mamba



Summary

The paper introduces DiffMa, a diffusion model that uses the Mamba State-Space Model as its backbone to improve CT to MRI conversion.
Utilizing Spiral-Scan and soft masking, the model preserves spatial continuity and emphasizes critical tissue details.
Experimental results on the SynthRAD2023 dataset show superior SSIM performance compared to CNN, ViT, and similar models.

Soft Masked Mamba Diffusion Model for CT to MRI Conversion

The paper "Soft Masked Mamba Diffusion Model for CT to MRI Conversion" (2406.15910) introduces a framework for converting CT images to MRI using a diffusion model named Diffusion Mamba (DiffMa). The framework uses a State-Space Model (SSM) called Mamba as its backbone, aiming to generate high-fidelity MRI images from CT scans efficiently. It targets two weaknesses the authors identify in existing Mamba-based vision methods: scan schemes that neglect the spatial continuity of patch tokens, and no account of how much each patch matters to the target task.

Diffusion Mamba Framework

DiffMa replaces traditional CNN and ViT backbones with the Mamba model, gaining linear computational complexity and a global receptive field. The Spiral-Scan and soft masking mechanisms let the model better preserve spatial continuity and focus on significant patches, which matters in MRI generation where capturing intricate details is crucial.

DiffMa handles 2D spatial inputs through its Spiral-Scan approach, which maintains the continuity of the scanned sequence. The framework also pairs Mamba with a soft mask module that incorporates Cross-Sequence Attention, which further sharpens the focus on critical tissue areas for CT to MRI conversion.

Figure 1: The Diffusion Mamba (DiffMa) framework. Left: The overall framework of Diffusion. Middle: Details of Mamba blocks. Right: Details of Mamba with Spiral-Scan.

Key Components

Spiral-Scan Mechanism

The Spiral-Scan strategy allows Mamba to manage spatial inputs effectively by maintaining spatial continuity during the sequence scan. This mechanism prevents disruption of spatial integrity when handling 2D patches, thus enhancing the model's ability to generate highly detailed MRI images.

Figure 2: The 2D Image Spiral-Scan. Eight schemes are employed to maintain structural information continuity.
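
As a rough illustration of why a spiral ordering preserves continuity, the sketch below lays out the patches of a grid along a clockwise inward spiral and compares it with a plain row-by-row raster scan. The function names (spiral_order, raster_order, max_step) and the choice of a top-left, clockwise spiral are assumptions made for this example; the eight schemes are not spelled out here, and this is not taken from the DiffMa code.

```python
def spiral_order(height, width):
    """Patch coordinates (row, col) along a clockwise inward spiral from the top-left."""
    top, bottom, left, right = 0, height - 1, 0, width - 1
    order = []
    while top <= bottom and left <= right:
        # Top edge, left to right.
        for c in range(left, right + 1):
            order.append((top, c))
        # Right edge, top to bottom (corner already visited).
        for r in range(top + 1, bottom + 1):
            order.append((r, right))
        # Bottom edge, right to left (only if it is a different row).
        if top < bottom:
            for c in range(right - 1, left - 1, -1):
                order.append((bottom, c))
        # Left edge, bottom to top (only if it is a different column).
        if left < right:
            for r in range(bottom - 1, top, -1):
                order.append((r, left))
        top, bottom, left, right = top + 1, bottom - 1, left + 1, right - 1
    return order


def raster_order(height, width):
    """Row-by-row ordering, used here only as a baseline for comparison."""
    return [(r, c) for r in range(height) for c in range(width)]


def max_step(order):
    """Largest Manhattan distance between consecutive patches in a scan order."""
    return max(
        abs(r1 - r0) + abs(c1 - c0)
        for (r0, c0), (r1, c1) in zip(order, order[1:])
    )
```

On a 4x4 grid, max_step(spiral_order(4, 4)) is 1, so consecutive tokens are always grid neighbors, while max_step(raster_order(4, 4)) is 4 because the raster scan jumps back across the row at every line break. Reversing or mirroring the spiral would give further variants, though whether those match the paper's eight schemes is not confirmed here.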

Soft Mask with Vision Embedder

DiffMa incorporates a soft mask driven by a Vision Embedder, which produces token-level weights that prioritize the patches most important for MRI generation. By attending to cross-sequence variation in this way, the model emphasizes the more critical areas and thereby refines the quality of the generated image.

Figure 3: Visualization of patch significance from latent pelvic images indicating importance through circle size and darkness.
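
To make the idea of token-level weighting concrete, here is a minimal sketch of a soft mask: each patch token is scaled by a weight between 0 and 1 rather than being kept or dropped outright. The function name soft_mask, the use of a sigmoid to turn scores into weights, and the assumption that scores arrive as one number per token are all illustrative choices; how the Vision Embedder actually computes its weights is not described in this summary.

```python
import math


def soft_mask(tokens, scores):
    """Scale each token vector by a weight in (0, 1) derived from its score.

    tokens: list of token feature vectors (lists of floats).
    scores: one raw importance score per token (hypothetical input; in DiffMa
            the weights come from the Vision Embedder).
    """
    if len(tokens) != len(scores):
        raise ValueError("need exactly one score per token")
    # Sigmoid squashing is an assumption made for this sketch.
    weights = [1.0 / (1.0 + math.exp(-s)) for s in scores]
    masked = [[w * x for x in token] for token, w in zip(tokens, weights)]
    return masked, weights
```

Unlike a hard (binary) mask, no token is discarded: a low-scoring patch is attenuated but still contributes to the sequence that Mamba scans.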

Computational Efficiency

By using an SSM, specifically Mamba, DiffMa achieves complexity that is linear in sequence length, which suits the processing of long token sequences. Compared with CNN- and ViT-based architectures, it keeps computational overhead manageable while retaining a global receptive field, which benefits the generation of high-quality medical images.
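
The linear cost comes from the recurrent form of a state-space model: each token updates a hidden state once, so a sequence of L tokens needs L steps, whereas self-attention compares every token with every other token. The sketch below is a deliberately simplified scalar recurrence with fixed parameters a, b, c (all hypothetical values); Mamba itself uses input-dependent (selective) parameters and a vector-valued state, which this example does not attempt to reproduce.

```python
def ssm_scan(xs, a=0.9, b=1.0, c=1.0):
    """Run h_t = a * h_{t-1} + b * x_t, y_t = c * h_t over a sequence in one pass."""
    h = 0.0
    ys = []
    for x in xs:  # one state update per token: O(L) work
        h = a * h + b * x
        ys.append(c * h)
    return ys


def scan_steps(length):
    """Number of state updates a recurrent scan performs."""
    return length


def attention_pairs(length):
    """Number of token pairs scored by full self-attention."""
    return length * length
```

Every output y_t still depends on all earlier inputs through the hidden state, which is the sense in which the scan keeps a global view while doing only one update per token.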

Experimental Results

The effectiveness of DiffMa was validated on the SynthRAD2023 dataset, focusing on CT to MRI conversion for brain and pelvic scans. Compared with other models such as LDM, DiT, and various Mamba-based architectures, DiffMa demonstrated superior performance, particularly on the structural similarity index (SSIM), which indicates its ability to maintain structural integrity and detail.


Figure 4: Visualizations of brain CT to MRI conversion. DiffMa outperformed the compared models in structural preservation.

Figure 5: Visualizations of pelvis CT to MRI conversion showcasing enhanced detail fidelity.
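
For readers unfamiliar with the metric, the sketch below computes SSIM from global image statistics using the standard formula with the usual constants K1 = 0.01 and K2 = 0.03. It is a simplification: common SSIM implementations average the score over local sliding windows, and the paper's exact evaluation settings are not given in this summary. The function name ssim_global and the flat-list image representation are assumptions for illustration.

```python
def ssim_global(x, y, data_range=1.0):
    """Single-window SSIM between two images given as flat lists of intensities."""
    if len(x) != len(y) or not x:
        raise ValueError("images must be non-empty and the same size")
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    # Population variance and covariance (dividing by n) for simplicity.
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    # Stabilizing constants from the standard SSIM definition.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator
```

An image compared with itself scores 1, and the score drops as luminance, contrast, or structure diverge, which is why the metric is used as a proxy for structural preservation.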

Conclusion

Diffusion Mamba advances medical image synthesis, notably CT to MRI conversion, by combining Mamba's efficient sequence processing with its Spiral-Scan and soft-mask attention mechanisms. Future directions may involve exploring more complex conditions and extending Mamba to a wider range of medical imaging tasks, potentially supporting diagnostic workflows through efficient, high-quality image generation.