I2I-Mamba: Multi-modal medical image synthesis via selective state space modeling (2405.14022v5)
Abstract: Multi-modal medical image synthesis involves nonlinear transformation of tissue signals between source and target modalities, where tissues exhibit contextual interactions across diverse spatial distances. As such, the utility of a network architecture for synthesis depends on its ability to express these contextual features. Convolutional neural networks (CNNs) offer high local precision at the expense of poor sensitivity to long-range context. While transformers promise to alleviate this issue, they suffer from an unfavorable trade-off between sensitivity to long- versus short-range context due to the intrinsic complexity of attention filters. To capture contextual features effectively while avoiding these complexity-driven trade-offs, here we introduce a novel multi-modal synthesis method, I2I-Mamba, based on the state space modeling (SSM) framework. Focusing on semantic representations across a hybrid residual architecture, I2I-Mamba leverages novel dual-domain Mamba (ddMamba) blocks for complementary contextual modeling in the image and Fourier domains, while maintaining spatial precision with convolutional layers. Departing from conventional raster-scan trajectories, ddMamba employs novel SSM operators based on a spiral-scan trajectory to learn context with enhanced radial coverage and angular isotropy, along with a channel-mixing layer to aggregate context across the channel dimension. Comprehensive demonstrations on multi-contrast MRI and MRI-CT protocols indicate that I2I-Mamba offers superior performance over state-of-the-art CNNs, transformers, and SSMs.
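To make the scan-trajectory and dual-domain ideas concrete, the PyTorch sketch below illustrates (i) generating a spiral ordering over the tokens of an H×W feature map, (ii) running a toy state-space recurrence along that ordering, (iii) a parallel Fourier-domain pass, and (iv) a 1×1 channel-mixing convolution fusing the two domains. This is an illustrative approximation under stated assumptions, not the authors' implementation: the names `spiral_indices` and `ToySpiralSSMBlock` are hypothetical, the square spiral and the fixed (non-selective) diagonal recurrence stand in for the paper's actual trajectory and Mamba kernel, and the per-channel spectral gain is a simplified placeholder for the Fourier-domain ddMamba branch.

```python
import torch
import torch.nn as nn


def spiral_indices(h: int, w: int) -> torch.Tensor:
    """Flat indices visiting an h x w grid along a square spiral.

    The walk starts at the border and winds inward; flipping it yields a
    roughly center-outward traversal, so early sequence positions sit near
    the image center.
    """
    visited = [[False] * w for _ in range(h)]
    order = []
    r = c = 0
    dr, dc = 0, 1  # start in the top-left corner, moving right
    for _ in range(h * w):
        order.append(r * w + c)
        visited[r][c] = True
        nr, nc = r + dr, c + dc
        if not (0 <= nr < h and 0 <= nc < w and not visited[nr][nc]):
            dr, dc = dc, -dr  # blocked: turn clockwise
            nr, nc = r + dr, c + dc
        r, c = nr, nc
    return torch.tensor(order).flip(0)  # center -> periphery


class ToySpiralSSMBlock(nn.Module):
    """Illustrative stand-in for a ddMamba-style block: a spiral-ordered
    sequence scan in the image domain, a parallel Fourier-domain pass, and
    a 1x1 channel-mixing layer fusing the two."""

    def __init__(self, channels: int):
        super().__init__()
        # Per-channel decay of a toy diagonal recurrence h_t = a*h_{t-1} + x_t;
        # a real Mamba layer makes these parameters input-dependent (selective).
        self.log_a = nn.Parameter(torch.zeros(channels))
        # Learnable per-channel spectral gain for the Fourier branch.
        self.freq_gain = nn.Parameter(torch.ones(channels, 1, 1))
        # Channel-mixing layer aggregating both domains across channels.
        self.mix = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def scan(self, seq: torch.Tensor) -> torch.Tensor:
        """Sequential state-space recurrence over a (B, L, C) sequence."""
        a = torch.sigmoid(self.log_a)  # keep the decay in (0, 1) for stability
        h = torch.zeros_like(seq[:, 0])
        states = []
        for t in range(seq.shape[1]):
            h = a * h + seq[:, t]
            states.append(h)
        return torch.stack(states, dim=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, hgt, wid = x.shape
        idx = spiral_indices(hgt, wid).to(x.device)
        inv = torch.argsort(idx)  # maps spiral order back to raster order
        seq = x.flatten(2).transpose(1, 2)        # (B, H*W, C) in raster order
        spatial = self.scan(seq[:, idx])[:, inv]  # scan along the spiral
        spatial = spatial.transpose(1, 2).reshape(b, c, hgt, wid)
        # Fourier-domain branch: global context via a per-channel spectral gain.
        spectral = torch.fft.ifft2(torch.fft.fft2(x) * self.freq_gain).real
        return self.mix(torch.cat([spatial, spectral], dim=1))


if __name__ == "__main__":
    block = ToySpiralSSMBlock(channels=16)
    y = block(torch.randn(2, 16, 32, 32))
    print(y.shape)  # torch.Size([2, 16, 32, 32])
```

In this toy version the flipped spiral simply places center tokens early in the sequence, so the recurrence accumulates context radially outward, mimicking the radial coverage and angular isotropy the abstract attributes to the spiral-scan trajectory; the actual I2I-Mamba operators, selective scan parameterization, and Fourier-domain processing differ and should be taken from the paper itself.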