Enhancing Global Sensitivity and Uncertainty Quantification in Medical Image Reconstruction with Monte Carlo Arbitrary-Masked Mamba (2405.17659v2)
Abstract: Deep learning has been extensively applied in medical image reconstruction, where Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) represent the predominant paradigms, each with distinct advantages and inherent limitations: CNNs exhibit linear complexity with local sensitivity, whereas ViTs demonstrate quadratic complexity with global sensitivity. The emerging Mamba architecture has shown strong performance in learning visual representations, combining linear scalability with global sensitivity. In this study, we introduce MambaMIR, an Arbitrary-Masked Mamba-based model with wavelet decomposition for joint medical image reconstruction and uncertainty estimation. A novel Arbitrary Scan Masking (ASM) mechanism "masks out" redundant information, introducing the randomness needed for uncertainty estimation. Compared to the commonly used Monte Carlo (MC) dropout, our proposed MC-ASM provides an uncertainty map without the need for hyperparameter tuning and mitigates the performance drop typically observed when applying dropout to low-level tasks. For better texture preservation and perceptual quality, we incorporate the wavelet transform into MambaMIR and explore a variant based on the Generative Adversarial Network, namely MambaMIR-GAN. Comprehensive experiments on multiple representative medical image reconstruction tasks demonstrate that the proposed MambaMIR and MambaMIR-GAN outperform other baseline and state-of-the-art methods, with MambaMIR achieving the best reconstruction fidelity and MambaMIR-GAN the best perceptual quality. In addition, MC-ASM provides uncertainty maps as an additional tool for clinicians.
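The general idea behind Monte Carlo uncertainty estimation with random masking can be illustrated with a minimal sketch: run several stochastic forward passes, each with a different random mask, then take the per-pixel mean as the reconstruction and the per-pixel standard deviation as the uncertainty map. This is not the authors' ASM or the MambaMIR network. The names `random_scan_mask`, `toy_reconstructor`, and `mc_masked_inference`, the `keep_ratio` parameter, and the box-filter "model" are all hypothetical placeholders chosen for illustration only.

```python
import numpy as np


def random_scan_mask(shape, keep_ratio, rng):
    """Binary mask keeping roughly `keep_ratio` of positions.

    Hypothetical stand-in for a masking mechanism; not the paper's ASM.
    """
    return (rng.random(shape) < keep_ratio).astype(np.float64)


def toy_reconstructor(image, mask):
    """Placeholder 'model': a 3x3 box filter over unmasked pixels only.

    Assumption: a real system would use a trained network here.
    """
    padded_img = np.pad(image * mask, 1, mode="edge")
    padded_msk = np.pad(mask, 1, mode="edge")
    h, w = image.shape
    num = np.zeros((h, w), dtype=np.float64)
    den = np.zeros((h, w), dtype=np.float64)
    # Accumulate the sum of kept neighbours and the count of kept neighbours.
    for dy in range(3):
        for dx in range(3):
            num += padded_img[dy:dy + h, dx:dx + w]
            den += padded_msk[dy:dy + h, dx:dx + w]
    # Avoid division by zero where every neighbour was masked out.
    return num / np.maximum(den, 1.0)


def mc_masked_inference(model, image, n_samples=20, keep_ratio=0.8, seed=0):
    """Run `n_samples` masked forward passes; return (mean, std) per pixel."""
    rng = np.random.default_rng(seed)
    samples = np.stack([
        model(image, random_scan_mask(image.shape, keep_ratio, rng))
        for _ in range(n_samples)
    ])
    return samples.mean(axis=0), samples.std(axis=0)


# Usage on a synthetic 16x16 gradient image.
image = np.outer(np.linspace(0.0, 1.0, 16), np.linspace(0.0, 1.0, 16))
recon, uncertainty = mc_masked_inference(toy_reconstructor, image)
```

In this toy setup, setting `keep_ratio=1.0` removes all randomness, so every pass is identical and the uncertainty map collapses to zero. The `keep_ratio` knob belongs to the sketch only and says nothing about how the paper's MC-ASM is configured.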