Temporal and Spatial Super Resolution with Latent Diffusion Model in Medical MRI images (2410.23898v1)
Abstract: Super Resolution (SR) plays a critical role in computer vision, particularly in medical imaging, where hardware and acquisition time constraints often result in low spatial and temporal resolution. While diffusion models have been applied to both spatial and temporal SR, few studies have explored their use for joint spatial and temporal SR, particularly in medical imaging. In this work, we address this gap by proposing a Latent Diffusion Model (LDM) combined with a Vector Quantised GAN (VQGAN)-based encoder-decoder architecture for joint super resolution. We frame SR as an image denoising problem, focusing on improving both spatial and temporal resolution in medical images. We demonstrate the effectiveness of our approach on the cardiac MRI dataset from the Data Science Bowl Cardiac Challenge, which consists of 2D cine images with a spatial resolution of 256x256 and 8-14 slices per time-step. Our LDM achieves a Peak Signal to Noise Ratio (PSNR) of 30.37, a Structural Similarity Index (SSIM) of 0.7580, and a Learned Perceptual Image Patch Similarity (LPIPS) of 0.2756, outperforming a simple baseline method by 5% in PSNR, 6.5% in SSIM, and 39% in LPIPS. It generates images with high fidelity and perceptual quality in 15 diffusion steps. These results suggest that LDMs hold promise for advancing super resolution in medical imaging, potentially enhancing diagnostic accuracy and patient outcomes. A link to the code is also shared.
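For readers unfamiliar with the first of the reported metrics, the following is a minimal sketch of how PSNR is commonly computed between a reference image and a reconstruction. It is not taken from the authors' code: the function name `psnr`, the `data_range` parameter, and the assumption that intensities are normalised to [0, 1] are illustrative choices, and the paper's exact evaluation protocol may differ.

```python
import numpy as np


def psnr(reference, estimate, data_range=1.0):
    """Peak Signal to Noise Ratio in dB between two same-shaped images.

    data_range is the maximum possible intensity span (assumed 1.0 here,
    i.e. images normalised to [0, 1]; this is an assumption, not a detail
    stated in the abstract).
    """
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    if reference.shape != estimate.shape:
        raise ValueError("reference and estimate must have the same shape")

    # Mean squared error over all pixels.
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0.0:
        # Identical images: PSNR is unbounded.
        return float("inf")
    return 10.0 * np.log10((data_range ** 2) / mse)


if __name__ == "__main__":
    # Synthetic 256x256 example (random data, not MRI): a reference image
    # and a copy corrupted by mild Gaussian noise.
    rng = np.random.default_rng(0)
    reference = rng.random((256, 256))
    noisy = np.clip(reference + rng.normal(0.0, 0.03, reference.shape), 0.0, 1.0)
    print(f"PSNR: {psnr(reference, noisy):.2f} dB")
```

Higher PSNR indicates a smaller pixel-wise error, whereas LPIPS is a distance, so lower values are better; this is why an improvement in LPIPS corresponds to a decrease in its value.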