MedM2G: Unifying Medical Multi-Modal Generation via Cross-Guided Diffusion with Visual Invariant (2403.04290v1)

Published 7 Mar 2024 in eess.IV, cs.CV, and cs.LG

Abstract: Medical generative models, acknowledged for their high-quality sample generation ability, have accelerated the growth of medical applications. However, recent works concentrate on separate generation models for distinct medical tasks and are limited by inadequate medical multi-modal knowledge, which constrains comprehensive medical diagnosis. In this paper, we propose MedM2G, a Medical Multi-Modal Generative framework whose key innovation is to align, extract, and generate medical multi-modal data within a unified model. Extending beyond one or two medical modalities, we efficiently align multiple medical modalities in a unified space through a central alignment approach. Significantly, our framework extracts valuable clinical knowledge by preserving the medical visual invariant of each imaging modality, thereby enhancing modality-specific medical information for multi-modal generation. By incorporating adaptive cross-guided parameters as conditions in the multi-flow diffusion framework, our model promotes flexible interactions among medical modalities during generation. MedM2G is the first medical generative model that unifies the medical generation tasks of text-to-image, image-to-text, and unified generation of medical modalities (CT, MRI, X-ray). It performs 5 medical generation tasks across 10 datasets, consistently outperforming various state-of-the-art works.
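The abstract does not spell out the central alignment objective. As a rough illustration only, the sketch below assumes a symmetric InfoNCE (CLIP-style contrastive) loss that pulls each non-central modality's embeddings toward those of a single central modality in a shared space. Treating text as the central modality, the temperature of 0.07, and the helper names `info_nce` and `central_alignment_loss` are all hypothetical assumptions, not details taken from the paper.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Project each row onto the unit sphere so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def info_nce(anchor: np.ndarray, other: np.ndarray, temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss: row i of `anchor` is the positive match for row i of `other`.

    Assumption: this is a generic contrastive loss, not MedM2G's published objective.
    """
    a, b = l2_normalize(anchor), l2_normalize(other)
    logits = a @ b.T / temperature  # (N, N) scaled cosine-similarity matrix

    def diag_cross_entropy(m: np.ndarray) -> float:
        # Cross-entropy where the correct "class" for row i is column i.
        m = m - m.max(axis=1, keepdims=True)  # subtract row max for numerical stability
        log_probs = m - np.log(np.exp(m).sum(axis=1, keepdims=True))
        return float(-np.mean(np.diag(log_probs)))

    # Average both directions (anchor -> other and other -> anchor).
    return 0.5 * (diag_cross_entropy(logits) + diag_cross_entropy(logits.T))


def central_alignment_loss(central: np.ndarray, others: dict, temperature: float = 0.07) -> float:
    """Hypothetical central alignment: align every other modality to the central embeddings.

    `central` is an (N, D) batch for the assumed central modality (e.g. text);
    `others` maps modality names (e.g. "ct", "mri", "xray") to (N, D) batches of paired samples.
    """
    losses = [info_nce(central, emb, temperature) for emb in others.values()]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    text = rng.normal(size=(8, 64))  # 8 paired samples, 64-dim embeddings
    aligned = {"ct": text + 0.01 * rng.normal(size=text.shape), "mri": text.copy()}
    unaligned = {"ct": rng.permutation(text), "mri": rng.normal(size=text.shape)}
    print(central_alignment_loss(text, aligned) < central_alignment_loss(text, unaligned))  # True
```

Anchoring every modality to one central space means only one pairwise loss per non-central modality is needed, rather than one per pair of modalities; whether MedM2G uses exactly this design is not stated in the abstract.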

Authors (5)
  1. Chenlu Zhan (9 papers)
  2. Yu Lin (50 papers)
  3. Gaoang Wang (68 papers)
  4. Hongwei Wang (150 papers)
  5. Jian Wu (314 papers)
Citations (6)