Addressing Asynchronicity in Clinical Multimodal Fusion via Individualized Chest X-ray Generation (2410.17918v1)
Abstract: Integrating multi-modal clinical data, such as electronic health records (EHR) and chest X-ray images (CXR), is particularly beneficial for clinical prediction tasks. However, in a temporal setting, multi-modal data are often inherently asynchronous. EHR can be collected continuously, whereas CXR images are generally taken at much longer intervals because of their high cost and radiation dose. When a clinical prediction is needed, the last available CXR image may be outdated, leading to suboptimal predictions. To address this challenge, we propose DDL-CXR, a method that dynamically generates an up-to-date latent representation of the individualized CXR image. Our approach leverages latent diffusion models for patient-specific generation, strategically conditioned on a previous CXR image and EHR time series, which provide information about anatomical structure and disease progression, respectively. In this way, the interaction across modalities can be better captured by the latent CXR generation process, ultimately improving prediction performance. Experiments on MIMIC datasets show that the proposed model effectively addresses asynchronicity in multimodal fusion and consistently outperforms existing methods.
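To make the asynchronicity described in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It finds the last CXR available at prediction time, measures how stale it is, and collects the EHR observations recorded since that image. All names (`last_cxr_before`, `ehr_since`) and the example timestamps are hypothetical, and the choice to use the EHR window between the last CXR and the prediction time as conditioning input is an assumption made for illustration; the abstract only states that generation is conditioned on a previous CXR image and EHR time series.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


def last_cxr_before(cxr_hours: List[float],
                    predict_hour: float) -> Optional[Tuple[int, float]]:
    """Return (index, staleness in hours) of the latest CXR taken at or
    before predict_hour, or None if no CXR exists yet.

    cxr_hours must be sorted in ascending order.
    """
    i = bisect_right(cxr_hours, predict_hour) - 1
    if i < 0:
        return None
    return i, predict_hour - cxr_hours[i]


def ehr_since(ehr_hours: List[float], t_start: float,
              t_end: float) -> List[float]:
    """Return EHR timestamps in the half-open window (t_start, t_end]."""
    return [t for t in ehr_hours if t_start < t <= t_end]


# Hypothetical ICU stay: hourly EHR measurements, only two CXR images.
ehr_hours = [float(h) for h in range(49)]  # hours 0..48
cxr_hours = [2.0, 20.0]
predict_hour = 48.0

found = last_cxr_before(cxr_hours, predict_hour)
if found is not None:
    idx, staleness = found
    # Assumed conditioning inputs for a generator: the previous CXR
    # (index idx) and the EHR recorded after it, up to prediction time.
    window = ehr_since(ehr_hours, cxr_hours[idx], predict_hour)
    print(idx, staleness, len(window))  # prints: 1 28.0 28
```

In this toy stay the most recent image is 28 hours old at prediction time, while 28 newer EHR observations are available. This is the gap that an up-to-date generated latent CXR is meant to bridge.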