AliFuse: Aligning and Fusing Multi-modal Medical Data for Computer-Aided Diagnosis

Published 2 Jan 2024 in cs.CV | (2401.01074v3)

Abstract: Medical data collected for diagnostic decisions are typically multimodal, providing comprehensive information on a subject. While computer-aided diagnosis systems can benefit from multimodal inputs, effectively fusing such data remains a challenging task and a key focus in medical research. In this paper, we propose a transformer-based framework, called Alifuse, for aligning and fusing multimodal medical data. Specifically, we convert medical images and both unstructured and structured clinical records into vision and language tokens, employing intramodal and intermodal attention mechanisms to learn unified representations of all imaging and non-imaging data for classification. Additionally, we integrate restoration modeling with contrastive learning frameworks, jointly learning the high-level semantic alignment between images and texts and the low-level understanding of one modality with the help of another. We apply Alifuse to classify Alzheimer's disease, achieving state-of-the-art performance on five public datasets and outperforming eight baselines.
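The image-text alignment mentioned in the abstract can be illustrated with a generic symmetric contrastive (InfoNCE-style) loss, as popularized by CLIP-like models. This is a minimal sketch, not the paper's actual objective: the function names, the temperature of 0.07, and the toy embeddings are all assumptions made for illustration, and the restoration-modeling term the abstract describes is not covered.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (assumes a non-zero vector)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    m = max(row)
    log_sum = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_sum for x in row]


def symmetric_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Generic symmetric InfoNCE loss over paired embeddings.

    Hypothetical helper: image_embs[k] and text_embs[k] are assumed to
    come from the same subject (a positive pair); every other pairing in
    the batch is treated as a negative.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)

    # Cosine similarity between every image and every text, scaled by temperature.
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts]
        for img in imgs
    ]

    # Image-to-text direction: each row should peak on its diagonal entry.
    i2t = -sum(log_softmax(logits[k])[k] for k in range(n)) / n

    # Text-to-image direction: same idea over the columns.
    columns = [[logits[r][c] for r in range(n)] for c in range(n)]
    t2i = -sum(log_softmax(columns[k])[k] for k in range(n)) / n

    return 0.5 * (i2t + t2i)


# Toy batch: two image embeddings and two text embeddings (made-up values).
images = [[1.0, 0.0], [0.0, 1.0]]
matched_texts = [[1.0, 0.0], [0.0, 1.0]]
swapped_texts = [[0.0, 1.0], [1.0, 0.0]]

aligned = symmetric_contrastive_loss(images, matched_texts)
misaligned = symmetric_contrastive_loss(images, swapped_texts)
print(aligned < misaligned)
```

In this toy batch the loss is lower when each image embedding points in the same direction as its paired text embedding, which is the behavior a contrastive alignment objective is meant to reward.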

Authors (2)