MMGPL: Multimodal Medical Data Analysis with Graph Prompt Learning (2312.14574v2)
Abstract: Prompt learning has proven highly effective for adapting multimodal large models to a wide range of downstream tasks. Nonetheless, applying existing prompt learning methods to the diagnosis of neurological disorders still faces two issues: (i) existing methods typically treat all patches equally, even though only a small number of patches in neuroimaging are relevant to the disease, and (ii) they ignore the structural information inherent in the brain connectivity network, which is crucial for understanding and diagnosing neurological disorders. To tackle these issues, we introduce a novel prompt learning model that learns graph prompts during the fine-tuning of multimodal large models for diagnosing neurological disorders. Specifically, we first leverage GPT-4 to obtain relevant disease concepts and compute the semantic similarity between these concepts and all patches. Second, we reduce the weight of irrelevant patches according to the semantic similarity between each patch and the disease-related concepts. Third, we construct a graph among tokens based on these concepts and employ a graph convolutional network layer to extract the structural information of the graph, which is then used to prompt the pre-trained multimodal large models for diagnosing neurological disorders. Extensive experiments demonstrate that our method outperforms state-of-the-art methods in neurological disorder diagnosis, and its results are validated by clinicians.
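The abstract does not specify the similarity measure, the weighting function, or how the token graph is built, so the NumPy sketch below is only one plausible reading of the three steps, not the authors' implementation. It assumes: concept and patch embeddings already live in a shared space (e.g. from a CLIP-style encoder); a patch's relevance is its maximum cosine similarity to any concept; weights come from a temperature-scaled exponential; edges connect tokens whose concept-similarity profiles agree; and the GCN layer uses the standard Kipf and Welling symmetric normalization. All function names and hyperparameters (`tau`, `threshold`) are hypothetical.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Row-wise L2 normalization (small epsilon avoids division by zero)."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-8)


def concept_similarity(tokens: np.ndarray, concepts: np.ndarray) -> np.ndarray:
    """Cosine similarity between tokens (N, d) and concepts (C, d) -> (N, C)."""
    return l2_normalize(tokens) @ l2_normalize(concepts).T


def reweight_tokens(tokens: np.ndarray, sim: np.ndarray, tau: float = 0.1):
    """Down-weight tokens that are far from every disease concept.

    Assumption: relevance = max similarity over concepts, mapped to a weight in
    (0, 1] via exp((r - r_max) / tau), so the most relevant token keeps weight 1.
    """
    relevance = sim.max(axis=1)                              # (N,)
    weights = np.exp((relevance - relevance.max()) / tau)    # (N,)
    return tokens * weights[:, None], weights


def concept_graph(sim: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Hypothetical graph: link tokens whose concept-similarity profiles
    (rows of `sim`) have cosine similarity above `threshold`."""
    profiles = l2_normalize(sim)
    adj = (profiles @ profiles.T > threshold).astype(float)
    np.fill_diagonal(adj, 0.0)                               # self-loops added in the GCN
    return adj


def gcn_layer(adj: np.ndarray, h: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ h @ weight, 0.0)


def graph_prompt(tokens, concepts, weight, tau=0.1, threshold=0.5):
    """Return a pooled graph prompt vector and the per-token weights."""
    sim = concept_similarity(tokens, concepts)
    weighted, weights = reweight_tokens(tokens, sim, tau)
    adj = concept_graph(sim, threshold)
    h = gcn_layer(adj, weighted, weight)
    return h.mean(axis=0), weights                           # mean pooling (assumed)


# Usage with random stand-in embeddings: 16 patch tokens, 4 concepts, dim 32.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(16, 32))
concepts = rng.normal(size=(4, 32))
W = rng.normal(scale=0.1, size=(32, 32))
prompt, w = graph_prompt(tokens, concepts, W)
# Hypothetical: prepend the prompt to the token sequence of the frozen model.
sequence = np.vstack([prompt[None, :], tokens])
print(prompt.shape, sequence.shape)  # (32,) (17, 32)
```

Pooling the GCN output into a single prepended vector is just the simplest way to turn graph features into a prompt here; a real model might instead emit several prompt tokens or inject them at multiple layers.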