Zero-Shot ECG Classification with Multimodal Learning and Test-time Clinical Knowledge Enhancement (2403.06659v3)
Abstract: Electrocardiograms (ECGs) are non-invasive diagnostic tools crucial for detecting cardiac arrhythmias in clinical practice. While ECG Self-supervised Learning (eSSL) methods show promise in learning representations from unannotated ECG data, they often overlook the clinical knowledge contained in the associated reports. This oversight, together with the need for annotated samples in downstream tasks, limits the versatility of eSSL. In this work, we address these issues with the Multimodal ECG Representation Learning (MERL) framework. Through multimodal learning on ECG records and their associated reports, MERL can perform zero-shot ECG classification with text prompts, eliminating the need for training data in downstream tasks. At test time, we propose the Clinical Knowledge Enhanced Prompt Engineering (CKEPE) approach, which uses LLMs to draw on external, expert-verified clinical knowledge databases, generating more descriptive prompts and reducing hallucinations in LLM-generated content to boost zero-shot classification. Using MERL, we conduct the first benchmark across six public ECG datasets, in which MERL outperforms eSSL methods. Notably, MERL achieves an average AUC of 75.2% in zero-shot classification (without any training data), 3.2% higher than eSSL methods linear-probed with 10% of the annotated training data, averaged across all six datasets. Code and models are available at https://github.com/cheliu-computation/MERL.
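The zero-shot step can be illustrated with a minimal, CLIP-style sketch. It assumes that an ECG encoder and a text encoder (not shown) have already mapped one ECG record and each class's prompts into a shared embedding space; the toy 3-dimensional vectors are hypothetical stand-ins for those embeddings. Averaging several descriptive prompts per class and the temperature of 0.07 are illustrative assumptions borrowed from common CLIP practice, not MERL's published implementation or settings, and the helper names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def class_embedding(prompt_embeddings: List[Vector]) -> List[float]:
    """Assumption: several prompts per class are ensembled by averaging
    their normalized embeddings and renormalizing the mean."""
    normed = [l2_normalize(p) for p in prompt_embeddings]
    dim = len(normed[0])
    mean = [sum(p[i] for p in normed) / len(normed) for i in range(dim)]
    return l2_normalize(mean)


def zero_shot_probs(
    ecg_embedding: Vector,
    prompts_per_class: Dict[str, List[Vector]],
    temperature: float = 0.07,  # assumed value, borrowed from CLIP
) -> Dict[str, float]:
    """Softmax over temperature-scaled cosine similarities between one ECG
    embedding and each class embedding. No labeled training data is used."""
    ecg = l2_normalize(ecg_embedding)
    logits = {
        label: sum(a * b for a, b in zip(ecg, class_embedding(prompts))) / temperature
        for label, prompts in prompts_per_class.items()
    }
    m = max(logits.values())  # subtract the max logit for numerical stability
    exps = {label: math.exp(z - m) for label, z in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


# Hypothetical toy embeddings, for illustration only.
prompts = {
    "atrial fibrillation": [[1.0, 0.1, 0.0], [0.9, 0.0, 0.2]],
    "normal sinus rhythm": [[0.0, 1.0, 0.1]],
}
probs = zero_shot_probs([0.95, 0.05, 0.1], prompts)
print(max(probs, key=probs.get))  # prints "atrial fibrillation"
```

Because both sides are L2-normalized, each logit is a cosine similarity divided by the temperature, so the predicted class is simply the prompt set whose embedding lies closest to the ECG embedding. Swapping in richer, knowledge-grounded prompts changes only the text side of this computation, which is why prompt quality can be improved at test time without retraining.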