Knowledge-enhanced Visual-Language Pre-training on Chest Radiology Images (2302.14042v3)

Published 27 Feb 2023 in cs.CV

Abstract: While multi-modal foundation models pre-trained on large-scale data have been successful in natural language understanding and vision recognition, their use in medical domains is still limited due to the fine-grained nature of medical tasks and the high demand for domain knowledge. To address this challenge, we propose a novel approach called Knowledge-enhanced Auto Diagnosis (KAD) which leverages existing medical domain knowledge to guide vision-language pre-training using paired chest X-rays and radiology reports. We evaluate KAD on four external X-ray datasets and demonstrate that its zero-shot performance is not only comparable to that of fully-supervised models, but also superior to the average of three expert radiologists for three (out of five) pathologies with statistical significance. Moreover, when few-shot annotation is available, KAD outperforms all existing approaches in fine-tuning settings, demonstrating its potential for application in different clinical scenarios.
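Since the abstract describes KAD as vision-language pre-training on paired X-rays and reports, evaluated zero-shot on external datasets, a minimal sketch of how zero-shot pathology scoring typically works in this setting may help. The encoder architecture, prompt embeddings, and pathology list below are illustrative assumptions for a CLIP-style pipeline, not the paper's actual implementation.

```python
# Minimal sketch of zero-shot pathology scoring with a CLIP-style
# vision-language model (illustrative only; not KAD's real code).
import torch
import torch.nn.functional as F

class ImageEncoder(torch.nn.Module):
    """Placeholder backbone mapping a chest X-ray to a unit-norm embedding."""
    def __init__(self, dim=512):
        super().__init__()
        self.backbone = torch.nn.Sequential(
            torch.nn.Conv2d(1, 32, kernel_size=7, stride=4), torch.nn.ReLU(),
            torch.nn.AdaptiveAvgPool2d(1), torch.nn.Flatten(),
            torch.nn.Linear(32, dim),
        )

    def forward(self, x):
        return F.normalize(self.backbone(x), dim=-1)

def zero_shot_scores(image_emb, text_embs):
    """Cosine similarity between one image and one text embedding per pathology."""
    return image_emb @ text_embs.t()

# Hypothetical usage: one text prompt per pathology, embedded by a medical
# text encoder; here stand-in random embeddings keep the sketch self-contained.
pathologies = ["atelectasis", "cardiomegaly", "edema", "consolidation", "pleural effusion"]
image_encoder = ImageEncoder()
xray = torch.randn(1, 1, 224, 224)  # a single (fake) chest X-ray
text_embs = F.normalize(torch.randn(len(pathologies), 512), dim=-1)
scores = zero_shot_scores(image_encoder(xray), text_embs)
print(dict(zip(pathologies, scores.squeeze(0).tolist())))
```

In such a setup, the image embedding is compared against a fixed set of pathology prompts, so new findings can be queried without retraining; the knowledge-enhancement in KAD concerns how the text side is built, which this sketch deliberately leaves abstract.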

Authors (5)
  1. Xiaoman Zhang (31 papers)
  2. Chaoyi Wu (24 papers)
  3. Ya Zhang (222 papers)
  4. Yanfeng Wang (211 papers)
  5. Weidi Xie (132 papers)
Citations (74)