Bootstrapping Chest CT Image Understanding by Distilling Knowledge from X-ray Expert Models (2404.04936v1)

Published 7 Apr 2024 in cs.CV

Abstract: Radiologists have long sought fully automated, versatile AI for medical imaging interpretation, but the lack of large-scale, extensively annotated multi-disease datasets has hindered this goal. In this paper, we explore the feasibility of leveraging language as a naturally high-quality form of supervision for chest CT imaging. Given the limited availability of image-report pairs, we bootstrap the understanding of 3D chest CT images by distilling chest-related diagnostic knowledge from an extensively pre-trained 2D X-ray expert model. Specifically, we propose a language-guided retrieval method that matches each 3D CT image with its semantically closest 2D X-ray image, and we perform pair-wise and semantic-relation knowledge distillation. We then use contrastive learning to align images and reports from the same patient while distinguishing them from those of other patients. A challenge arises, however, when patients share semantically similar diagnoses (e.g., healthy patients): treating such pairs as negatives introduces false negatives. We therefore introduce a robust contrastive learning scheme that identifies and corrects these false negatives. We train our model on over 12,000 pairs of chest CT images and radiology reports. Extensive experiments across multiple scenarios, including zero-shot learning, report generation, and fine-tuning, demonstrate the model's feasibility in interpreting chest CT images.
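The false-negative correction described above can be illustrated with a short sketch. This is a minimal illustration under our own assumptions, not the paper's exact loss: we assume a multi-positive InfoNCE variant in which report-to-report similarity above a hypothetical threshold flags semantically equivalent patients (e.g., two healthy patients) and relabels them as positives instead of negatives. All names here (`robust_contrastive_loss`, `fn_threshold`) are illustrative.

```python
# Minimal sketch of report-guided robust contrastive alignment.
# Assumption-based illustration; the paper's formulation may differ.
import torch
import torch.nn.functional as F

def robust_contrastive_loss(ct_emb, report_emb, temperature=0.07, fn_threshold=0.9):
    """Align each CT volume with its own report (InfoNCE), while pairs whose
    reports are nearly identical are treated as extra positives rather than
    false negatives."""
    ct_emb = F.normalize(ct_emb, dim=-1)          # (B, D) CT embeddings
    report_emb = F.normalize(report_emb, dim=-1)  # (B, D) report embeddings

    logits = ct_emb @ report_emb.t() / temperature  # (B, B) CT-report similarities

    # Report-to-report similarity flags semantically equivalent patients;
    # the diagonal (each report with itself) is always a positive.
    report_sim = report_emb @ report_emb.t()
    positives = (report_sim >= fn_threshold).float()

    # Multi-positive InfoNCE: average log-likelihood over all positives per row.
    log_prob = F.log_softmax(logits, dim=-1)
    loss = -(positives * log_prob).sum(dim=-1) / positives.sum(dim=-1)
    return loss.mean()

# Example: random embeddings for a batch of 8 patients, 128-dim features.
loss = robust_contrastive_loss(torch.randn(8, 128), torch.randn(8, 128))
```

Using the reports themselves to detect duplicates is a natural choice here, since two patients with the same diagnosis tend to produce near-identical report text even when their scans look different.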

Authors (10)
  1. Weiwei Cao (5 papers)
  2. Jianpeng Zhang (35 papers)
  3. Yingda Xia (28 papers)
  4. Tony C. W. Mok (23 papers)
  5. Zi Li (33 papers)
  6. Xianghua Ye (24 papers)
  7. Le Lu (148 papers)
  8. Jian Zheng (54 papers)
  9. Yuxing Tang (18 papers)
  10. Ling Zhang (104 papers)