
C^2M-DoT: Cross-modal consistent multi-view medical report generation with domain transfer network (2310.05355v1)

Published 9 Oct 2023 in cs.CV

Abstract: In clinical scenarios, multiple medical images with different views are usually generated simultaneously, and these images have high semantic consistency. However, most existing medical report generation methods consider only single-view data. The rich multi-view mutual information of medical images can help generate more accurate reports, but the dependence of multi-view models on multi-view data at inference time severely limits their application in clinical practice. In addition, word-level optimization based on numbers ignores the semantics of reports and medical images, so the generated reports often perform poorly. Therefore, we propose a cross-modal consistent multi-view medical report generation method with a domain transfer network (C^2M-DoT). Specifically, (i) a semantic-based multi-view contrastive learning framework for medical report generation is adopted to utilize cross-view information to learn the semantic representation of lesions; (ii) a domain transfer network is further proposed to ensure that the multi-view report generation model can still achieve good inference performance under single-view input; (iii) meanwhile, optimization with a cross-modal consistency loss facilitates the generation of textual reports that are semantically consistent with the medical images. Extensive experimental studies on two public benchmark datasets demonstrate that C^2M-DoT substantially outperforms state-of-the-art baselines in all metrics. Ablation studies also confirm the validity and necessity of each component in C^2M-DoT.
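
To make item (i) of the abstract more concrete, the sketch below shows a generic InfoNCE-style contrastive loss between two views of the same study. It is an illustration only, not the loss defined in the paper: the function names (`cosine`, `info_nce`), the temperature value, and the toy two-dimensional embeddings are all assumptions introduced here. The idea it demonstrates is that embeddings of matching views (for example, frontal and lateral images of one patient) are pulled together, while embeddings from different patients are pushed apart.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(view_a, view_b, temperature=0.1):
    """Generic InfoNCE loss (an assumption, not the paper's exact objective).

    view_a[i] and view_b[i] are embeddings of two views of the same study;
    every view_b[j] with j != i acts as a negative for view_a[i].
    """
    n = len(view_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(view_a[i], view_b[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        # Cross-entropy with the matching view as the correct "class".
        total += log_denominator - logits[i]
    return total / n


# Hypothetical toy embeddings for two studies, two views each.
frontal = [[1.0, 0.0], [0.0, 1.0]]
lateral_matched = [[0.9, 0.1], [0.1, 0.9]]   # views agree per study
lateral_swapped = [[0.1, 0.9], [0.9, 0.1]]   # views assigned to the wrong study

loss_matched = info_nce(frontal, lateral_matched)
loss_swapped = info_nce(frontal, lateral_swapped)
print(loss_matched, loss_swapped)
```

With these toy inputs, correctly paired views yield a lower loss than swapped ones, which is the property a multi-view contrastive objective relies on. A real system would compute the embeddings with trained image encoders and would typically use a tensor library rather than plain lists.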

Authors (5)
  1. Ruizhi Wang (9 papers)
  2. Xiangtao Wang (4 papers)
  3. Jie Zhou (687 papers)
  4. Thomas Lukasiewicz (125 papers)
  5. Zhenghua Xu (42 papers)