C^2M-DoT: Cross-modal consistent multi-view medical report generation with domain transfer network (2310.05355v1)
Abstract: In clinical practice, a single examination usually produces several medical images from different views, and these images are highly consistent semantically. However, most existing medical report generation methods consider only single-view data. The rich mutual information across views can help generate more accurate reports, but multi-view models also depend on multi-view data at inference time, which severely limits their use in clinical practice. In addition, purely numerical word-level optimization ignores the semantics of reports and medical images, so the generated reports often perform poorly. We therefore propose C2M-DoT, a cross-modal consistent multi-view medical report generation method with a domain transfer network. Specifically, (i) a semantic-based multi-view contrastive learning framework for medical report generation uses cross-view information to learn semantic representations of lesions; (ii) a domain transfer network ensures that the multi-view report generation model still achieves good inference performance when given only single-view input; and (iii) optimization with a cross-modal consistency loss encourages the generation of textual reports that are semantically consistent with the medical images. Extensive experiments on two public benchmark datasets demonstrate that C2M-DoT substantially outperforms state-of-the-art baselines on all metrics. Ablation studies also confirm the validity and necessity of each component of C2M-DoT.
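To make the contrastive idea in components (i) and (iii) concrete, here is a minimal sketch of a symmetric InfoNCE-style loss between paired embeddings. The abstract does not specify the exact loss, so this is a generic stand-in rather than the authors' formulation; the function names, the temperature value, and the toy "frontal"/"lateral" embeddings are all assumptions made for illustration.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def info_nce(a, b, temperature=0.1):
    """Symmetric InfoNCE loss for paired rows of a and b (both of shape [N, D]).

    Row i of `a` and row i of `b` form a positive pair; every other row in the
    batch acts as a negative. This is a generic contrastive loss, assumed here
    for illustration only.
    """
    a, b = l2_normalize(a), l2_normalize(b)
    logits = a @ b.T / temperature  # [N, N]; the diagonal holds positive pairs

    def diagonal_cross_entropy(scores):
        # Numerically stable log-softmax over each row, then pick the diagonal.
        scores = scores - scores.max(axis=1, keepdims=True)
        log_prob = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    # Average the a->b and b->a directions so the loss is symmetric.
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(logits.T))


# Toy data (hypothetical): 4 studies, 8-dimensional embeddings per view.
rng = np.random.default_rng(0)
frontal = rng.normal(size=(4, 8))
lateral_aligned = frontal + 0.01 * rng.normal(size=(4, 8))  # same study, other view
lateral_shuffled = lateral_aligned[[1, 2, 3, 0]]            # mismatched studies

loss_aligned = info_nce(frontal, lateral_aligned)
loss_shuffled = info_nce(frontal, lateral_shuffled)
print(f"aligned pairs:  {loss_aligned:.4f}")
print(f"shuffled pairs: {loss_shuffled:.4f}")
```

The same function could in principle be applied to image and report embeddings instead of two image views, which is one plausible reading of a cross-modal consistency objective; whether C2M-DoT does this is not stated in the abstract.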
Authors: Ruizhi Wang, Xiangtao Wang, Jie Zhou, Thomas Lukasiewicz, Zhenghua Xu