Topicwise Separable Sentence Retrieval for Medical Report Generation (2405.04175v1)
Abstract: Automated radiology reporting holds immense clinical potential for alleviating radiologists' burdensome workload and mitigating diagnostic bias. Recently, retrieval-based report generation methods have attracted increasing attention owing to their inherent advantages in the quality and consistency of generated reports. However, because of the long-tail distribution of the training data, these models tend to learn frequently occurring sentences and topics while overlooking rare ones. Regrettably, descriptions of rare topics often indicate critical findings that should be mentioned in the report. To address this problem, we introduce Topicwise Separable Sentence Retrieval (Teaser) for medical report generation. To ensure comprehensive learning of both common and rare topics, we categorize queries into common and rare types to learn differentiated topics, and then propose a Topic Contrastive Loss to effectively align topics and queries in the latent space. Moreover, we integrate an Abstractor module after visual feature extraction, which helps the topic decoder gain a deeper understanding of the visual observational intent. Experiments on the MIMIC-CXR and IU X-ray datasets demonstrate that Teaser surpasses state-of-the-art models, effectively represents rare topics, and establishes more dependable correspondences between queries and topics.
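The abstract describes a Topic Contrastive Loss that aligns query embeddings with their matched topic embeddings in a shared latent space. The paper does not give the exact formula here, but a common realization of such an objective is an InfoNCE-style loss: each query is pulled toward its assigned topic and pushed away from all others. The sketch below is a minimal, dependency-free illustration under that assumption; the function name, the `matches` pairing, and the temperature `tau` are illustrative, not the authors' implementation.

```python
import math

def topic_contrastive_loss(query_vecs, topic_vecs, matches, tau=0.07):
    """InfoNCE-style sketch of a topic contrastive loss (assumed form).

    query_vecs: list of query embedding vectors
    topic_vecs: list of topic embedding vectors
    matches:    list of (query_index, topic_index) pairs, e.g. from an
                assignment step pairing each query with its topic
    tau:        temperature scaling the cosine similarities
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def cos(a, b):
        na = math.sqrt(dot(a, a)) or 1.0
        nb = math.sqrt(dot(b, b)) or 1.0
        return dot(a, b) / (na * nb)

    loss = 0.0
    for q_idx, t_idx in matches:
        # Similarity of this query to every topic, temperature-scaled.
        sims = [cos(query_vecs[q_idx], t) / tau for t in topic_vecs]
        # Cross-entropy against the matched topic: -log softmax(sims)[t_idx].
        log_denom = math.log(sum(math.exp(s) for s in sims))
        loss += log_denom - sims[t_idx]
    return loss / max(len(matches), 1)
```

A query aligned with its matched topic yields a near-zero loss, while a query matched to a dissimilar topic is penalized, which is the pull/push behavior the abstract attributes to the loss.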
Authors: Junting Zhao, Yang Zhou, Zhihao Chen, Huazhu Fu, Liang Wan