A Systematic Review of Deep Learning-based Research on Radiology Report Generation (2311.14199v2)
Abstract: Radiology report generation (RRG) aims to automatically generate free-text descriptions from clinical radiographs, e.g., chest X-ray images. RRG plays an essential role in promoting clinical automation: it offers practical assistance to inexperienced doctors and alleviates radiologists' workloads. Given this potential, research on RRG has grown explosively over the past half-decade, especially with the rapid development of deep learning approaches. Existing studies perform RRG from the perspective of enhancing different modalities: they provide insights on optimizing the report generation process with elaborated features from both visual and textual information, and further facilitate RRG through cross-modal interactions between them. In this paper, we present a comprehensive review of deep learning-based RRG from various perspectives. Specifically, we first cover pivotal RRG approaches based on the task-specific features of radiographs, reports, and the cross-modal relations between them; we then illustrate the benchmark datasets conventionally used for this task along with their evaluation metrics, analyze the performance of different approaches, and finally summarize the challenges and trends for future directions. Overall, the goal of this paper is to serve as a tool for understanding the existing literature and inspiring valuable future research in the field of RRG.
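Among the evaluation metrics conventionally used for RRG, n-gram-based natural language generation scores such as BLEU are the most common. The following is a minimal, illustrative sketch of single-reference sentence-level BLEU with uniform n-gram weights and a brevity penalty; it is a simplification for exposition only (real evaluations typically use corpus-level BLEU with smoothing, e.g., via standard toolkits), and the example report strings are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Simplified single-reference sentence-level BLEU.

    Geometric mean of modified n-gram precisions (n = 1..max_n),
    multiplied by a brevity penalty for short candidates.
    """
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        c_counts = ngram_counts(cand, n)
        r_counts = ngram_counts(ref, n)
        # Clipped overlap: each candidate n-gram counts at most as
        # often as it appears in the reference.
        overlap = sum(min(c, r_counts[g]) for g, c in c_counts.items())
        total = max(sum(c_counts.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0  # unsmoothed: any zero precision zeroes the score
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)


# Hypothetical generated vs. reference report impressions.
reference = "no acute cardiopulmonary abnormality"
print(bleu("no acute cardiopulmonary abnormality", reference))  # 1.0
```

Note that such surface-overlap metrics reward lexical agreement rather than clinical correctness, which is why RRG studies increasingly complement them with clinical-efficacy metrics (e.g., labeler- or entity-based factuality scores).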
- F. Narváez, G. Díaz, C. Poveda, and E. Romero, “An Automatic BI-RADS Description of Mammographic Masses by Fusing Multiresolution Features,” Expert Systems with Applications, vol. 74, pp. 82–95, 2017.
- C.-H. Wei, Y. Li, and P. J. Huang, “Mammogram Retrieval through Machine Learning within BI-RADS Standards,” Journal of Biomedical Informatics, vol. 44, no. 4, pp. 607–614, 2011.
- E. Burnside, D. Rubin, and R. Shachter, “A Bayesian Network for Mammography,” Proceedings / AMIA … Annual Symposium. AMIA Symposium, pp. 106–10, 02 2000.
- B. Jing, P. Xie, and E. Xing, “On the Automatic Generation of Medical Imaging Reports,” in Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Melbourne, Australia, Jul. 2018, pp. 2577–2586.
- Y. Li, X. Liang, Z. Hu, and E. P. Xing, “Hybrid Retrieval-Generation Reinforced Agent for Medical Image Report Generation,” in Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, December 3-8, 2018, Montréal, Canada, S. Bengio, H. M. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, Eds., 2018, pp. 1537–1547.
- T. Nishino, R. Ozaki, Y. Momoki, T. Taniguchi, R. Kano, N. Nakano, Y. Tagawa, M. Taniguchi, T. Ohkuma, and K. Nakamura, “Reinforcement Learning with Imbalanced Dataset for Data-to-Text Medical Report Generation,” in Findings of the Association for Computational Linguistics: EMNLP 2020, T. Cohn, Y. He, and Y. Liu, Eds., Online, Nov. 2020, pp. 2223–2236.
- Z. Chen, Y. Song, T.-H. Chang, and X. Wan, “Generating Radiology Reports via Memory-driven Transformer,” in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, Nov. 2020, pp. 1439–1449.
- F. Nooralahzadeh, N. Perez Gonzalez, T. Frauenfelder, K. Fujimoto, and M. Krauthammer, “Progressive Transformer-Based Generation of Radiology Reports,” in Findings of the Association for Computational Linguistics: EMNLP 2021, M.-F. Moens, X. Huang, L. Specia, and S. W.-t. Yih, Eds., Punta Cana, Dominican Republic, Nov. 2021, pp. 2824–2832.
- J. You, D. Li, M. Okumura, and K. Suzuki, “JPG - Jointly Learn to Align: Automated Disease Prediction and Radiology Report Generation,” in Proceedings of the 29th International Conference on Computational Linguistics, N. Calzolari, C.-R. Huang, H. Kim, J. Pustejovsky, L. Wanner, K.-S. Choi, P.-M. Ryu, H.-H. Chen, L. Donatelli, H. Ji, S. Kurohashi, P. Paggio, N. Xue, S. Kim, Y. Hahm, Z. He, T. K. Lee, E. Santus, F. Bond, and S.-H. Na, Eds., Oct. 2022, pp. 5989–6001.
- J.-B. Delbrouck, P. Chambon, C. Bluethgen, E. Tsai, O. Almusa, and C. Langlotz, “Improving the Factual Correctness of Radiology Report Generation with Semantic Rewards,” in Findings of the Association for Computational Linguistics: EMNLP 2022, Abu Dhabi, United Arab Emirates, Dec. 2022, pp. 4348–4360.
- Z. Wang, L. Liu, L. Wang, and L. Zhou, “METransformer: Radiology Report Generation by Transformer with Multiple Learnable Expert Tokens,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023, Vancouver, BC, Canada, June 17-24, 2023, 2023, pp. 11 558–11 567.
- Z. Huang, X. Zhang, and S. Zhang, “KiUT: Knowledge-injected U-Transformer for Radiology Report Generation,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023, Vancouver, BC, Canada, June 17-24, 2023. IEEE, 2023, pp. 19 809–19 818.
- B. Jing, Z. Wang, and E. Xing, “Show, Describe and Conclude: On Exploiting the Structure Information of Chest X-ray Reports,” in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, Jul. 2019, pp. 6570–6580.
- C. Y. Li, X. Liang, Z. Hu, and E. P. Xing, “Knowledge-driven Encode, Retrieve, Paraphrase for Medical Image Report Generation,” in The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, The Thirty-First Innovative Applications of Artificial Intelligence Conference, IAAI 2019, The Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019, Honolulu, Hawaii, USA, January 27 - February 1, 2019, 2019, pp. 6666–6673.
- Y. Zhang, X. Wang, Z. Xu, Q. Yu, A. L. Yuille, and D. Xu, “When Radiology Report Generation Meets Knowledge Graph,” in The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, New York, NY, USA, February 7-12, 2020, 2020, pp. 12 910–12 917.
- F. Liu, X. Wu, S. Ge, W. Fan, and Y. Zou, “Exploring and Distilling Posterior and Prior Knowledge for Radiology Report Generation,” in IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2021, virtual, June 19-25, 2021, 2021, pp. 13 753–13 762.
- Z. Chen, Y. Shen, Y. Song, and X. Wan, “Cross-modal Memory Networks for Radiology Report Generation,” in Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Online, Aug. 2021, pp. 5904–5914.
- Y. Zhou, L. Huang, T. Zhou, H. Fu, and L. Shao, “Visual-textual Attentive Semantic Consistency for Medical Report Generation,” in 2021 IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 3965–3974.
- T. Nishino, Y. Miura, T. Taniguchi, T. Ohkuma, Y. Suzuki, S. Kido, and N. Tomiyama, “Factual Accuracy is not Enough: Planning Consistent Description Order for Radiology Report Generation,” in Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Abu Dhabi, United Arab Emirates, Dec. 2022, pp. 7123–7138.
- H. Qin and Y. Song, “Reinforced Cross-modal Alignment for Radiology Report Generation,” in Findings of the Association for Computational Linguistics: ACL 2022, S. Muresan, P. Nakov, and A. Villavicencio, Eds., Dublin, Ireland, May 2022, pp. 448–458.
- K. Kale, P. Bhattacharyya, and K. Jadhav, “Replace and Report: NLP Assisted Radiology Report Generation,” in Findings of the Association for Computational Linguistics: ACL 2023, A. Rogers, J. Boyd-Graber, and N. Okazaki, Eds., Toronto, Canada, Jul. 2023, pp. 10 731–10 742.
- T. Tanida, P. Müller, G. Kaissis, and D. Rueckert, “Interactive and Explainable Region-guided Radiology Report Generation,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023, Vancouver, BC, Canada, June 17-24, 2023, 2023, pp. 7433–7442.
- M. Li, R. Liu, F. Wang, X. Chang, and X. Liang, “Auxiliary Signal-guided Knowledge Encoder-decoder for Medical Report Generation,” World Wide Web, pp. 1–18, 2022.
- K. Kale, P. Bhattacharyya, M. Gune, A. Shetty, and R. Lawyer, “KGVL-BART: Knowledge Graph Augmented Visual Language BART for Radiology Report Generation,” in Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, A. Vlachos and I. Augenstein, Eds., Dubrovnik, Croatia, May 2023, pp. 3401–3411.
- W. Hou, K. Xu, Y. Cheng, W. Li, and J. Liu, “ORGAN: Observation-Guided Radiology Report Generation via Tree Reasoning,” in Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), A. Rogers, J. Boyd-Graber, and N. Okazaki, Eds., Toronto, Canada, Jul. 2023, pp. 8108–8122.
- A. Yan, Z. He, X. Lu, J. Du, E. Chang, A. Gentili, J. McAuley, and C.-N. Hsu, “Weakly Supervised Contrastive Learning for Chest X-Ray Report Generation,” in Findings of the Association for Computational Linguistics: EMNLP 2021, M.-F. Moens, X. Huang, L. Specia, and S. W.-t. Yih, Eds., Punta Cana, Dominican Republic, Nov. 2021, pp. 4009–4015.
- Z. Wang, L. Zhou, L. Wang, and X. Li, “A Self-boosting Framework for Automated Radiographic Report Generation,” in 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 2433–2442.
- F. Liu, C. Yin, X. Wu, S. Ge, P. Zhang, and X. Sun, “Contrastive Attention for Automatic Chest X-ray Report Generation,” in Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, Online, Aug. 2021, pp. 269–280.
- Y. Miura, Y. Zhang, E. Tsai, C. Langlotz, and D. Jurafsky, “Improving Factual Completeness and Consistency of Image-to-text Radiology Report Generation,” in Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Online, Jun. 2021, pp. 5288–5304.
- J. Mao, W. Xu, Y. Yang, J. Wang, and A. L. Yuille, “Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN),” in 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, Y. Bengio and Y. LeCun, Eds., 2015, pp. 1–17.
- S. J. Rennie, E. Marcheret, Y. Mroueh, J. Ross, and V. Goel, “Self-critical Sequence Training for Image Captioning,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017. IEEE Computer Society, 2017, pp. 1179–1195.
- J. Lu, C. Xiong, D. Parikh, and R. Socher, “Knowing When to Look: Adaptive Attention via a Visual Sentinel for Image Captioning,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017. IEEE Computer Society, 2017, pp. 3242–3250.
- P. Anderson, X. He, C. Buehler, D. Teney, M. Johnson, S. Gould, and L. Zhang, “Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering,” in CVPR, 2018, pp. 6077–6086.
- M. Cornia, M. Stefanini, L. Baraldi, and R. Cucchiara, “Meshed-memory Transformer for Image Captioning,” in CVPR, 2020, pp. 10 578–10 587.
- Y. Wang, K. Wang, X. Liu, T. Gao, J. Zhang, and G. Wang, “Self Adaptive Global-Local Feature Enhancement for Radiology Report Generation,” 2023 IEEE International Conference on Image Processing (ICIP), pp. 2275–2279, 2022.
- Y. Xue, T. Xu, L. Rodney Long, Z. Xue, S. Antani, G. R. Thoma, and X. Huang, “Multimodal Recurrent Model with Attention for Automated Radiology Report Generation,” in Medical Image Computing and Computer Assisted Intervention–MICCAI 2018: 21st International Conference, Granada, Spain, September 16-20, 2018, Proceedings, Part I. Springer, 2018, pp. 457–466.
- J. Yuan, H. Liao, R. Luo, and J. Luo, “Automatic Radiology Report Generation based on Multi-view Image Fusion and Medical Concept Enrichment,” ArXiv, vol. abs/1907.09085, 2019.
- P. Harzig, Y. Chen, F. Chen, and R. Lienhart, “Addressing Data Bias Problems for Chest X-ray Image Report Generation,” in 30th British Machine Vision Conference 2019, BMVC 2019, Cardiff, UK, September 9-12, 2019, 2019, p. 144.
- F. Liu, C. You, X. Wu, S. Ge, X. Sun et al., “Auto-encoding Knowledge Graph for Unsupervised Medical Report Generation,” Advances in Neural Information Processing Systems, vol. 34, pp. 16 266–16 279, 2021.
- F. Dalla Serra, W. Clackett, H. MacKinnon, C. Wang, F. Deligianni, J. Dalton, and A. Q. O’Neil, “Multimodal Generation of Radiology Reports using Knowledge-Grounded Extraction of Entities and Relations,” in Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Online only, Nov. 2022, pp. 615–624.
- S. Yan, W. K. Cheung, W. H. K. Chiu, T. M. Tong, K. C. Cheung, and S. See, “Attributed Abnormality Graph Embedding for Clinically Accurate X-Ray Report Generation,” IEEE Trans. Medical Imaging, vol. 42, no. 8, pp. 2211–2222, 2023.
- M. Li, B. Lin, Z. Chen, H. Lin, X. Liang, and X. Chang, “Dynamic Graph Enhanced Contrastive Learning for Chest X-Ray Report Generation,” in 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Los Alamitos, CA, USA, jun 2023, pp. 3334–3343.
- K. Zhang, H. Jiang, J. Zhang, Q. Huang, J. Fan, J. Yu, and W. Han, “Semi-supervised Medical Report Generation via Graph-guided Hybrid Feature Consistency,” IEEE Transactions on Multimedia, pp. 1–13, 2023.
- B. Hou, G. Kaissis, R. M. Summers, and B. Kainz, “RATCHET: Medical Transformer for Chest X-ray Diagnosis and Reporting,” in Medical Image Computing and Computer Assisted Intervention - MICCAI 2021 - 24th International Conference, Strasbourg, France, September 27 - October 1, 2021, Proceedings, Part VII, ser. Lecture Notes in Computer Science, M. de Bruijne, P. C. Cattin, S. Cotin, N. Padoy, S. Speidel, Y. Zheng, and C. Essert, Eds., vol. 12907. Springer, 2021, pp. 293–303.
- D. You, F. Liu, S. Ge, X. Xie, J. Zhang, and X. Wu, “AlignTransformer: Hierarchical Alignment of Visual Regions and Disease Tags for Medical Report Generation,” in Medical Image Computing and Computer Assisted Intervention - MICCAI 2021 - 24th International Conference, Strasbourg, France, September 27 - October 1, 2021, Proceedings, Part III, ser. Lecture Notes in Computer Science, vol. 12903, 2021, pp. 72–82.
- H. Yu and Q. Zhang, “Clinically Coherent Radiology Report Generation with Imbalanced Chest X-rays,” in 2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2022, pp. 1781–1786.
- B. Yan, M. Pei, M. Zhao, C. Shan, and Z. Tian, “Prior Guided Transformer for Accurate Radiology Reports Generation,” IEEE Journal of Biomedical and Health Informatics, vol. 26, no. 11, pp. 5631–5640, 2022.
- L. Wang, M. Ning, D. Lu, D. Wei, Y. Zheng, and J. Chen, “An Inclusive Task-Aware Framework for Radiology Report Generation,” in Medical Image Computing and Computer Assisted Intervention - MICCAI 2022 - 25th International Conference, Singapore, September 18-22, 2022, Proceedings, Part VIII, ser. Lecture Notes in Computer Science, vol. 13438, 2022, pp. 568–577.
- Z. Wang, M. Tang, L. Wang, X. Li, and L. Zhou, “A Medical Semantic-Assisted Transformer for Radiographic Report Generation,” in Medical Image Computing and Computer Assisted Intervention - MICCAI 2022 - 25th International Conference, Singapore, September 18-22, 2022, Proceedings, Part III, ser. Lecture Notes in Computer Science, L. Wang, Q. Dou, P. T. Fletcher, S. Speidel, and S. Li, Eds., vol. 13433. Springer, 2022, pp. 655–664.
- M. Kong, Z. Huang, K. Kuang, Q. Zhu, and F. Wu, “TranSQ: Transformer-Based Semantic Query for Medical Report Generation,” in Medical Image Computing and Computer Assisted Intervention - MICCAI 2022 - 25th International Conference, Singapore, September 18-22, 2022, Proceedings, Part VIII, ser. Lecture Notes in Computer Science, vol. 13438. Springer, 2022, pp. 610–620.
- A. K. Tanwani, J. K. Barral, and D. Freedman, “Repsnet: Combining vision with language for automated medical reports,” in Medical Image Computing and Computer Assisted Intervention - MICCAI 2022 - 25th International Conference, Singapore, September 18-22, 2022, Proceedings, Part V, ser. Lecture Notes in Computer Science, L. Wang, Q. Dou, P. T. Fletcher, S. Speidel, and S. Li, Eds., vol. 13435. Springer, 2022, pp. 714–724.
- Z. Wang, H. Han, L. Wang, X. Li, and L. Zhou, “Automated Radiographic Report Generation Purely on Transformer: A Multicriteria Supervised Approach,” IEEE Transactions on Medical Imaging, vol. 41, no. 10, pp. 2803–2813, 2022.
- Y. Yang, J. Yu, J. Zhang, W. Han, H. Jiang, and Q. Huang, “Joint Embedding of Deep Visual and Semantic Features for Medical Image Report Generation,” IEEE Transactions on Multimedia, vol. 25, pp. 167–178, 2023.
- A. Nicolson, J. Dowling, and B. Koopman, “Improving Chest X-ray Report Generation by Leveraging Warm Starting,” Artificial Intelligence in Medicine, vol. 144, p. 102633, 2023.
- J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “ImageNet: A Large-scale Hierarchical Image Database,” in 2009 IEEE conference on computer vision and pattern recognition. Ieee, 2009, pp. 248–255.
- T. Tanida, P. Müller, G. Kaissis, and D. Rueckert, “Interactive and Explainable Region-guided Radiology Report Generation,” in CVPR, 2023.
- K. Simonyan and A. Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition,” in 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, Y. Bengio and Y. LeCun, Eds., 2015, pp. 1–14.
- K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” in Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition, ser. CVPR ’16, Jun. 2016, pp. 770–778.
- G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, “Densely Connected Convolutional Networks,” in CVPR, 2017, pp. 2261–2269.
- A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is All You Need,” in Advances in neural information processing systems, 2017, pp. 5998–6008.
- A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, “An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale,” in International Conference on Learning Representations, 2021, pp. 1–21.
- J. Uijlings, K. Sande, T. Gevers, and A. Smeulders, “Selective Search for Object Recognition,” International Journal of Computer Vision, vol. 104, pp. 154–171, 09 2013.
- S. Ren, K. He, R. B. Girshick, and J. Sun, “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 6, pp. 1137–1149, 2017.
- J. T. Wu, N. Agu, I. Lourentzou, A. Sharma, J. A. Paguio, J. S. Yao, E. C. Dee, W. Mitchell, S. Kashyap, A. Giovannini, L. A. Celi, and M. Moradi, “Chest ImaGenome Dataset for Clinical Reasoning,” in Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, NeurIPS Datasets and Benchmarks 2021, December 2021, virtual, J. Vanschoren and S. Yeung, Eds., 2021, pp. 1–14.
- S. Hochreiter and J. Schmidhuber, “Long Short-Term Memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
- J. Irvin, P. Rajpurkar, M. Ko, Y. Yu, S. Ciurea-Ilcus, C. Chute, H. Marklund, B. Haghgoo, R. L. Ball, K. S. Shpanskaya, J. Seekins, D. A. Mong, S. S. Halabi, J. K. Sandberg, R. Jones, D. B. Larson, C. P. Langlotz, B. N. Patel, M. P. Lungren, and A. Y. Ng, “CheXpert: A Large Chest Radiograph Dataset with Uncertainty Labels and Expert Comparison,” in The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, The Thirty-First Innovative Applications of Artificial Intelligence Conference, IAAI 2019, The Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019, Honolulu, Hawaii, USA, January 27 - February 1, 2019, 2019, pp. 590–597.
- S. Jain, A. Smit, S. Q. Truong, C. D. Nguyen, M.-T. Huynh, M. Jain, V. A. Young, A. Y. Ng, M. P. Lungren, and P. Rajpurkar, “VisualCheXBERT: Addressing the Discrepancy between Radiology Report Labels and Image Labels,” ser. CHIL ’21, New York, NY, USA, 2021, p. 105–115.
- C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, and P. J. Liu, “Exploring the Limits of Transfer Learning with a Unified Text-to-text Transformer,” Journal of Machine Learning Research, vol. 21, no. 140, pp. 1–67, 2020.
- S. Jain, A. Agrawal, A. Saporta, S. Truong, D. N. D. N. Duong, T. Bui, P. Chambon, Y. Zhang, M. Lungren, A. Ng, C. Langlotz, P. Rajpurkar, and P. Rajpurkar, “RadGraph: Extracting Clinical Entities and Relations from Radiology Reports,” in Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks, vol. 1, 2021, pp. 1–12.
- M. Neumann, D. King, I. Beltagy, and W. Ammar, “ScispaCy: Fast and Robust Models for Biomedical Natural Language Processing,” in Proceedings of the 18th BioNLP Workshop and Shared Task, D. Demner-Fushman, K. B. Cohen, S. Ananiadou, and J. Tsujii, Eds., Florence, Italy, Aug. 2019, pp. 319–327.
- T. N. Kipf and M. Welling, “Semi-Supervised Classification with Graph Convolutional Networks,” in International Conference on Learning Representations, 2017, pp. 1–14.
- M. Lewis, Y. Liu, N. Goyal, M. Ghazvininejad, A. Mohamed, O. Levy, V. Stoyanov, and L. Zettlemoyer, “BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension,” in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, Jul. 2020, pp. 7871–7880.
- K. Kale, P. Bhattacharyya, A. Shetty, M. Gune, K. Shrivastava, R. Lawyer, and S. Biswas, “Knowledge Graph Construction and Its Application in Automatic Radiology Report Generation from Radiologist’s Dictation,” CoRR, vol. abs/2206.06308, 2022.
- P. Qi, Y. Zhang, Y. Zhang, J. Bolton, and C. D. Manning, “Stanza: A python natural language processing toolkit for many human languages,” in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, Online, Jul. 2020, pp. 101–108.
- A. Smit, S. Jain, P. Rajpurkar, A. Pareek, A. Ng, and M. Lungren, “Combining Automatic Labelers and Expert Annotations for Accurate Radiology Report Labeling Using BERT,” in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), B. Webber, T. Cohn, Y. He, and Y. Liu, Eds., Online, Nov. 2020, pp. 1500–1519.
- F. Liu, S. Ge, and X. Wu, “Competence-based Multimodal Curriculum Learning for Medical Report Generation,” in Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), C. Zong, F. Xia, W. Li, and R. Navigli, Eds., Online, Aug. 2021, pp. 3001–3012.
- M. Ranzato, S. Chopra, M. Auli, and W. Zaremba, “Sequence Level Training with Recurrent Neural Networks,” in 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings, 2016.
- K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu, “BLEU: A Method for Automatic Evaluation of Machine Translation,” in Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, Philadelphia, Pennsylvania, USA, Jul. 2002, pp. 311–318.
- S. Banerjee and A. Lavie, “METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments,” in Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, Ann Arbor, Michigan, Jun. 2005, pp. 65–72.
- C.-Y. Lin, “ROUGE: A Package for Automatic Evaluation of Summaries,” in Text Summarization Branches Out, Barcelona, Spain, Jul. 2004, pp. 74–81.
- R. J. Williams, “Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning,” Mach. Learn., vol. 8, pp. 229–256, 1992.
- R. Vedantam, C. L. Zitnick, and D. Parikh, “Cider: Consensus-based image description evaluation,” in IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7-12, 2015, 2015, pp. 4566–4575.
- J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding,” in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, Minnesota, Jun. 2019, pp. 4171–4186.
- Y. Bengio, J. Louradour, R. Collobert, and J. Weston, “Curriculum Learning,” in Proceedings of the 26th Annual International Conference on Machine Learning, ser. ICML ’09, New York, NY, USA, 2009, p. 41–48.
- F. Liu, X. Ren, Y. Liu, K. Lei, and X. Sun, “Exploring and Distilling Cross-Modal Information for Image Captioning,” in Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI 2019, Macao, China, August 10-16, 2019, 2019, pp. 5095–5101.
- X. Zhang, G. Kumar, H. Khayrallah, K. Murray, J. Gwinnup, M. J. Martindale, P. McNamee, K. Duh, and M. Carpuat, “An Empirical Exploration of Curriculum Learning for Neural Machine Translation,” CoRR, vol. abs/1811.00739, 2018.
- F. Faghri, D. J. Fleet, J. R. Kiros, and S. Fidler, “VSE++: Improving Visual-Semantic Embeddings with Hard Negatives,” in British Machine Vision Conference 2018, BMVC 2018, Newcastle, UK, September 3-6, 2018, 2018, p. 12.
- I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative Adversarial Nets,” in Advances in Neural Information Processing Systems, vol. 27, 2014, pp. 1–9.
- Y. Gu, R. Tinn, H. Cheng, M. Lucas, N. Usuyama, X. Liu, T. Naumann, J. Gao, and H. Poon, “Domain-specific Language Model Pretraining for Biomedical Natural Language Processing,” ACM Trans. Comput. Heal., vol. 3, no. 1, pp. 2:1–2:23, 2022.
- C. Raffel and D. P. W. Ellis, “Feed-forward networks with attention can solve some long-term memory problems,” CoRR, vol. abs/1512.08756, 2015.
- O. Vinyals, A. Toshev, S. Bengio, and D. Erhan, “Show and Tell: A Neural Image Caption Generator,” in IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7-12, 2015, 2015, pp. 3156–3164.
- D. Demner-Fushman, M. D. Kohli, M. B. Rosenman, S. E. Shooshan, L. Rodriguez, S. K. Antani, G. R. Thoma, and C. J. McDonald, “Preparing A Collection of Radiology Examinations for Distribution and Retrieval,” J. Am. Medical Informatics Assoc., vol. 23, no. 2, pp. 304–310, 2016.
- A. E. Johnson, T. J. Pollard, S. J. Berkowitz, N. R. Greenbaum, M. P. Lungren, C.-y. Deng, R. G. Mark, and S. Horng, “MIMIC-CXR: A De-identified Publicly Available Database of Chest Radiographs with Free-text Reports,” Scientific Data, vol. 6, 2019.
- A. E. W. Johnson, T. J. Pollard, S. J. Berkowitz, N. R. Greenbaum, M. P. Lungren, C. Deng, R. G. Mark, and S. Horng, “MIMIC-CXR: A Large Publicly Available Database of Labeled Chest Radiographs,” CoRR, vol. abs/1901.07042, 2019.
- J. Ni, C. Hsu, A. Gentili, and J. J. McAuley, “Learning Visual-Semantic Embeddings for Reporting Abnormal Findings on Chest X-rays,” in Findings of the Association for Computational Linguistics: EMNLP 2020, Online Event, 16-20 November 2020, ser. Findings of ACL, T. Cohn, Y. He, and Y. Liu, Eds., vol. EMNLP 2020, 2020, pp. 1954–1960.
- S. Sukhbaatar, a. szlam, J. Weston, and R. Fergus, “End-to-end Memory Networks,” in Advances in Neural Information Processing Systems, C. Cortes, N. Lawrence, D. Lee, M. Sugiyama, and R. Garnett, Eds., vol. 28. Curran Associates, Inc., 2015, pp. 1–9.
- X. Wang, Y. Peng, L. Lu, Z. Lu, M. Bagheri, and R. M. Summers, “ChestX-Ray8: Hospital-scale Chest X-Ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, 2017, pp. 3462–3471.
- J. Liu, J. Lian, and Y. Yu, “ChestX-Det10: Chest X-ray Dataset on Detection of Thoracic Abnormalities,” CoRR, vol. abs/2006.10550, 2020.
- X. Wang, Y. Peng, L. Lu, Z. Lu, and R. M. Summers, “TieNet: Text-image Embedding Network for Common Thorax Disease Classification and Reporting in Chest X-rays,” in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, pp. 9049–9058.
- Z.-M. Chen, X.-S. Wei, P. Wang, and Y. Guo, “Multi-label image recognition with graph convolutional networks,” in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 5172–5181.
- B. Chen, Y. Lu, and G. Lu, “Multi-label Chest X-ray Image Classification via Label Co-occurrence Learning,” in Pattern Recognition and Computer Vision: Second Chinese Conference, PRCV 2019, Xi’an, China, November 8–11, 2019, Proceedings, Part II. Berlin, Heidelberg: Springer-Verlag, 2019, p. 682–693.
- A. Smit, S. Jain, P. Rajpurkar, A. Pareek, A. Y. Ng, and M. P. Lungren, “CheXBERT: Combining Automatic Labelers and Expert Annotations for Accurate Radiology Report Labeling Using BERT,” CoRR, vol. abs/2004.09167, 2020.
- Y. Zhang and R. Jiao, “How Segment Anything Model (SAM) Boost Medical Image Segmentation?” arXiv preprint arXiv:2305.03678, 2023.
- K. Zhang and D. Liu, “Customized Segment Anything Model for Medical Image Segmentation,” arXiv preprint arXiv:2304.13785, 2023.
- D. Anand, G. R. M, V. Singhal, D. D. Shanbhag, K. S. Shriram, U. Patil, C. Bhushan, K. Manickam, D. Gui, R. Mullick, A. Gopal, P. Bhatia, and T. A. Kass-Hout, “One-shot Localization and Segmentation of Medical Images with Foundation Models,” CoRR, vol. abs/2310.18642, 2023.
- A. Ranem, N. Babendererde, M. Fuchs, and A. Mukhopadhyay, “Exploring SAM Ablations for Enhancing Medical Segmentation in Radiology and Pathology,” 2023.
- S. Pandey, K.-F. Chen, and E. B. Dam, “Comprehensive Multimodal Segmentation in Medical Imaging: Combining YOLOv8 with SAM and HQ-SAM Models,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, October 2023, pp. 2592–2598.
- C. Wang, X. Chen, H. Ning, and S. Li, “SAM-OCTA: A Fine-Tuning Strategy for Applying Foundation Model to OCTA Image Segmentation Tasks,” 2023.
- P. Zhang and Y. Wang, “Segment Anything Model for Brain Tumor Segmentation,” 2023.
- B. Fazekas, J. Morano, D. Lachinov, G. Aresta, and H. Bogunović, “Adapting Segment Anything Model (SAM) For Retinal OCT,” in Ophthalmic Medical Image Analysis: 10th International Workshop, OMIA 2023, Held in Conjunction with MICCAI 2023, Vancouver, BC, Canada, October 12, 2023, Proceedings, Berlin, Heidelberg, 2023, p. 92–101.
- N. Wang, Y. Song, and F. Xia, “Coding structures and actions with the COSTA scheme in medical conversations,” in Proceedings of the BioNLP 2018 workshop, D. Demner-Fushman, K. B. Cohen, S. Ananiadou, and J. Tsujii, Eds., Melbourne, Australia, Jul. 2018, pp. 76–86.
- Y. Tian, W. Ma, F. Xia, and Y. Song, “ChiMed: A Chinese Medical Corpus for Question Answering,” in Proceedings of the 18th BioNLP Workshop and Shared Task, D. Demner-Fushman, K. B. Cohen, S. Ananiadou, and J. Tsujii, Eds., Florence, Italy, Aug. 2019, pp. 250–260.
- N. Wang, Y. Song, and F. Xia, “Studying challenges in medical conversation with structured annotation,” in Proceedings of the First Workshop on Natural Language Processing for Medical Conversations, P. Bhatia, S. Lin, R. Gangadharaiah, B. Wallace, I. Shafran, C. Shivade, N. Du, and M. Diab, Eds., Online, Jul. 2020, pp. 12–21.
- Y. Song, Y. Tian, N. Wang, and F. Xia, “Summarizing Medical Conversations via Identifying Important Utterances,” in Proceedings of the 28th International Conference on Computational Linguistics, D. Scott, N. Bel, and C. Zong, Eds., Barcelona, Spain (Online), Dec. 2020, pp. 717–729.
- K. Krishna, S. Khosla, J. Bigham, and Z. C. Lipton, “Generating SOAP notes from doctor-patient conversations using modular summarization techniques,” in Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), C. Zong, F. Xia, W. Li, and R. Navigli, Eds., Online, Aug. 2021, pp. 4958–4972.
- G. Michalopoulos, K. Williams, G. Singh, and T. Lin, “MedicalSum: A guided clinical abstractive summarization model for generating medical reports from patient-doctor conversations,” in Findings of the Association for Computational Linguistics: EMNLP 2022, Y. Goldberg, Z. Kozareva, and Y. Zhang, Eds., Abu Dhabi, United Arab Emirates, Dec. 2022, pp. 4741–4749.
- Y. Peng, X. Wang, L. Lu, M. Bagheri, R. M. Summers, and Z. Lu, “NegBio: A High-performance Tool for Negation and Uncertainty Detection in Radiology Reports,” CoRR, vol. abs/1712.05898, 2017.
- T. Zhang, V. Kishore, F. Wu, K. Q. Weinberger, and Y. Artzi, “BERTScore: Evaluating Text Generation with BERT,” in International Conference on Learning Representations, 2020, pp. 1–43.
- J. Zhao, Y. Zhang, X. He, and P. Xie, “COVID-CT-Dataset: A CT Scan Dataset about COVID-19,” arXiv preprint arXiv:2003.13865, 2020.
- A. Kirillov, E. Mintun, N. Ravi, H. Mao, C. Rolland, L. Gustafson, T. Xiao, S. Whitehead, A. C. Berg, W.-Y. Lo, P. Dollar, and R. Girshick, “Segment Anything,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), October 2023, pp. 4015–4026.
- J. Ho, A. Jain, and P. Abbeel, “Denoising Diffusion Probabilistic Models,” NeurIPS, vol. 33, pp. 6840–6851, 2020.
- S. Gong, M. Li, J. Feng, Z. Wu, and L. Kong, “DiffuSeq: Sequence to Sequence Text Generation with Diffusion Models,” in ICLR, 2023, pp. 1–20.
- J. Luo, Y. Li, Y. Pan, T. Yao, J. Feng, H. Chao, and T. Mei, “Semantic-conditional Diffusion Networks for Image Captioning,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023, Vancouver, BC, Canada, June 17-24, 2023. IEEE, 2023, pp. 23359–23368.
- T. Chen, R. Zhang, and G. Hinton, “Analog Bits: Generating Discrete Data using Diffusion Models with Self-conditioning,” in ICLR, 2023, pp. 1–23.
- H. Touvron, T. Lavril, G. Izacard, X. Martinet, M. Lachaux, T. Lacroix, B. Rozière, N. Goyal, E. Hambro, F. Azhar, A. Rodriguez, A. Joulin, E. Grave, and G. Lample, “LLaMA: Open and Efficient Foundation Language Models,” CoRR, vol. abs/2302.13971, 2023.
- H. Touvron, L. Martin, K. Stone, P. Albert, A. Almahairi, Y. Babaei, N. Bashlykov, S. Batra, P. Bhargava, S. Bhosale, D. Bikel, L. Blecher, C. Canton-Ferrer, M. Chen, G. Cucurull, D. Esiobu, J. Fernandes, J. Fu, W. Fu, B. Fuller, C. Gao, V. Goswami, N. Goyal, A. Hartshorn, S. Hosseini, R. Hou, H. Inan, M. Kardas, V. Kerkez, M. Khabsa, I. Kloumann, A. Korenev, P. S. Koura, M. Lachaux, T. Lavril, J. Lee, D. Liskovich, Y. Lu, Y. Mao, X. Martinet, T. Mihaylov, P. Mishra, I. Molybog, Y. Nie, A. Poulton, J. Reizenstein, R. Rungta, K. Saladi, A. Schelten, R. Silva, E. M. Smith, R. Subramanian, X. E. Tan, B. Tang, R. Taylor, A. Williams, J. X. Kuan, P. Xu, Z. Yan, I. Zarov, Y. Zhang, A. Fan, M. Kambadur, S. Narang, A. Rodriguez, R. Stojnic, S. Edunov, and T. Scialom, “Llama 2: Open Foundation and Fine-Tuned Chat Models,” CoRR, vol. abs/2307.09288, 2023.
- D. Zhu, J. Chen, X. Shen, X. Li, and M. Elhoseiny, “MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models,” CoRR, vol. abs/2304.10592, 2023.
- H. Liu, C. Li, Q. Wu, and Y. J. Lee, “Visual Instruction Tuning,” CoRR, vol. abs/2304.08485, 2023.
- Q. Ye, H. Xu, G. Xu, J. Ye, M. Yan, Y. Zhou, J. Wang, A. Hu, P. Shi, Y. Shi, C. Li, Y. Xu, H. Chen, J. Tian, Q. Qi, J. Zhang, and F. Huang, “mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality,” CoRR, vol. abs/2304.14178, 2023.
- H. Xu, Q. Ye, M. Yan, Y. Shi, J. Ye, Y. Xu, C. Li, B. Bi, Q. Qian, W. Wang, G. Xu, J. Zhang, S. Huang, F. Huang, and J. Zhou, “mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video,” 2023.
- O. Thawakar, A. Shaker, S. S. Mullappilly, H. Cholakkal, R. M. Anwer, S. H. Khan, J. Laaksonen, and F. S. Khan, “XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models,” CoRR, vol. abs/2306.07971, 2023.
- J. Hessel, A. Holtzman, M. Forbes, R. Le Bras, and Y. Choi, “CLIPScore: A Reference-free Evaluation Metric for Image Captioning,” in Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, M.-F. Moens, X. Huang, L. Specia, and S. W.-t. Yih, Eds., Online and Punta Cana, Dominican Republic, Nov. 2021, pp. 7514–7528.
- J. Pavlopoulos, V. Kougia, and I. Androutsopoulos, “A Survey on Biomedical Image Captioning,” in Proceedings of the Second Workshop on Shortcomings in Vision and Language, R. Bernardi, R. Fernandez, S. Gella, K. Kafle, C. Kanan, S. Lee, and M. Nabi, Eds., Minneapolis, Minnesota, Jun. 2019, pp. 26–36.
- M. M. A. Monshi, J. Poon, and V. Chung, “Deep Learning in Generating Radiology Reports: A Survey,” Artificial Intelligence in Medicine, vol. 106, p. 101878, 2020.
- J. Pavlopoulos, V. Kougia, I. Androutsopoulos, and D. Papamichail, “Diagnostic Captioning: A Survey,” Knowledge and Information Systems, vol. 64, pp. 1–32, Jul. 2022.