Factual Serialization Enhancement: A Key Innovation for Chest X-ray Report Generation (2405.09586v2)
Abstract: A radiology report comprises presentation-style vocabulary, which ensures clarity and organization, and factual vocabulary, which provides accurate and objective descriptions based on observable findings. While manually writing these reports is time-consuming and labor-intensive, automatic report generation offers a promising alternative. A critical step in this process is to align radiographs with their corresponding reports. However, existing methods often rely on complete reports for alignment, overlooking the impact of presentation-style vocabulary. To address this issue, we propose FSE, a two-stage Factual Serialization Enhancement method. In Stage 1, we introduce factuality-guided contrastive learning for visual representation by maximizing the semantic correspondence between radiographs and corresponding factual descriptions. In Stage 2, we present evidence-driven report generation that enhances diagnostic accuracy by integrating insights from similar historical cases structured as factual serialization. Experiments on MIMIC-CXR and IU X-ray datasets across specific and general scenarios demonstrate that FSE outperforms state-of-the-art approaches in both natural language generation and clinical efficacy metrics. Ablation studies further emphasize the positive effects of factual serialization in Stage 1 and Stage 2. The code is available at https://github.com/mk-runner/FSE.
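The two-stage idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see their repository for that); the function names, NumPy setup, and toy embeddings here are our own assumptions. Stage 1's factuality-guided contrastive learning resembles a symmetric InfoNCE objective that pulls each radiograph embedding toward the embedding of its own factual serialization, and Stage 2's retrieval of similar historical cases can be approximated by cosine similarity over the learned embeddings.

```python
import numpy as np

def info_nce(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE over paired image / factual-text embeddings.

    Row i of img_emb is assumed to be paired with row i of txt_emb
    (a radiograph and its factual serialization); all other rows in
    the batch act as negatives.
    """
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # pairwise cosine similarities

    labels = np.arange(len(img))  # the matching pair sits on the diagonal

    def xent(l):
        # numerically stable softmax cross-entropy against the diagonal
        l = l - l.max(axis=1, keepdims=True)
        p = np.exp(l) / np.exp(l).sum(axis=1, keepdims=True)
        return -np.log(p[labels, labels]).mean()

    # average image-to-text and text-to-image directions
    return (xent(logits) + xent(logits.T)) / 2

def retrieve_similar_cases(query_emb, case_embs, k=3):
    """Return indices of the k most similar historical cases (Stage 2 sketch)."""
    sims = (case_embs @ query_emb) / (
        np.linalg.norm(case_embs, axis=1) * np.linalg.norm(query_emb)
    )
    return np.argsort(-sims)[:k]
```

In the paper the retrieval step is done at scale (the authors cite GPU-based similarity search); the brute-force cosine ranking above is only a stand-in to show the shape of the computation.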
Authors:
- Kang Liu
- Zhuoqi Ma
- Mengmeng Liu
- Zhicheng Jiao
- Xiaolu Kang
- Qiguang Miao
- Kun Xie