Factual Serialization Enhancement: A Key Innovation for Chest X-ray Report Generation (2405.09586v2)

Published 15 May 2024 in eess.IV, cs.AI, and cs.CV

Abstract: A radiology report comprises presentation-style vocabulary, which ensures clarity and organization, and factual vocabulary, which provides accurate and objective descriptions based on observable findings. Because manually writing these reports is time-consuming and labor-intensive, automatic report generation offers a promising alternative. A critical step in this process is aligning radiographs with their corresponding reports. However, existing methods often rely on complete reports for alignment, overlooking the impact of presentation-style vocabulary. To address this issue, we propose FSE, a two-stage Factual Serialization Enhancement method. In Stage 1, we introduce factuality-guided contrastive learning for visual representations, maximizing the semantic correspondence between radiographs and their corresponding factual descriptions. In Stage 2, we present evidence-driven report generation, which enhances diagnostic accuracy by integrating insights from similar historical cases structured as factual serializations. Experiments on the MIMIC-CXR and IU X-ray datasets, across both specific and general scenarios, demonstrate that FSE outperforms state-of-the-art approaches on both natural language generation and clinical efficacy metrics. Ablation studies further confirm the positive effect of factual serialization in both stages. The code is available at https://github.com/mk-runner/FSE.
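Stage 1's factuality-guided contrastive objective resembles CLIP-style image-text alignment, except that the text branch encodes factual serializations (reports with presentation-style vocabulary stripped out) rather than complete reports. Below is a minimal sketch of a symmetric InfoNCE loss for this setup, written in PyTorch; the function name, temperature value, and embedding shapes are illustrative assumptions, not the authors' exact implementation.

```python
# Hedged sketch of a Stage 1-style factuality-guided contrastive loss.
# Assumes (batch, dim) embeddings from an image encoder and a text
# encoder applied to factual serializations; all names are illustrative.
import torch
import torch.nn.functional as F

def factuality_contrastive_loss(image_emb: torch.Tensor,
                                fact_emb: torch.Tensor,
                                temperature: float = 0.07) -> torch.Tensor:
    # Cosine similarity via L2-normalized dot products.
    image_emb = F.normalize(image_emb, dim=-1)
    fact_emb = F.normalize(fact_emb, dim=-1)
    logits = image_emb @ fact_emb.t() / temperature  # (batch, batch)
    # The i-th radiograph should match the i-th factual serialization.
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_i2t = F.cross_entropy(logits, targets)      # image -> text
    loss_t2i = F.cross_entropy(logits.t(), targets)  # text -> image
    return 0.5 * (loss_i2t + loss_t2i)
```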

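Stage 2 retrieves similar historical cases and supplies their factual serializations to the report generator as evidence. One plausible way to implement the retrieval step is a nearest-neighbor index over the Stage 1 image embeddings; the sketch below assumes FAISS with a flat inner-product index, and the helper names and value of k are hypothetical rather than the paper's reported configuration.

```python
# Hypothetical Stage 2-style evidence retrieval with FAISS.
# Assumes float32 embeddings produced by the Stage 1 image encoder.
import faiss
import numpy as np

def build_index(train_embs: np.ndarray) -> faiss.Index:
    # Normalize in place so inner product equals cosine similarity.
    faiss.normalize_L2(train_embs)
    index = faiss.IndexFlatIP(train_embs.shape[1])
    index.add(train_embs)
    return index

def retrieve_similar_cases(index: faiss.Index,
                           query_embs: np.ndarray,
                           k: int = 5) -> np.ndarray:
    # query_embs: (num_queries, dim); returns (num_queries, k) ids of
    # historical cases whose factual serializations serve as evidence.
    faiss.normalize_L2(query_embs)
    _, neighbor_ids = index.search(query_embs, k)
    return neighbor_ids
```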
Authors (7)
  1. Kang Liu (207 papers)
  2. Zhuoqi Ma (8 papers)
  3. Mengmeng Liu (21 papers)
  4. Zhicheng Jiao (25 papers)
  5. Xiaolu Kang (4 papers)
  6. Qiguang Miao (21 papers)
  7. Kun Xie (21 papers)
Citations (2)
