TiBiX: Leveraging Temporal Information for Bidirectional X-ray and Report Generation (2403.13343v1)
Abstract: With the emergence of vision LLMs in the medical imaging domain, numerous studies have focused on two dominant research directions: (1) report generation from chest X-rays (CXR), and (2) synthetic scan generation from text or reports. Although some work incorporates multi-view CXRs into the generative process, prior patient scans and reports have generally been disregarded. This can inadvertently omit important medical information, degrading generation quality. To address this, we propose TiBiX: Leveraging Temporal information for Bidirectional X-ray and Report Generation. By conditioning on previous scans, our approach enables bidirectional generation and addresses two challenging problems: (1) generating the current image from the previous image and the current report, and (2) generating the current report from both the previous and current images. Moreover, we extract and release a curated temporal benchmark dataset derived from MIMIC-CXR. Our comprehensive experiments and ablation studies examine the merits of incorporating prior CXRs: we achieve state-of-the-art (SOTA) results on the report generation task and attain on-par performance with SOTA image generation efforts, thereby establishing a new baseline for longitudinal bidirectional CXR-to-report generation. The code is available at https://github.com/BioMedIA-MBZUAI/TiBiX.
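Both tasks described in the abstract consume temporally adjacent studies from the same patient: task 1 maps (previous image, current report) to the current image, and task 2 maps (previous image, current image) to the current report. A minimal sketch of how such (previous, current) training pairs could be formed from a patient's chronologically ordered studies is shown below; this is an illustrative assumption, not the authors' actual data-loading code, and `temporal_pairs` is a hypothetical helper name.

```python
def temporal_pairs(studies):
    """Given one patient's studies in chronological order, yield
    (previous, current) pairs. Each study is assumed to be an
    (image, report) tuple; the first study has no predecessor,
    so it appears only as a 'previous' element."""
    return [(studies[i - 1], studies[i]) for i in range(1, len(studies))]


# Illustrative use: with three studies, two temporal pairs result.
patient_studies = [("img_t0", "rep_t0"), ("img_t1", "rep_t1"), ("img_t2", "rep_t2")]
pairs = temporal_pairs(patient_studies)
# For each (prev, curr) pair, task 1 conditions on prev[0] and curr[1]
# to generate curr[0]; task 2 conditions on prev[0] and curr[0]
# to generate curr[1].
```

Patients with a single study yield no pairs under this scheme, which is consistent with the paper's focus on a curated temporal subset of MIMIC-CXR.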
Authors: Santosh Sanjeev, Fadillah Adamsyah Maani, Arsen Abzhanov, Vijay Ram Papineni, Ibrahim Almakky, Mohammad Yaqub, Bartłomiej W. Papież