
MedCycle: Unpaired Medical Report Generation via Cycle-Consistency (2403.13444v2)

Published 20 Mar 2024 in cs.CV

Abstract: Generating medical reports for X-ray images presents a significant challenge, particularly in unpaired scenarios where access to paired image-report data for training is unavailable. Previous works have typically learned a joint embedding space for images and reports, necessitating a specific labeling schema for both. We introduce an innovative approach that eliminates the need for consistent labeling schemas, thereby enhancing data accessibility and enabling the use of incompatible datasets. This approach is based on cycle-consistent mapping functions that transform image embeddings into report embeddings, coupled with report auto-encoding for medical report generation. Our model and objectives consider intricate local details and the overarching semantic context within images and reports. This approach facilitates the learning of effective mapping functions, resulting in the generation of coherent reports. It outperforms state-of-the-art results in unpaired chest X-ray report generation, demonstrating improvements in both language and clinical metrics.
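The cycle-consistency idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the linear maps, the L1 distance, the two-dimensional embeddings, and the names `linear_map`, `l1_distance`, and `cycle_consistency_loss` are all assumptions made for illustration. The sketch only shows the general principle that mapping an image embedding into report space and back should recover the original embedding, and likewise in the other direction.

```python
from typing import Callable, List

Vector = List[float]
Mapping = Callable[[Vector], Vector]


def linear_map(weights: List[List[float]]) -> Mapping:
    """Return a function that applies a fixed matrix to a vector (hypothetical stand-in for a learned mapping)."""
    def apply(x: Vector) -> Vector:
        return [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return apply


def l1_distance(a: Vector, b: Vector) -> float:
    """Mean absolute difference between two vectors (an assumed choice of distance)."""
    return sum(abs(ai - bi) for ai, bi in zip(a, b)) / len(a)


def cycle_consistency_loss(
    image_emb: Vector,
    report_emb: Vector,
    image_to_report: Mapping,
    report_to_image: Mapping,
) -> float:
    """Sum of the two round-trip reconstruction errors."""
    # image space -> report space -> image space
    image_cycle = l1_distance(report_to_image(image_to_report(image_emb)), image_emb)
    # report space -> image space -> report space
    report_cycle = l1_distance(image_to_report(report_to_image(report_emb)), report_emb)
    return image_cycle + report_cycle


# Toy mappings: doubling, halving (its exact inverse), and the identity.
double = linear_map([[2.0, 0.0], [0.0, 2.0]])
halve = linear_map([[0.5, 0.0], [0.0, 0.5]])
identity = linear_map([[1.0, 0.0], [0.0, 1.0]])

image_emb = [1.0, -3.0]
report_emb = [0.5, 0.5]

consistent_loss = cycle_consistency_loss(image_emb, report_emb, double, halve)
inconsistent_loss = cycle_consistency_loss(image_emb, report_emb, double, identity)
```

When the two mappings are exact inverses, as with `double` and `halve`, both round trips reproduce their inputs and the loss is zero. Pairing `double` with `identity` leaves a nonzero reconstruction error, which is the signal a training procedure would minimize. Note that no paired image-report example is required to compute this loss: the image and report embeddings can come from unrelated samples.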

Authors (3)
  1. Elad Hirsch (7 papers)
  2. Gefen Dawidowicz (4 papers)
  3. Ayellet Tal (23 papers)
Citations (2)
