
Bidirectional Captioning for Clinically Accurate and Interpretable Models (2310.19635v1)

Published 30 Oct 2023 in cs.CV

Abstract: Vision-language pretraining has been shown to produce high-quality visual encoders that transfer efficiently to downstream computer vision tasks. While generative LLMs have gained widespread attention, image captioning has thus far been largely overlooked as a form of cross-modal pretraining in favor of contrastive learning, especially in medical image analysis. In this paper, we experiment with bidirectional captioning of radiology reports as a form of pretraining and compare the quality and utility of the learned embeddings with those from contrastive pretraining methods. We optimize a CNN encoder-transformer decoder architecture, named RadTex, for the radiology domain. Results show not only that captioning pretraining yields visual encoders competitive with contrastive pretraining (CheXpert competition multi-label AUC of 89.4%), but also that our transformer decoder can generate clinically relevant reports (captioning macro-F1 score of 0.349 using the CheXpert labeler) and respond to prompts with targeted, interactive outputs.
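To make the training setup described in the abstract concrete, below is a minimal sketch of bidirectional captioning pretraining: a CNN encoder produces a grid of visual features, and two causal transformer decoders learn to generate the report left-to-right and right-to-left, with the summed language-modeling losses as the pretraining objective. This assumes a VirTex-style formulation of "bidirectional captioning"; the class name `BidirectionalCaptioner`, the ResNet-50 backbone, and all hyperparameters are illustrative assumptions, not the authors' released RadTex implementation.

```python
# Sketch of bidirectional captioning pretraining (assumed VirTex-style setup;
# not the authors' code). Requires torch and torchvision.
import torch
import torch.nn as nn
from torchvision.models import resnet50


class BidirectionalCaptioner(nn.Module):
    """Hypothetical CNN encoder + twin transformer decoders (forward/backward)."""

    def __init__(self, vocab_size, d_model=512, nhead=8, num_layers=4, max_len=512):
        super().__init__()
        # CNN encoder: ResNet-50 trunk with the classification head removed,
        # so the output is a spatial grid of visual features.
        trunk = resnet50(weights=None)
        self.encoder = nn.Sequential(*list(trunk.children())[:-2])
        self.proj = nn.Conv2d(2048, d_model, kernel_size=1)

        self.token_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        # Two independent decoders: one reads the report left-to-right,
        # the other right-to-left (the "bidirectional" objective).
        make_layer = lambda: nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.fwd_decoder = nn.TransformerDecoder(make_layer(), num_layers)
        self.bwd_decoder = nn.TransformerDecoder(make_layer(), num_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def _decode(self, decoder, tokens, memory):
        b, t = tokens.shape
        pos = torch.arange(t, device=tokens.device)
        x = self.token_emb(tokens) + self.pos_emb(pos)
        # Standard causal mask for autoregressive decoding.
        causal = torch.triu(
            torch.full((t, t), float("-inf"), device=tokens.device), diagonal=1)
        return self.lm_head(decoder(x, memory, tgt_mask=causal))

    def _lm_loss(self, decoder, tokens, memory):
        logits = self._decode(decoder, tokens[:, :-1], memory)  # teacher forcing
        return nn.functional.cross_entropy(
            logits.reshape(-1, logits.size(-1)), tokens[:, 1:].reshape(-1))

    def forward(self, images, tokens):
        # Image -> (B, D, H, W) feature grid -> (B, H*W, D) memory sequence
        # that both decoders attend to via cross-attention.
        feats = self.proj(self.encoder(images))
        memory = feats.flatten(2).transpose(1, 2)
        # Sum of forward and backward captioning losses
        # (padding-token masking omitted for brevity).
        return (self._lm_loss(self.fwd_decoder, tokens, memory)
                + self._lm_loss(self.bwd_decoder, torch.flip(tokens, dims=[1]), memory))
```

Under this reading of the method, the forward decoder alone would handle prompt-conditioned report generation at inference time, while the pretrained CNN encoder can be detached and fine-tuned on downstream classification (e.g., CheXpert labels), which is the basis for the abstract's comparison against contrastive pretraining.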

Authors (7)
  1. Keegan Quigley (6 papers)
  2. Miriam Cha (13 papers)
  3. Josh Barua (2 papers)
  4. Geeticka Chauhan (9 papers)
  5. Seth Berkowitz (8 papers)
  6. Steven Horng (17 papers)
  7. Polina Golland (78 papers)