
Unsupervised Multimodal Representation Learning across Medical Images and Reports (1811.08615v1)

Published 21 Nov 2018 in cs.LG and cs.CL

Abstract: Joint embeddings between medical imaging modalities and associated radiology reports have the potential to offer significant benefits to the clinical community, ranging from cross-domain retrieval to conditional generation of reports to the broader goals of multimodal representation learning. In this work, we establish baseline joint embedding results measured via both local and global retrieval methods on the soon-to-be-released MIMIC-CXR dataset, which consists of chest X-ray images and the associated radiology reports. We examine both supervised and unsupervised methods on this task and show that for document retrieval tasks with the learned representations, only a limited amount of supervision is needed to yield results comparable to those of fully-supervised methods.
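The cross-domain retrieval setting described in the abstract can be illustrated with a minimal sketch: once images and reports are mapped into a shared embedding space, retrieval reduces to nearest-neighbor search by cosine similarity. The code below uses random toy vectors and a hypothetical `retrieve` helper; it is not the paper's actual model, only an illustration of the retrieval step.

```python
# Minimal sketch of cross-modal retrieval in a shared embedding space.
# The embeddings here are random stand-ins; in the paper they would come
# from learned image and report encoders.
import numpy as np

rng = np.random.default_rng(0)

def l2_normalize(x):
    """Scale each row to unit length so dot products equal cosine similarity."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Toy joint space: 5 image vectors and 5 report vectors of dimension 8.
image_emb = l2_normalize(rng.normal(size=(5, 8)))
report_emb = l2_normalize(rng.normal(size=(5, 8)))

def retrieve(query, candidates, k=3):
    """Return indices of the k candidates most similar to the query."""
    sims = candidates @ query          # cosine similarities (unit-norm vectors)
    return np.argsort(-sims)[:k]       # indices sorted by descending similarity

# Retrieve the reports closest to image 0 in the joint space.
top_reports = retrieve(image_emb[0], report_emb)
print(top_reports)
```

Evaluation in this setting typically checks whether the report paired with a given image appears among its top-k retrieved neighbors (recall@k).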

Authors (5)
  1. Tzu-Ming Harry Hsu (6 papers)
  2. Wei-Hung Weng (35 papers)
  3. Willie Boag (9 papers)
  4. Peter Szolovits (44 papers)
  5. Matthew McDermott (19 papers)
Citations (34)