ConTEXTual Net: A Multimodal Vision-Language Model for Segmentation of Pneumothorax (2303.01615v2)

Published 2 Mar 2023 in cs.CV

Abstract: Radiology narrative reports often describe characteristics of a patient's disease, including its location, size, and shape. Motivated by the recent success of multimodal learning, we hypothesized that this descriptive text could guide medical image analysis algorithms. We proposed a novel vision-language model, ConTEXTual Net, for the task of pneumothorax segmentation on chest radiographs. ConTEXTual Net utilizes language features extracted from corresponding free-form radiology reports using a pre-trained language model. Cross-attention modules are designed to combine the intermediate output of each vision encoder layer and the text embeddings generated by the language model. ConTEXTual Net was trained on the CANDID-PTX dataset consisting of 3,196 positive cases of pneumothorax with segmentation annotations from 6 different physicians as well as clinical radiology reports. Using cross-validation, ConTEXTual Net achieved a Dice score of 0.716$\pm$0.016, which was similar to the degree of inter-reader variability (0.712$\pm$0.044) computed on a subset of the data. It outperformed both vision-only models (ResNet50 U-Net: 0.677$\pm$0.015 and GLoRIA: 0.686$\pm$0.014) and a competing vision-language model (LAVT: 0.706$\pm$0.009). Ablation studies confirmed that it was the text information that led to the performance gains. Additionally, we show that certain augmentation methods degraded ConTEXTual Net's segmentation performance by breaking the image-text concordance. We also evaluated the effects of using different language models and activation functions in the cross-attention module, highlighting the efficacy of our chosen architectural design.
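The abstract describes cross-attention modules that combine the intermediate output of each vision encoder layer with text embeddings from a pre-trained language model. The sketch below illustrates that general fusion pattern; it is a minimal assumption-laden illustration, not the authors' implementation. The class name, dimensions, projection, and residual/normalization choices are all hypothetical.

```python
# Illustrative sketch only: one cross-attention block fusing an intermediate
# vision-encoder feature map with report text embeddings, in the spirit of
# the design described in the abstract. All names and dimensions here are
# assumptions, not taken from the ConTEXTual Net code.
import torch
import torch.nn as nn

class VisionTextCrossAttention(nn.Module):
    def __init__(self, vision_channels: int, text_dim: int, num_heads: int = 8):
        super().__init__()
        # Project text embeddings into the vision feature dimension so the
        # attention operates in a shared embedding space (an assumption).
        self.text_proj = nn.Linear(text_dim, vision_channels)
        self.attn = nn.MultiheadAttention(
            embed_dim=vision_channels, num_heads=num_heads, batch_first=True
        )
        self.norm = nn.LayerNorm(vision_channels)

    def forward(self, feats: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, H, W) intermediate output of one vision-encoder layer
        # text_emb: (B, T, D) token embeddings from a pre-trained language model
        b, c, h, w = feats.shape
        queries = feats.flatten(2).transpose(1, 2)        # (B, H*W, C)
        keys = values = self.text_proj(text_emb)          # (B, T, C)
        attended, _ = self.attn(queries, keys, values)    # image queries text
        fused = self.norm(queries + attended)             # residual + norm
        return fused.transpose(1, 2).reshape(b, c, h, w)  # back to (B, C, H, W)

# Example: fuse a 256-channel feature map with 768-d text embeddings
# (e.g., a BERT-sized language model; the sizes are placeholders).
block = VisionTextCrossAttention(vision_channels=256, text_dim=768)
out = block(torch.randn(2, 256, 32, 32), torch.randn(2, 24, 768))
print(out.shape)  # torch.Size([2, 256, 32, 32])
```

In this arrangement the image positions act as queries and the report tokens as keys/values, so descriptive phrases (e.g., location and size of the pneumothorax) can reweight spatial features before decoding; the paper applies such fusion at each vision encoder layer.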

Citations (8)
