
BERTHop: An Effective Vision-and-Language Model for Chest X-ray Disease Diagnosis (2108.04938v1)

Published 10 Aug 2021 in cs.CV, cs.AI, and cs.CL

Abstract: Vision-and-language (V&L) models take image and text as input and learn to capture the associations between them. Prior studies show that pre-trained V&L models can significantly improve model performance on downstream tasks such as Visual Question Answering (VQA). However, V&L models are less effective when applied in the medical domain (e.g., on X-ray images and clinical notes) due to the domain gap. In this paper, we investigate the challenges of applying pre-trained V&L models in medical applications. In particular, we identify that the visual representation in general V&L models is not suitable for processing medical data. To overcome this limitation, we propose BERTHop, a transformer-based model built on PixelHop++ and VisualBERT, for better capturing the associations between the two modalities. Experiments on the OpenI dataset, a commonly used thoracic disease diagnosis benchmark, show that BERTHop achieves an average Area Under the Curve (AUC) of 98.12%, which is 1.62% higher than the state of the art (SOTA), while being trained on a dataset 9 times smaller.
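The headline metric above is an average AUC across the thoracic disease labels, i.e., each label's ROC AUC computed separately and then macro-averaged. A minimal sketch of that computation is shown below; the label arrangement and example scores are illustrative and not taken from the paper.

```python
# Hedged sketch: macro-averaged ROC AUC for multi-label diagnosis, as is
# standard on OpenI-style benchmarks. Example data below is illustrative.

def roc_auc(y_true, y_score):
    """ROC AUC via the Mann-Whitney U statistic: the probability that a
    randomly chosen positive example is scored above a randomly chosen
    negative one (ties count as half a win)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def macro_average_auc(labels_true, labels_score):
    """Average the per-label AUCs across disease labels (macro average)."""
    aucs = [roc_auc(t, s) for t, s in zip(labels_true, labels_score)]
    return sum(aucs) / len(aucs)

# Two hypothetical disease labels, four patients each:
y_true = [[0, 0, 1, 1], [0, 1, 0, 1]]
y_score = [[0.1, 0.4, 0.35, 0.8], [0.2, 0.9, 0.3, 0.6]]
print(macro_average_auc(y_true, y_score))  # → 0.875
```

In practice one would use `sklearn.metrics.roc_auc_score(..., average="macro")` rather than the hand-rolled version; the pure-Python form is shown only to make the definition explicit.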

Authors (7)
  1. Masoud Monajatipoor (9 papers)
  2. Mozhdeh Rouhsedaghat (9 papers)
  3. Liunian Harold Li (19 papers)
  4. Aichi Chien (3 papers)
  5. C. -C. Jay Kuo (176 papers)
  6. Fabien Scalzo (13 papers)
  7. Kai-Wei Chang (292 papers)
Citations (28)
