
VisualCheXbert: Addressing the Discrepancy Between Radiology Report Labels and Image Labels (2102.11467v2)

Published 23 Feb 2021 in eess.IV, cs.CV, and cs.LG

Abstract: Automatic extraction of medical conditions from free-text radiology reports is critical for supervising computer vision models to interpret medical images. In this work, we show that radiologists labeling reports significantly disagree with radiologists labeling corresponding chest X-ray images, which reduces the quality of report labels as proxies for image labels. We develop and evaluate methods to produce labels from radiology reports that have better agreement with radiologists labeling images. Our best performing method, called VisualCheXbert, uses a biomedically-pretrained BERT model to directly map from a radiology report to the image labels, with a supervisory signal determined by a computer vision model trained to detect medical conditions from chest X-ray images. We find that VisualCheXbert outperforms an approach using an existing radiology report labeler by an average F1 score of 0.14 (95% CI 0.12, 0.17). We also find that VisualCheXbert better agrees with radiologists labeling chest X-ray images than do radiologists labeling the corresponding radiology reports by an average F1 score across several medical conditions of between 0.12 (95% CI 0.09, 0.15) and 0.21 (95% CI 0.18, 0.24).

Citations (27)

Summary

  • The paper introduces VisualCheXbert, a model that maps radiology report texts directly to X-ray image labels, improving average F1 scores by 0.14.
  • It trains a biomedically pretrained BERT model with a supervisory signal from a DenseNet vision model to address label discrepancies, which show radiologist report-versus-image agreement (Cohen's Kappa) of only 0.312 to 0.430.
  • The study’s findings indicate that accurately aligning text and image labels can enhance AI diagnostic models in radiology and standardize labeling practices.

An Expert Analysis of "VisualCheXbert: Addressing the Discrepancy Between Radiology Report Labels and Image Labels"

This paper presents an in-depth examination of the differences between radiology report labels and image labels, focusing on chest X-ray datasets. It introduces and evaluates methods to align these labels more closely, with the ultimate aim of improving automated medical imaging models.

Discrepancy Between Radiology Reports and Image Labels

The paper identifies a significant divergence between the labels radiologists assign when reading reports and those they assign when reading the corresponding X-ray images. Agreement between the two is low, with an average Cohen's Kappa between 0.312 and 0.430. This highlights the inconsistencies introduced when report labels are used to train computer vision models for medical image analysis.
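
To make the agreement metric concrete, here is a minimal sketch (not from the paper) of computing Cohen's Kappa between two hypothetical sets of binary labels for a single condition, using scikit-learn's `cohen_kappa_score`; the label arrays are invented for illustration.

```python
# Minimal sketch: Cohen's Kappa between report-based and image-based labels
# for one condition. The label arrays below are hypothetical, not data
# from the paper.
from sklearn.metrics import cohen_kappa_score

# 1 = condition present, 0 = absent, for ten hypothetical studies
report_labels = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]  # radiologist reading reports
image_labels  = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]  # radiologist reading X-rays

kappa = cohen_kappa_score(report_labels, image_labels)
print(f"Cohen's Kappa: {kappa:.3f}")  # chance-corrected agreement in [-1, 1]
```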

The paper suggests several factors behind this divergence: the hierarchical nature of medical conditions, differences in the information available to radiologists (such as access to patient history), and inherent noise and subjectivity in the labeling task. Identifying these issues matters because it calls into question the assumption that radiology report labels are reliable proxies for image labels.

Mapping Textual Reports Directly to Image Labels

The paper's primary contribution is VisualCheXbert, a model built on a biomedically pretrained BERT architecture. It maps radiology report text directly to X-ray image labels, using as supervision the predictions of a DenseNet-based computer vision model trained to detect conditions from chest X-rays, and thereby achieves better agreement with radiologists labeling the images.
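
A rough sketch of this training signal is given below, under the assumption that the vision model's per-condition probabilities are thresholded into binary targets for a multi-label BERT classifier. The checkpoint name, threshold, and loss choice are illustrative assumptions, not the authors' released code.

```python
# Illustrative sketch of a VisualCheXbert-style training signal:
# a DenseNet vision model labels the X-ray, and a BERT text classifier
# is trained to predict those labels from the paired report.
# Checkpoint name, threshold, and loss are assumptions for illustration.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

NUM_CONDITIONS = 14  # e.g., the CheXpert condition set

tokenizer = AutoTokenizer.from_pretrained("dmis-lab/biobert-v1.1")
bert = AutoModel.from_pretrained("dmis-lab/biobert-v1.1")
classifier = nn.Linear(bert.config.hidden_size, NUM_CONDITIONS)
criterion = nn.BCEWithLogitsLoss()  # multi-label objective

def training_step(report_text, image_model_probs, threshold=0.5):
    """One step: supervise the text model with vision-model outputs."""
    # Binarize the vision model's per-condition probabilities into targets.
    targets = (image_model_probs > threshold).float()

    tokens = tokenizer(report_text, return_tensors="pt",
                       truncation=True, max_length=512)
    hidden = bert(**tokens).last_hidden_state[:, 0]  # [CLS] representation
    logits = classifier(hidden)
    return criterion(logits, targets.unsqueeze(0))
```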

VisualCheXbert improves substantially on the baseline: it raises the average F1 score by 0.14 (95% CI 0.12, 0.17) compared with an existing report-based labeler. These findings suggest that modern natural language processing techniques can narrow the gap between textual and image-derived condition labels.
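
The kind of paired-bootstrap comparison typically used to attach confidence intervals to such F1 differences can be sketched as follows; the arrays are hypothetical and the paper's exact resampling procedure may differ.

```python
# Hedged sketch of a paired bootstrap estimate of the F1 difference between
# two labelers, scored against radiologist image labels. All inputs are
# hypothetical binary arrays; this is not the paper's evaluation code.
import numpy as np
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)

def f1_difference_ci(y_true, pred_a, pred_b, n_boot=1000, alpha=0.05):
    """95% CI for F1(pred_a) - F1(pred_b) via paired bootstrap resampling."""
    y_true, pred_a, pred_b = map(np.asarray, (y_true, pred_a, pred_b))
    n = len(y_true)
    diffs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample studies with replacement
        diffs.append(f1_score(y_true[idx], pred_a[idx])
                     - f1_score(y_true[idx], pred_b[idx]))
    lo, hi = np.percentile(diffs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi
```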

Evaluation and Data Insights

Evaluation was carried out on two large chest X-ray datasets, CheXpert and MIMIC-CXR, each with its own imaging conditions and labeling conventions. The model improved the mapping from reports to image labels across a range of conditions. Notably, conditions requiring nuanced interpretation, such as Cardiomegaly and Lung Opacity, showed marked gains in label agreement, supporting VisualCheXbert's ability to handle more complex cases.

Implications and Future Directions

Practically, the capability to more accurately map reports to image labels using VisualCheXbert could enhance the accuracy and reliability of computer vision models in clinical settings. Theoretically, this approach may standardize labeling practices across institutions, providing a more unified methodological framework for training AI models in radiology.

Future research may focus on refining the model's architecture and pre-training regimes, potentially incorporating multimodal data to further improve label alignment. Exploring imaging modalities beyond chest X-rays would also help validate VisualCheXbert's applicability across broader medical contexts.

In summary, this paper contributes substantially to the understanding of label discrepancies in radiology and advocates for advanced NLP methodologies in addressing these challenges.
