Uncovering Knowledge Gaps in Radiology Report Generation Models through Knowledge Graphs (2408.14397v1)

Published 26 Aug 2024 in cs.AI, cs.CL, and cs.CV

Abstract: Recent advancements in artificial intelligence have significantly improved the automatic generation of radiology reports. However, existing evaluation methods fail to reveal the models' understanding of radiological images and their capacity to achieve human-level granularity in descriptions. To bridge this gap, we introduce a system, named ReXKG, which extracts structured information from processed reports to construct a comprehensive radiology knowledge graph. We then propose three metrics to evaluate the similarity of nodes (ReXKG-NSC), distribution of edges (ReXKG-AMS), and coverage of subgraphs (ReXKG-SCS) across various knowledge graphs. We conduct an in-depth comparative analysis of AI-generated and human-written radiology reports, assessing the performance of both specialist and generalist models. Our study provides a deeper understanding of the capabilities and limitations of current AI models in radiology report generation, offering valuable insights for improving model performance and clinical applicability.

Citations (1)

Summary

  • The paper introduces ReXKG, which constructs comprehensive radiology knowledge graphs and proposes three novel metrics to evaluate report generation models.
  • It demonstrates that generalist models offer broader entity coverage than specialist models while still lacking in detailed relationships and quantitative measurements.
  • The study underscores the clinical relevance of integrating multimodal data to enhance AI-generated reports and improve diagnostic accuracy.

Uncovering Knowledge Gaps in Radiology Report Generation Models Through Knowledge Graphs

The paper "Uncovering Knowledge Gaps in Radiology Report Generation Models Through Knowledge Graphs" by Xiaoman Zhang, Julian N. Acosta, Hong-Yu Zhou, and Pranav Rajpurkar discusses a novel approach to evaluating radiology report generation models using knowledge graphs. This approach addresses the limitations of existing evaluation metrics and provides deeper insights into the models' understanding of radiological images.

Introduction

The radiology report generation task is crucial in the medical imaging field as it provides essential information for diagnosis and treatment planning. Despite the advancements in AI, existing evaluation metrics fall short in capturing the comprehensive understanding and descriptive granularity required in radiology reports. This paper introduces ReXKG, a system to construct a comprehensive radiology knowledge graph from AI-generated and human-written reports, and proposes three novel metrics—ReXKG-NSC, ReXKG-AMS, and ReXKG-SCS—to evaluate different aspects of these knowledge graphs.

Methodology

The paper introduces ReXKG, a system designed to extract structured information from processed radiology reports to construct a comprehensive radiology knowledge graph. These graphs capture relationships between anatomical structures, pathologies, imaging findings, medical devices, and procedures, creating a rich and queryable representation of radiological knowledge.

The knowledge graph construction involves:

  1. Information Extraction Schema: Defines entity and relation types specific to the radiology domain.
  2. Entity and Relation Extraction: Uses annotated radiology reports to train models for named entity recognition (NER) and relation extraction.
  3. Node Construction: Merges synonyms and ensures data consistency using the Unified Medical Language System (UMLS) and ScispaCy.
  4. Edge Construction: Merges and filters relations to build a coherent graph structure.
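The paper trains its own extraction models and links entities to UMLS via ScispaCy; those components are not reproduced here. The sketch below is only a minimal illustration of the node- and edge-construction steps, assuming (head, relation, tail) triples have already been extracted from the reports. The synonym table, relation names, and helper names such as normalize and build_kg are illustrative stand-ins, not the paper's implementation.

    # Minimal sketch of node/edge construction, assuming triples have already
    # been produced by trained NER + relation extraction models.
    # Entity normalization uses a toy synonym table as a stand-in for
    # UMLS/ScispaCy entity linking.
    from collections import Counter
    import networkx as nx

    SYNONYMS = {
        "rt pleural effusion": "pleural effusion",
        "effusion, pleural": "pleural effusion",
        "picc line": "picc",
    }

    def normalize(entity: str) -> str:
        """Lowercase and map synonymous surface forms onto one canonical node."""
        e = entity.lower().strip()
        return SYNONYMS.get(e, e)

    def build_kg(triples, min_count=2):
        """triples: iterable of (head, relation, tail) tuples pooled over a corpus."""
        edge_counts = Counter(
            (normalize(h), rel, normalize(t)) for h, rel, t in triples
        )
        g = nx.MultiDiGraph()
        for (h, rel, t), n in edge_counts.items():
            if n >= min_count:  # filter rare/noisy relations
                g.add_edge(h, t, relation=rel, weight=n)
        return g

    # Example: triples pooled over generated or reference reports.
    reports_triples = [
        ("PICC line", "located_at", "right atrium"),
        ("picc line", "located_at", "right atrium"),
        ("rt pleural effusion", "suggestive_of", "heart failure"),
    ]
    kg = build_kg(reports_triples, min_count=1)
    print(kg.number_of_nodes(), kg.number_of_edges())

Weighting edges by how often a relation is asserted across reports, and filtering low-count edges, is one simple way to keep the aggregated graph coherent; the paper's actual merging and filtering criteria may differ.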

Evaluation Metrics

The authors propose three metrics to evaluate knowledge graphs:

  1. ReXKG Node Similarity Coefficient (ReXKG-NSC): Assesses the overlap of nodes between two knowledge graphs.
  2. ReXKG Adjacency Matrix Similarity (ReXKG-AMS): Compares the distribution of edges between two graphs using the Pearson correlation coefficient of their adjacency matrices.
  3. ReXKG Subgraph Coverage Score (ReXKG-SCS): Measures the coverage of subgraphs in the generated graph relative to the ground-truth graph.
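The paper's exact formulas are not reproduced here. The sketch below shows one plausible reading of each metric, assuming ReXKG-NSC is a Dice/F1-style overlap of node sets, ReXKG-AMS is the Pearson correlation of flattened weighted adjacency matrices over the union of nodes, and ReXKG-SCS averages the fraction of reference edges recovered around chosen subgraph centers (e.g., medical-device nodes). Function names and details are illustrative.

    # Hedged sketch of the three graph-comparison metrics (not the paper's code).
    import numpy as np
    import networkx as nx

    def nsc(g_gen, g_ref):
        """Dice-style overlap between the node sets of the two graphs."""
        gen, ref = set(g_gen.nodes), set(g_ref.nodes)
        return 2 * len(gen & ref) / (len(gen) + len(ref))

    def _adj(g, nodes):
        """Weighted adjacency matrix over a shared node ordering."""
        h = g.copy()
        h.add_nodes_from(nodes)  # pad with isolated nodes so shapes match
        return nx.to_numpy_array(h, nodelist=nodes, weight="weight")

    def ams(g_gen, g_ref):
        """Pearson correlation of the flattened weighted adjacency matrices."""
        nodes = sorted(set(g_gen.nodes) | set(g_ref.nodes))
        a, b = _adj(g_gen, nodes), _adj(g_ref, nodes)
        return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])

    def scs(g_gen, g_ref, centers):
        """Average edge coverage of reference subgraphs around `centers` nodes."""
        scores = []
        for c in centers:
            if c not in g_ref:
                continue
            ref_edges = set(g_ref.edges(c))
            gen_edges = set(g_gen.edges(c)) if c in g_gen else set()
            if ref_edges:
                scores.append(len(ref_edges & gen_edges) / len(ref_edges))
        return float(np.mean(scores)) if scores else 0.0

Under this reading, a model can score well on node overlap while still scoring poorly on edge distribution and subgraph coverage, which is exactly the kind of gap the three metrics are designed to separate.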

Experiments

The experiments focus primarily on chest X-ray report analysis using datasets like CheXpert Plus and MIMIC-CXR. The authors conduct a comprehensive analysis comparing the coverage and relationships of entities in AI-generated reports with those in human-written reports. Both specialist and generalist radiology report generation models are evaluated.

Results

The paper addresses several key questions about the performance of AI models:

  1. Coverage of Entities: Generalist models like RadFM and MedVersa demonstrated broader coverage than specialist models but still fell short in detailing medical devices and providing comprehensive descriptive granularity.
  2. Coverage of Relationships: AI models lagged behind human-written reports in capturing relationships between entities; MedVersa performed best, covering nearly 80% of important subgraphs.
  3. Comprehensiveness of Concepts: AI models often produced less detailed and occasionally hallucinated descriptions, and tended to overfit to concepts frequent in the training data, particularly in progression descriptions.
  4. Quantitative Measurements: The models' use of size measurements mirrored their frequency in the training data, and they often failed to provide quantified measurements for many disorders.
  5. Specialist vs. Generalist Models: Generalist models trained on multiple modalities showed significantly enhanced radiology knowledge compared to specialist models.

Implications and Future Directions

The implications of the research are twofold:

  1. Practical: The proposed evaluation framework helps improve radiology report generation models and align them more closely with clinical requirements.
  2. Theoretical: The paper highlights the importance of integrating multimodal data to enhance the generalizability and accuracy of AI models in radiology.

Future developments in AI for radiology could focus on incorporating longitudinal patient data to overcome the challenges of hallucinated progression descriptions and improve the detail and accuracy of generated reports.

Conclusion

The paper presents a novel and comprehensive approach to evaluating radiology report generation models using knowledge graphs, offering valuable insights for improving AI model performance and clinical applicability. The paper underscores the need for broader and more diverse training data to capture the depth of radiologists' expertise accurately.