Optimizing Factual Correctness in Radiology Report Summarization
The paper "Optimizing the Factual Correctness of a Summary: A Study of Summarizing Radiology Reports" addresses a central challenge in neural abstractive summarization: ensuring that generated summaries are factually accurate. In radiology, where the summary (the "impression" section of a report) directly informs clinical decisions, factual errors carry real consequences. The paper introduces a framework that improves factual accuracy by coupling a fact-checking module with reinforcement learning (RL) to directly optimize summary generation.
Methodological Approach
The researchers propose a system in which factual correctness is quantified by an information extraction (IE) module that checks whether a generated summary states the same clinical facts as its human-authored reference. The process involves:
- Fact Extraction: The open-source CheXpert labeler is applied to both the reference and the generated summary to extract key clinical facts. Each summary is represented as a vector of binary variables indicating the presence or absence of specific clinical observations.
- Factual Accuracy Scoring: A generated summary is scored by comparing its fact vector against the reference's, yielding a factual F1 score that accounts for both precision (facts stated correctly) and completeness (reference facts covered).
- Reinforcement Learning for Training: The summarization model is fine-tuned with RL using a self-critical policy gradient algorithm, optimizing a reward that combines a conventional ROUGE score with the factual correctness score. This dual-objective training aims to produce summaries that maintain both high textual overlap with human references and factual integrity.
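The scoring and reward steps above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the fact dictionaries stand in for CheXpert labeler output, and the weight `lam` is a hypothetical hyperparameter (the paper tunes its own weighting of the reward terms).

```python
def factual_f1(ref_facts: dict, gen_facts: dict) -> float:
    """F1 over binary clinical-fact vectors (1 = observation present)."""
    # True positives: observations present in both reference and generation.
    tp = sum(1 for k, v in ref_facts.items() if v == 1 and gen_facts.get(k) == 1)
    pred_pos = sum(gen_facts.values())  # observations the generated summary asserts
    ref_pos = sum(ref_facts.values())   # observations the reference asserts
    if pred_pos == 0 or ref_pos == 0:
        return 0.0
    precision, recall = tp / pred_pos, tp / ref_pos
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def combined_reward(rouge_score: float, fact_score: float, lam: float = 0.5) -> float:
    """Weighted sum of an overlap reward and a factual reward (lam is hypothetical)."""
    return (1 - lam) * rouge_score + lam * fact_score

# Toy example: the generated summary adds a spurious "effusion" finding.
ref = {"pneumonia": 1, "effusion": 0, "edema": 1}
gen = {"pneumonia": 1, "effusion": 1, "edema": 1}
print(factual_f1(ref, gen))          # → 0.8 (precision 2/3, recall 1.0)
print(combined_reward(0.5, 0.8))     # → 0.65
```

During RL training, a reward of this shape is computed per sampled summary and used as the scalar signal in the policy gradient update.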
Empirical Evaluation
The paper demonstrates the effectiveness of the proposed framework through experiments on two datasets of radiology reports sourced from Stanford University Hospital and Rhode Island Hospital. The RL-based model is compared against traditional neural summarization models and extractive baselines. Key findings include:
- The RL-trained model significantly improves factual correctness, evidenced by higher factual accuracy scores across multiple clinical variables.
- The joint optimization of ROUGE and factual correctness not only enhances factual integrity but also maintains competitive ROUGE metrics, indicating a balanced performance between content overlap and correctness.
- An evaluation by board-certified radiologists finds that summaries generated by the RL model are rated highly for factual correctness and overall quality, in some cases matching or exceeding the human-authored references.
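The gap between overlap and factual correctness that motivates these findings can be seen in a toy comparison. The sentences and fact labels below are made up, and unigram F1 is only a crude stand-in for ROUGE; the paper's actual evaluation uses ROUGE and the CheXpert labeler.

```python
def unigram_f1(reference: str, candidate: str) -> float:
    """Token-set F1: a crude stand-in for ROUGE-style overlap."""
    ref_t, cand_t = set(reference.split()), set(candidate.split())
    tp = len(ref_t & cand_t)
    if tp == 0:
        return 0.0
    p, r = tp / len(cand_t), tp / len(ref_t)
    return 2 * p * r / (p + r)

def fact_agreement(ref_facts: dict, gen_facts: dict) -> float:
    """Fraction of clinical variables on which the two summaries agree."""
    return sum(ref_facts[k] == gen_facts[k] for k in ref_facts) / len(ref_facts)

reference = "no evidence of pneumonia or pleural effusion"
candidate = "evidence of pneumonia or pleural effusion"  # dropped negation

# Hand-assigned labels: dropping "no" flips both observations.
ref_facts = {"pneumonia": 0, "pleural_effusion": 0}
gen_facts = {"pneumonia": 1, "pleural_effusion": 1}

print(unigram_f1(reference, candidate))      # high overlap despite the error
print(fact_agreement(ref_facts, gen_facts))  # → 0.0
```

A single dropped negation leaves word overlap above 0.9 while every clinical fact is wrong, which is why optimizing ROUGE alone cannot guarantee factual correctness.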
Implications and Future Directions
The proposed approach is a substantial advance in medical summarization, addressing a critical need for factual accuracy in clinical narrative generation. It could be integrated into real-world radiology workflows, improving efficiency and communication in clinical environments.
The paper also highlights the broader applicability of the framework, suggesting that similar methodologies could be adapted to other domains where factual correctness is paramount. Future research could focus on refining IE systems to enhance generalizability across diverse text domains, as well as exploring automated means to handle factual ambiguities and non-binary factual representations.
In conclusion, this paper contributes a significant methodological innovation to natural language processing, specifically in domains where factual precision is non-negotiable. Integrating abstractive summarization models with fact-checking mechanisms and reinforcement learning opens new avenues for research and application in AI-driven text generation.