
Development and Validation of a Deep Learning Algorithm for Improving Gleason Scoring of Prostate Cancer (1811.06497v1)

Published 15 Nov 2018 in cs.CV and cs.LG

Abstract: For prostate cancer patients, the Gleason score is one of the most important prognostic factors, potentially determining treatment independent of the stage. However, Gleason scoring is based on subjective microscopic examination of tumor morphology and suffers from poor reproducibility. Here we present a deep learning system (DLS) for Gleason scoring whole-slide images of prostatectomies. Our system was developed using 112 million pathologist-annotated image patches from 1,226 slides, and evaluated on an independent validation dataset of 331 slides, where the reference standard was established by genitourinary specialist pathologists. On the validation dataset, the mean accuracy among 29 general pathologists was 0.61. The DLS achieved a significantly higher diagnostic accuracy of 0.70 (p=0.002) and trended towards better patient risk stratification in correlations to clinical follow-up data. Our approach could improve the accuracy of Gleason scoring and subsequent therapy decisions, particularly where specialist expertise is unavailable. The DLS also goes beyond the current Gleason system to more finely characterize and quantitate tumor morphology, providing opportunities for refinement of the Gleason system itself.

Citations (386)

Summary

  • The paper introduces a deep learning algorithm that improves Gleason scoring accuracy using a novel two-stage CNN and kNN approach.
  • The study leverages an expansive dataset of 112 million image patches to stabilize scoring variability and enhance diagnostic reproducibility.
  • Results show the system reaching 70% accuracy on the validation set, against a mean of 61% for 29 general pathologists, and 4–6% lower mean absolute error than pathologists in Gleason pattern quantitation.

An Evaluation of Deep Learning for Enhancing Gleason Scoring in Prostate Cancer

The paper investigates a Deep Learning System (DLS) for improving the accuracy of Gleason scoring in prostate cancer diagnosis. The Gleason score, traditionally assigned by pathologists through subjective microscopic examination, is critical in guiding therapeutic decisions and predicting patient outcomes. That subjectivity limits reproducibility, with prior studies reporting interobserver variability of 30% to 53%. The research addresses this limitation by applying a DLS to whole-slide images of prostatectomy specimens, both to assign Gleason scores and to characterize tumor morphology quantitatively.

The methodology has two stages: first, a convolutional neural network (CNN) classifies image patches into regional Gleason patterns; second, a k-nearest neighbor (kNN) algorithm uses those regional results to assign a whole-slide Gleason score. The DLS was trained on 112 million image patches from 1,226 slides, a dataset significantly larger than those used in previous comparable studies.
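The second stage can be illustrated with a minimal sketch. It assumes, for illustration only, that each slide is summarized by the fraction of its tumor patches assigned to each Gleason pattern and that a plain Euclidean kNN votes on a slide-level label. The class names, the feature definition, the toy reference slides, and the functions `slide_features` and `knn_predict` are hypothetical and are not taken from the paper.

```python
from collections import Counter

import numpy as np

# Hypothetical tumor pattern labels produced by a stage-one patch classifier.
TUMOR_PATTERNS = ["GP3", "GP4", "GP5"]


def slide_features(patch_labels):
    """Fraction of tumor patches in each Gleason pattern (benign patches ignored)."""
    counts = Counter(patch_labels)
    vec = np.array([counts[p] for p in TUMOR_PATTERNS], dtype=float)
    total = vec.sum()
    return vec / total if total > 0 else vec


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k reference slides closest to the query."""
    dists = np.linalg.norm(train_x - query, axis=1)
    nearest = np.argsort(dists)[:k]
    labels, votes = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(votes)]


# Toy reference slides (invented values): pattern fractions and a slide-level label.
train_x = np.array([
    [1.00, 0.00, 0.00], [0.95, 0.05, 0.00],
    [0.70, 0.30, 0.00], [0.60, 0.40, 0.00],
    [0.30, 0.70, 0.00], [0.20, 0.80, 0.00],
    [0.00, 0.60, 0.40], [0.00, 0.30, 0.70],
])
train_y = np.array(["GG1", "GG1", "GG2", "GG2", "GG3", "GG3", "GG4-5", "GG4-5"])

# A new slide: 10 benign, 13 pattern-3, and 7 pattern-4 patches.
patches = ["benign"] * 10 + ["GP3"] * 13 + ["GP4"] * 7
features = slide_features(patches)
prediction = knn_predict(train_x, train_y, features, k=3)
print(features, prediction)
```

Summarizing a slide as pattern fractions keeps the slide-level classifier small and makes its input directly interpretable, which is one plausible reason to separate patch classification from slide scoring.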

On a validation set of 331 slides, the DLS reached an accuracy of 70%, significantly higher (p=0.002) than the 61% mean accuracy of 29 general pathologists. Notably, the DLS outperformed 8 of the 10 pathologists who individually reviewed the entire validation dataset. The DLS also quantified Gleason patterns more accurately, with mean absolute errors 4-6% lower than those of pathologists. This matters most near grade group boundaries, where fine distinctions in pattern proportions can significantly influence prognosis.
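The two metrics in this comparison, slide-level accuracy and mean absolute error of pattern quantitation, can be computed as in the sketch below. The five-slide arrays are invented toy values used only to show the arithmetic; they are not data from the paper, and the function names are hypothetical.

```python
import numpy as np


def accuracy(predicted, reference):
    """Fraction of slides whose predicted grade group matches the reference."""
    return float(np.mean(np.asarray(predicted) == np.asarray(reference)))


def mean_absolute_error(estimated_pct, reference_pct):
    """Average absolute gap, in percentage points, between pattern estimates."""
    est = np.asarray(estimated_pct, dtype=float)
    ref = np.asarray(reference_pct, dtype=float)
    return float(np.mean(np.abs(est - ref)))


# Toy example: grade groups for five slides (invented values).
reference_groups = [1, 2, 2, 3, 4]
model_groups = [1, 2, 3, 3, 4]

# Toy example: percentage of tumor area graded as pattern 4 on the same slides.
reference_gp4 = [0, 30, 40, 70, 90]
model_gp4 = [5, 25, 40, 60, 90]

toy_accuracy = accuracy(model_groups, reference_groups)
toy_mae = mean_absolute_error(model_gp4, reference_gp4)
print(toy_accuracy, toy_mae)
```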

These findings suggest that a DLS could reduce variability in Gleason scoring, making it a useful tool in settings that lack genitourinary pathology specialists. By offering a more detailed and quantitative analysis of tumor morphology, it could also help refine the Gleason grading system itself, moving it from discrete categories towards a continuum.

In conclusion, the results are promising, but moving such systems into clinical practice requires attention to the operational constraints observed, such as the current dependence on digital slide evaluation, which may differ from traditional glass-slide review.

The paper reflects a broader trend in pathology and biomedicine in which artificial intelligence supplements human expertise to improve diagnostic accuracy and reliability, paving the way for AI-driven insights in routine clinical decision-making. Future research should evaluate such systems on biopsy specimens, on other variants of prostate cancer, and on platforms integrated into routine clinical workflows, to test how well they generalize and scale across different pathological and clinical contexts.