How much data is needed to train a medical image deep learning system to achieve necessary high accuracy? (1511.06348v2)

Published 19 Nov 2015 in cs.LG, cs.CV, and cs.NE

Abstract: The use of Convolutional Neural Networks (CNN) in natural image classification systems has produced very impressive results. Combined with the inherent nature of medical images that make them ideal for deep-learning, further application of such systems to medical image classification holds much promise. However, the usefulness and potential impact of such a system can be completely negated if it does not reach a target accuracy. In this paper, we present a study on determining the optimum size of the training data set necessary to achieve high classification accuracy with low variance in medical image classification systems. The CNN was applied to classify axial Computed Tomography (CT) images into six anatomical classes. We trained the CNN using six different sizes of training data set (5, 10, 20, 50, 100, and 200) and then tested the resulting system with a total of 6000 CT images. All images were acquired from the Massachusetts General Hospital (MGH) Picture Archiving and Communication System (PACS). Using this data, we employ the learning curve approach to predict classification accuracy at a given training sample size. Our research will present a general methodology for determining the training data set size necessary to achieve a certain target classification accuracy that can be easily applied to other problems within such systems.

Authors (5)
  1. Junghwan Cho (3 papers)
  2. Kyewook Lee (1 paper)
  3. Ellie Shin (1 paper)
  4. Garry Choy (1 paper)
  5. Synho Do (4 papers)
Citations (322)

Summary

  • The paper establishes a replicable framework linking training dataset size to CNN classification accuracy, demonstrating accuracy gains from roughly 8% to 96% as the training set grows from 5 to 200 images.
  • It employs a learning curve approach with the GoogLeNet architecture to analyze six anatomical classes from CT images, optimizing the training process.
  • The findings set practical benchmarks, suggesting approximately 4,092 images per class are needed to reach a target accuracy of 99.5% for CADe/CADx systems.

Overview of Data Requirements in Medical Image Deep Learning

The paper "How much data is needed to train a medical image deep learning system to achieve necessary high accuracy?" by Junghwan Cho et al. addresses a critical aspect of applying deep learning methodologies to medical image classification: determining the optimal dataset size for achieving high classification accuracy. By systematically analyzing the impact of varying training data sizes on classifier performance, the authors provide insights that are crucial for the deployment of accurate and reliable computer-aided detection (CADe) and diagnosis (CADx) systems in clinical settings.

The paper employs Convolutional Neural Networks (CNN) to classify axial Computed Tomography (CT) images into six anatomical classes, leveraging large datasets from the Massachusetts General Hospital. By training CNNs with training set sizes of 5, 10, 20, 50, 100, and 200 images, the authors applied the learning curve approach to project classification accuracy as a function of training sample size. This methodological approach aims to establish a scalable framework that can be generalized to other domains within medical image analysis.
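
As a rough illustration of the learning-curve idea, the sketch below fits a saturating inverse power-law model, acc(n) = a - b * n^(-c), to accuracy measured at the six training-set sizes. The functional form, the intermediate accuracy values, and the SciPy-based fitting routine are illustrative assumptions, not the paper's exact model or reported numbers.

```python
import numpy as np
from scipy.optimize import curve_fit

def learning_curve(n, a, b, c):
    # Saturating inverse power law: accuracy approaches the asymptote `a` as n grows.
    return a - b * np.power(n, -c)

# Training-set sizes used in the paper; the accuracy values here are
# approximate/illustrative placeholders, not the paper's reported table.
sizes = np.array([5, 10, 20, 50, 100, 200], dtype=float)
accuracy = np.array([0.08, 0.37, 0.62, 0.83, 0.92, 0.96])

params, _ = curve_fit(learning_curve, sizes, accuracy, p0=[1.0, 1.0, 0.5], maxfev=10000)
a, b, c = params
print(f"fitted asymptote a={a:.3f}, b={b:.3f}, c={c:.3f}")
print(f"predicted accuracy at n=1000: {learning_curve(1000.0, *params):.3f}")
```

Once fitted, the curve can be queried at training sizes that were never measured, which is exactly how the extrapolation discussed later in the summary works.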

Methodological Core

The research uses the GoogLeNet architecture, chosen for its computational efficiency and robust classification performance, to tackle the problem of anatomical image classification. The CNN, trained using stochastic gradient descent on NVIDIA's Deep Learning GPU Training System (DIGITS), showed classification accuracy that varied markedly with training dataset size. The authors ran multiple experiments to assess accuracy across the different body parts, evaluating on a total of 6,000 CT images.
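
The paper's training was run through DIGITS rather than in code like the following, but a loosely equivalent PyTorch sketch of the setup (GoogLeNet, six-way output, SGD) might look as shown below. The hyperparameters, the torchvision GoogLeNet variant, and the `train_loader` are assumptions for illustration, not the authors' configuration.

```python
import torch
import torch.nn as nn
from torchvision.models import googlenet

device = "cuda" if torch.cuda.is_available() else "cpu"

# GoogLeNet with a six-way output head, trained from scratch;
# aux_logits is disabled so the forward pass returns plain logits.
model = googlenet(weights=None, num_classes=6, aux_logits=False,
                  init_weights=True).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

def train_one_epoch(train_loader):
    # train_loader is assumed to yield (image, label) batches of 3x224x224 tensors.
    model.train()
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```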

Given the distinct characteristics of medical images, including their standardized DICOM format, high quality, and associated radiologist annotations, this paper effectively utilizes these features to maximize the learning potential of deep learning systems. Despite these advantages, the access constraints related to patient privacy laws, such as HIPAA, pose challenges in obtaining the requisite volume of data for training complex models.
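
For readers unfamiliar with the format, below is a minimal sketch of reading one CT slice with the pydicom library and converting it to Hounsfield units via the rescale tags that CT objects normally carry. The file path is hypothetical and the windowing choice is an assumption, not something specified in the paper.

```python
import numpy as np
import pydicom

# Hypothetical file path; any CT slice exported from PACS as DICOM would do.
ds = pydicom.dcmread("example_ct_slice.dcm")

# CT objects carry RescaleSlope/RescaleIntercept tags mapping raw pixel
# values to Hounsfield units.
hu = ds.pixel_array.astype(np.float32) * float(ds.RescaleSlope) + float(ds.RescaleIntercept)

# Clip to a wide window and scale to [0, 1] before handing the slice to a CNN.
lo, hi = -1000.0, 1000.0
img = np.clip((hu - lo) / (hi - lo), 0.0, 1.0)
print(ds.Modality, img.shape)
```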

Key Results

The experimental findings indicate that increasing the size of the training dataset consistently improves the classification accuracy across all anatomical classes. For instance, the accuracy increased from 8.01% with a training size of 5 to 95.67% with a training size of 200, underscoring the importance of extensive datasets in achieving high accuracy and low variance. Through the application of learning curves, the authors extrapolated a predicted training size of 4092 images per class to achieve an accuracy threshold of 99.5%.
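
Inverting the fitted learning curve gives the quantity the paper is after: the training size needed to hit a target accuracy. The sketch below does this algebraically; the parameter values are assumed placeholders, while the ~4,092 figure comes from the paper's own fit, not from these numbers.

```python
def required_training_size(target, a, b, c):
    # Solve a - b * n**(-c) = target for n.
    if target >= a:
        raise ValueError("target accuracy exceeds the fitted asymptote")
    return (b / (a - target)) ** (1.0 / c)

# Assumed curve parameters for illustration; in practice, reuse the values
# fitted from the measured accuracies.
print(required_training_size(0.995, a=0.999, b=3.6, c=0.85))
```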

The results suggest that improvements in classification accuracy begin to plateau beyond a certain dataset size, highlighting potential diminishing returns on data accumulation efforts. This observation provides valuable benchmarks for future system development and resource allocation, facilitating a more informed approach to dataset curation in medical image deep learning initiatives.

Implications and Future Directions

This research contributes significantly to the discourse on data requirements in deep learning applications for medical imaging, offering a replicable methodology for estimating necessary training dataset sizes across varying contexts. As the paper has shown potential for adaptation and application to broader medical imaging modalities, it sets a foundation for developing more comprehensive, disease-specific classification systems. This could extend to applications involving abnormality detection and organ-specific classifications, thereby enhancing diagnostic accuracy and clinical utility.

In future work, fine-tuning existing architectures and exploring techniques such as transfer learning could further optimize system performance for specific medical imaging challenges. Additionally, addressing the ethical and legal challenges related to data access will play a crucial role in realizing the full potential of AI-driven medical imaging solutions. Overall, this research elucidates the path towards more sophisticated and reliable CADe and CADx systems capable of significantly improving healthcare outcomes through the responsible application of deep learning technologies.
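
As a sketch of what the transfer-learning direction could look like (not an experiment from the paper), one might start from ImageNet-pretrained GoogLeNet weights in torchvision, freeze the backbone, and retrain only a new six-way head. The learning rate and the freeze-everything strategy are assumptions for illustration.

```python
import torch
import torch.nn as nn
from torchvision.models import googlenet, GoogLeNet_Weights

# Load ImageNet-pretrained weights and freeze the backbone.
model = googlenet(weights=GoogLeNet_Weights.IMAGENET1K_V1)
for param in model.parameters():
    param.requires_grad = False

# Replace the final classifier with a six-way head; new layers are trainable by default.
model.fc = nn.Linear(model.fc.in_features, 6)

# Only the new head's parameters are updated during fine-tuning.
optimizer = torch.optim.SGD(model.fc.parameters(), lr=0.001, momentum=0.9)
```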