COVIDGR dataset and COVID-SDNet methodology for predicting COVID-19 based on Chest X-Ray images (2006.01409v3)

Published 2 Jun 2020 in eess.IV and cs.CV

Abstract: Currently, Coronavirus disease (COVID-19), one of the most infectious diseases in the 21st century, is diagnosed using RT-PCR testing, CT scans and/or Chest X-Ray (CXR) images. CT (Computed Tomography) scanners and RT-PCR testing are not available in most medical centers and hence in many cases CXR images become the most time/cost effective tool for assisting clinicians in making decisions. Deep learning neural networks have a great potential for building COVID-19 triage systems and detecting COVID-19 patients, especially patients with low severity. Unfortunately, current databases do not allow building such systems as they are highly heterogeneous and biased towards severe cases. This paper is three-fold: (i) we demystify the high sensitivities achieved by most recent COVID-19 classification models, (ii) under a close collaboration with Hospital Universitario Clínico San Cecilio, Granada, Spain, we built COVIDGR-1.0, a homogeneous and balanced database that includes all levels of severity, from normal with Positive RT-PCR, Mild, Moderate to Severe. COVIDGR-1.0 contains 426 positive and 426 negative PA (PosteroAnterior) CXR views and (iii) we propose COVID Smart Data based Network (COVID-SDNet) methodology for improving the generalization capacity of COVID-classification models. Our approach reaches good and stable results with an accuracy of 97.72% ± 0.95%, 86.90% ± 3.20%, 61.80% ± 5.49% in severe, moderate and mild COVID-19 severity levels (Paper accepted for publication in Journal of Biomedical and Health Informatics). Our approach could help in the early detection of COVID-19. COVIDGR-1.0 along with the severity level labels are available to the scientific community through this link https://dasci.es/es/transferencia/open-data/covidgr/.

Summary

The paper presents the COVIDGR-1.0 dataset, offering a balanced collection of positive and negative chest X-ray images across various COVID-19 severity levels.
The paper introduces the COVID-SDNet methodology, which combines segmentation, data transformation, and CNN classification to improve the generalization of COVID-19 classification models.
The study demonstrates that training with balanced, high-quality datasets improves the robustness and generalization of COVID-19 diagnostic models.

Overview of the COVIDGR Dataset and COVID-SDNet for Predicting COVID-19 from Chest X-Ray Images

This overview analyzes the paper's work on the COVIDGR-1.0 dataset and the COVID-SDNet methodology, developed to detect COVID-19 from Chest X-Ray (CXR) images, with results reported separately for each severity level. The paper addresses the limitations of current diagnostic methods and datasets and proposes an approach aimed at improving how well machine learning models generalize when identifying COVID-19 in CXR images.

Dataset Development and Challenges

The COVIDGR-1.0 dataset was developed in collaboration with Hospital Universitario Clínico San Cecilio in Granada, Spain, and comprises 426 positive and 426 negative PosteroAnterior (PA) CXR views. Its notable features are the balance between classes and the coverage of all COVID-19 severity levels (Normal with Positive RT-PCR, Mild, Moderate, and Severe), which addresses the bias towards severe cases found in existing datasets. Balanced datasets of this kind are essential for training models that generalize across the severity spectrum, particularly for detecting the less severe cases that matter most for triage.
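One practical consequence of a dataset that is balanced and labeled by severity is that evaluation splits can preserve that balance. The following is a minimal sketch of a stratified split, not the authors' procedure: the function name `stratified_folds`, the stratum names, and the toy counts (10 samples per positive severity level) are all illustrative assumptions and do not reflect the real per-severity counts in COVIDGR-1.0.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(strata, k, seed=0):
    """Split sample indices into k folds, spreading each stratum evenly."""
    rng = random.Random(seed)

    # Group sample indices by their stratum (class or severity tag).
    by_stratum = defaultdict(list)
    for index, stratum in enumerate(strata):
        by_stratum[stratum].append(index)

    folds = [[] for _ in range(k)]
    for stratum in sorted(by_stratum):
        indices = by_stratum[stratum]
        rng.shuffle(indices)
        # Deal indices round-robin, so for each stratum the per-fold
        # counts differ by at most one.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


# Toy labels only (hypothetical): NOT the real COVIDGR-1.0 distribution.
toy_strata = (
    ["negative"] * 40
    + ["normal_pcr_positive"] * 10
    + ["mild"] * 10
    + ["moderate"] * 10
    + ["severe"] * 10
)

folds = stratified_folds(toy_strata, k=5)
for number, fold in enumerate(folds):
    print(number, dict(Counter(toy_strata[i] for i in fold)))
```

Stratifying on the severity tag rather than only on the positive/negative label keeps the rarer severity levels represented in every fold, which matters when results are reported per level.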

COVID-SDNet Methodology

The COVID-SDNet methodology uses deep learning to improve the generalization capacity of COVID-19 classification models. It combines three components: segmentation to focus on the relevant parts of CXR images, a Class-inherent Transformations Network (FuCiTNet) inspired by GANs for data transformation, and a Convolutional Neural Network (CNN) for classification. In the reported experiments, COVID-SDNet reached accuracies of 97.72% ± 0.95% for severe, 86.90% ± 3.20% for moderate, and 61.80% ± 5.49% for mild COVID-19 severity levels. The method is therefore most reliable on moderate and severe cases, while mild cases remain considerably harder to detect.
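The per-severity figures above are reported as a mean plus or minus a spread. As an illustration of how such numbers can be produced, the sketch below computes accuracy per severity level for each run and then aggregates across runs. It is an assumption-laden example, not the paper's evaluation code: the record layout `(severity, true_label, predicted_label)`, the function names, the toy records, and the use of the sample standard deviation are all choices made here.

```python
from collections import defaultdict
from statistics import mean, stdev


def accuracy_by_severity(records):
    """Return {severity: accuracy} for (severity, truth, predicted) tuples."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for severity, truth, predicted in records:
        totals[severity] += 1
        hits[severity] += int(truth == predicted)
    return {severity: hits[severity] / totals[severity] for severity in totals}


def summarize(per_run_accuracies):
    """Aggregate per-run dicts into {severity: (mean, sample std)}."""
    summary = {}
    for severity in per_run_accuracies[0]:
        values = [run[severity] for run in per_run_accuracies]
        # stdev needs at least two runs; it is the sample standard deviation.
        summary[severity] = (mean(values), stdev(values))
    return summary


# Hypothetical toy predictions from two runs (1 = COVID-19 positive).
run_a = [("mild", 1, 1), ("mild", 1, 0), ("severe", 1, 1), ("severe", 1, 1)]
run_b = [("mild", 1, 0), ("mild", 1, 0), ("severe", 1, 1), ("severe", 1, 0)]

per_run = [accuracy_by_severity(run_a), accuracy_by_severity(run_b)]
for severity, (avg, spread) in summarize(per_run).items():
    print(f"{severity}: {100 * avg:.2f}% +/- {100 * spread:.2f}%")
```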

Comparative Analysis and Experimental Results

The paper includes a comparative evaluation against state-of-the-art models such as COVIDNet and COVID-CAPS. Both performed worse on the new COVIDGR-1.0 dataset when trained on other datasets. COVID-SDNet outperformed these models in sensitivity, specificity, and stability across the different degrees of COVID-19 severity. The key insight is that models trained on high-quality, balanced datasets such as COVIDGR-1.0 are more likely to produce clinically applicable results.
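Sensitivity and specificity, the two measures used in this comparison, follow directly from the binary confusion counts: sensitivity is TP / (TP + FN) and specificity is TN / (TN + FP). The sketch below shows the calculation on made-up labels; the function names and the toy data are illustrative and are not taken from the paper.

```python
def confusion_counts(truths, predictions, positive=1):
    """Count true/false positives and negatives for a binary task."""
    tp = fp = tn = fn = 0
    for truth, predicted in zip(truths, predictions):
        if truth == positive and predicted == positive:
            tp += 1
        elif truth == positive:
            fn += 1  # positive case the model missed
        elif predicted == positive:
            fp += 1  # negative case flagged as positive
        else:
            tn += 1
    return tp, fp, tn, fn


def sensitivity_specificity(truths, predictions, positive=1):
    """Return (sensitivity, specificity); NaN when a class is absent."""
    tp, fp, tn, fn = confusion_counts(truths, predictions, positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical labels: 1 = COVID-19 positive, 0 = negative.
truths = [1, 1, 1, 0, 0, 0, 0, 1]
predictions = [1, 0, 1, 0, 1, 0, 0, 1]
print(sensitivity_specificity(truths, predictions))
```

Reporting both values matters for triage: a model can reach high sensitivity on severe cases while still missing many mild ones, which a single accuracy figure would hide.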

Implications and Future Directions

This work has implications both for the practical use of AI in medical imaging and for dataset curation and model design in pandemic-related settings. Combining the COVID-SDNet methodology with other diagnostic inputs (e.g., clinical features alongside imaging data) could further improve diagnostic accuracy. Future work could also add CXR images from more diverse geographical and demographic backgrounds to test and strengthen generalization.

In conclusion, the COVIDGR-1.0 dataset and the COVID-SDNet methodology provide a reference point for CXR-based COVID-19 diagnostic tools. The work underscores the importance of collaboration between clinical practitioners and machine learning researchers in addressing real-world medical challenges.
