REFUGE Challenge: A Unified Framework for Evaluating Automated Methods for Glaucoma Assessment from Fundus Photographs (1910.03667v1)

Published 8 Oct 2019 in cs.CV

Abstract: Glaucoma is one of the leading causes of irreversible but preventable blindness in working-age populations. Color fundus photography (CFP) is the most cost-effective imaging modality to screen for retinal disorders. However, its application to glaucoma has been limited to the computation of a few related biomarkers such as the vertical cup-to-disc ratio. Deep learning approaches, although widely applied for medical image analysis, have not been extensively used for glaucoma assessment due to the limited size of the available data sets. Furthermore, the lack of a standardized benchmark strategy makes it difficult to compare existing methods in a uniform way. To overcome these issues, we set up the Retinal Fundus Glaucoma Challenge, REFUGE (https://refuge.grand-challenge.org), held in conjunction with MICCAI 2018. The challenge consisted of two primary tasks, namely optic disc/cup segmentation and glaucoma classification. As part of REFUGE, we have publicly released a data set of 1200 fundus images with ground truth segmentations and clinical glaucoma labels, currently the largest existing one. We have also built an evaluation framework to ease and ensure fairness in the comparison of different models, encouraging the development of novel techniques in the field. 12 teams qualified and participated in the online challenge. This paper summarizes their methods and analyzes their corresponding results. In particular, we observed that two of the top-ranked teams outperformed two human experts in the glaucoma classification task. Furthermore, the segmentation results were in general consistent with the ground truth annotations, with complementary outcomes that can be further exploited by ensembling the results.

Citations (544)


Summary

The paper introduced a public dataset of 1200 fundus photographs with ground truth segmentations and clinical glaucoma labels, the largest of its kind at the time of release.
The paper adopted common evaluation metrics, the Dice coefficient for segmentation and AUC for classification, to benchmark models in a uniform way.
The paper reports that two of the top-ranked deep learning entries outperformed two human experts in glaucoma classification, and that U-Net variants and ensembles were common among the participating methods.

Overview of the REFUGE Challenge Paper

The paper "REFUGE Challenge: A Unified Framework for Evaluating Automated Methods for Glaucoma Assessment from Fundus Photographs" presents a comprehensive evaluation of deep learning techniques for glaucoma assessment using color fundus photographs (CFP). The REFUGE challenge addresses the limitations in glaucoma diagnostic benchmarks by offering a standardized evaluation framework. It introduces a publicly available dataset, emphasizing two tasks: optic disc/cup segmentation and glaucoma classification.

Key Contributions

Dataset Composition: The paper details the creation of the largest publicly available dataset of 1200 CFPs, including reliable clinical labels for glaucoma and manual segmentations of optic disc (OD) and optic cup (OC), provided by multiple specialists. This addresses the scarcity of extensive public datasets, crucial for developing robust deep learning models.
Evaluation Framework: A uniform framework for evaluating glaucoma classification and OD/OC segmentation models is a primary focus. The challenge evaluates models using specific metrics: Dice coefficient for segmentation and area under the ROC curve (AUC) for classification, ensuring comparability of different approaches.
Challenge Results: Twelve teams qualified and participated in the online challenge. Two of the top-ranked teams outperformed two human experts in glaucoma classification, which points to the potential of automated systems in clinical settings.
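
The two metrics named above can be illustrated with a minimal sketch. This is not the challenge's official evaluation code: the function names, the nested-list mask representation, and the convention of returning 1.0 when both masks are empty are assumptions made for illustration. The Dice coefficient is computed as 2|A∩B| / (|A| + |B|), and the AUC uses its pairwise interpretation, the probability that a randomly chosen glaucoma case receives a higher score than a randomly chosen non-glaucoma case.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[Sequence[int]],
                     truth: Sequence[Sequence[int]]) -> float:
    """Dice overlap between two binary masks given as nested 0/1 lists."""
    p = [v for row in pred for v in row]
    t = [v for row in truth for v in row]
    if len(p) != len(t):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for a, b in zip(p, t) if a and b)
    total = sum(1 for a in p if a) + sum(1 for b in t if b)
    if total == 0:
        # Assumption: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / total


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    predicted = [[1, 1], [0, 0]]
    reference = [[1, 0], [0, 0]]
    print(dice_coefficient(predicted, reference))
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The pairwise AUC loop is quadratic in the number of images, which is acceptable for a data set of this size; a rank-based formulation would scale better for larger collections.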

Methodological Insights

Segmentation: The majority of participating methods leverage U-Net variations and other CNN architectures like DeepLabv3+ for segmenting OD/OC, often employing two-stage approaches for improved localization and segmentation.
Classification: Various models, including adaptations of ResNet and Xception architectures, demonstrate effective feature learning from ONH-specific regions. Several teams integrated clinical measures like vertical cup-to-disc ratio (vCDR) to enhance classification accuracy.
Ensembles: The use of ensemble methods emerged as a successful strategy to enhance performance by combining the strengths of multiple models, which is critical for both segmentation and classification tasks.
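
Two of the ideas above lend themselves to a short sketch: deriving the vertical cup-to-disc ratio (vCDR) from segmentation masks, and combining several classifiers by averaging their scores. The code below is an illustration under stated assumptions, not any team's method: the helper names are hypothetical, masks are nested 0/1 lists, the vertical diameter is taken as the row extent of the foreground, and the ensemble is a plain unweighted mean.

```python
from typing import List, Sequence


def vertical_diameter(mask: Sequence[Sequence[int]]) -> int:
    """Number of rows spanned by the foreground of a binary mask."""
    rows = [i for i, row in enumerate(mask) if any(row)]
    if not rows:
        return 0
    return rows[-1] - rows[0] + 1


def vertical_cdr(cup_mask: Sequence[Sequence[int]],
                 disc_mask: Sequence[Sequence[int]]) -> float:
    """vCDR = vertical cup diameter / vertical disc diameter."""
    disc = vertical_diameter(disc_mask)
    if disc == 0:
        raise ValueError("disc mask is empty; vCDR is undefined")
    return vertical_diameter(cup_mask) / disc


def average_ensemble(model_scores: Sequence[Sequence[float]]) -> List[float]:
    """Unweighted mean of per-image scores across models.

    model_scores[m][i] is the glaucoma score of model m for image i.
    """
    if not model_scores:
        raise ValueError("need at least one model")
    n_images = len(model_scores[0])
    if any(len(scores) != n_images for scores in model_scores):
        raise ValueError("all models must score the same images")
    return [sum(column) / len(column) for column in zip(*model_scores)]


if __name__ == "__main__":
    disc = [[0, 1, 0],
            [1, 1, 1],
            [1, 1, 1],
            [0, 1, 0]]
    cup = [[0, 0, 0],
           [0, 1, 0],
           [0, 1, 0],
           [0, 0, 0]]
    print(vertical_cdr(cup, disc))
    print(average_ensemble([[0.25, 0.75], [0.75, 0.25]]))
```

A vCDR computed this way could be appended to learned image features or used as a stand-alone score, and the same averaging idea extends to segmentation by averaging per-pixel probabilities before thresholding.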

Implications and Future Directions

The REFUGE challenge supports the feasibility of using deep learning models for glaucoma classification and disc/cup segmentation from CFPs: two of the top-ranked teams outperformed two human experts in classification, and the segmentation results were in general consistent with the ground truth annotations. The public release of the dataset and evaluation framework is expected to encourage further research in automated glaucoma assessment.

The paper suggests exploration of cross-modality learning approaches, considering the potential improvement when combining CFPs with OCT data, to enhance diagnostic precision. Additionally, addressing challenges in generalization to diverse populations and real-world settings is identified as a crucial area.

In summary, the REFUGE challenge is a step toward automated glaucoma screening from fundus photographs. Its dataset, evaluation framework, and results provide a benchmark for future methods, and the paper advocates greater data diversity and the integration of multimodal data sources.
