
nnMobileNet: Rethinking CNN for Retinopathy Research (2306.01289v4)

Published 2 Jun 2023 in eess.IV and cs.CV

Abstract: Over the past few decades, convolutional neural networks (CNNs) have been at the forefront of the detection and tracking of various retinal diseases (RD). Despite their success, the emergence of vision transformers (ViT) in the 2020s has shifted the trajectory of RD model development. The leading-edge performance of ViT-based models in RD can be largely credited to their scalability: their ability to improve as more parameters are added. As a result, ViT-based models tend to outshine traditional CNNs in RD applications, albeit at the cost of increased data and computational demands. ViTs also differ from CNNs in their approach to processing images, working with patches rather than local regions, which can complicate the precise localization of small, variably presented lesions in RD. In our study, we revisited and updated the architecture of a CNN model, specifically MobileNet, to enhance its utility in RD diagnostics. We found that an optimized MobileNet, through selective modifications, can surpass ViT-based models in various RD benchmarks, including diabetic retinopathy grading, detection of multiple fundus diseases, and classification of diabetic macular edema. The code is available at https://github.com/Retinal-Research/NN-MOBILENET


Summary

  • The paper introduces nnMobileNet, a refined MobileNet CNN architecture that achieves competitive retinopathy diagnosis performance, challenging the dominance of Vision Transformers (ViTs), especially in limited-data regimes.
  • Key methodological enhancements, including refined channel configuration, advanced data augmentation, and spatial Dropout, enabled nnMobileNet to perform effectively across multiple datasets without extensive pre-training.
  • The results highlight that carefully modified CNNs can remain potent tools for retinal disease research, offering efficient and effective diagnostic capabilities suitable for diverse environments and data conditions.

Evaluation of nnMobileNet: Enhancing CNN Architectures for Retinopathy Analysis

The paper "nnMobileNet: Rethinking CNN for Retinopathy Research" presents an in-depth exploration of convolutional neural networks (CNNs) applied to retinal disease (RD) diagnosis. Despite the dominance of Vision Transformers (ViTs) in recent RD applications due to their scalability and strong performance across large datasets, this paper reconsiders the efficacy of CNNs, particularly the MobileNet architecture, in this domain.

Objectives and Motivation

CNNs have long served as robust frameworks for image analysis, including retinal imaging, where their capacity to capture spatial hierarchies and local patterns has produced notable successes. ViTs, which process images as sequences of patches and thereby model global context, now often yield superior results. They are, however, computationally intensive and data-hungry, typically requiring extensive pretraining on large datasets.

This paper reassesses the CNN architecture, specifically MobileNet, a lightweight model known for its efficiency, and updates it for RD detection without requiring extensive pretraining data. The central hypothesis challenges the assumption that ViTs have wholly superseded CNNs in RD tasks, especially where computational and data resources are limited.

Methodological Enhancements

Several strategic adjustments were made to the standard MobileNet model to tailor it for RD analysis:

  • Channel Configuration: Adjusting the stage-wise channel configuration redistributes capacity across the network's layers, improving feature expressiveness where RD diagnosis needs it most: capturing fine lesion detail (see the architecture sketch after this list).
  • Data Augmentation: Because RD lesions vary widely in appearance and are often confined to small regions of the image, aggressive data augmentation was applied. Contrary to the common assumption that heavy augmentation harms medical-image training, it improved outcomes by simulating a broader range of retinal pathologies (see the training-setup sketch below).
  • Dropout Utilization: Standard Dropout mitigates overfitting by randomly deactivating individual neurons; spatial Dropout instead drops entire feature maps, which preserves local spatial patterns in the surviving channels and better retains the lesion-specific features needed to grade RD severity.
  • Optimization Process: The AdamP optimizer tempers the effective step size on scale-invariant weights (those followed by normalization layers), whose norms momentum-based optimizers otherwise inflate. This yielded more stable convergence and contributed to the model's improved diagnostic accuracy.
  • Activation Functions: The paper compared several activation functions and ultimately selected ReLU6, which clips activations at 6 and keeps the dynamic range of the refined MobileNet layers bounded.
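
To make the architectural bullets concrete, here is a minimal PyTorch sketch of a MobileNetV2-style inverted-residual block with the modifications described: ReLU6 activations and spatial Dropout (nn.Dropout2d), plus an illustrative stage-wise channel list. The channel widths, expansion ratio, and dropout rate below are assumptions for illustration, not values taken from the paper.

```python
import torch
import torch.nn as nn

class InvertedResidual(nn.Module):
    """MobileNetV2-style block with two of the tweaks described above:
    ReLU6 activations and spatial (channel-wise) Dropout."""

    def __init__(self, in_ch, out_ch, stride=1, expand=6, p_drop=0.2):
        super().__init__()
        hidden = in_ch * expand
        self.use_residual = stride == 1 and in_ch == out_ch
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, hidden, 1, bias=False),        # 1x1 expansion
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),                         # clips activations at 6
            nn.Conv2d(hidden, hidden, 3, stride=stride,     # 3x3 depthwise conv
                      padding=1, groups=hidden, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            nn.Dropout2d(p_drop),                           # spatial Dropout: zeroes
                                                            # whole feature maps
            nn.Conv2d(hidden, out_ch, 1, bias=False),       # linear 1x1 projection
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_residual else out

# Hypothetical stage-wise channel widths; widening later stages shifts
# capacity toward the high-level features used for lesion grading.
stage_channels = [24, 32, 64, 96, 160, 320]
block = InvertedResidual(24, 24)
y = block(torch.randn(1, 24, 112, 112))   # -> torch.Size([1, 24, 112, 112])
```

Dropping whole feature maps rather than individual activations is the key design choice here: neighboring pixels within a channel are strongly correlated, so element-wise Dropout barely regularizes convolutional features, while channel-wise Dropout forces redundancy across channels without disrupting local lesion patterns.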

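The augmentation and optimizer bullets can likewise be sketched. The recipe below is a hedged illustration, not the paper's published configuration: the transform list, magnitudes, and hyperparameters are assumptions, and the stock torchvision MobileNetV2 stands in for the modified network. AdamP ships as a standalone package (pip install adamp).

```python
import torchvision.transforms as T
from torchvision.models import mobilenet_v2
from adamp import AdamP   # pip install adamp

# Aggressive augmentation: fundus orientation is arbitrary, so flips and
# full rotations are safe, while strong photometric jitter mimics the
# acquisition variability seen across clinics and devices.
train_transform = T.Compose([
    T.RandomResizedCrop(224, scale=(0.7, 1.0)),
    T.RandomHorizontalFlip(),
    T.RandomVerticalFlip(),
    T.RandomRotation(degrees=180),
    T.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    T.RandomErasing(p=0.25),   # occlusion-style regularization on tensors
])

# Stock MobileNetV2 as a stand-in for the modified architecture;
# 5 outputs correspond to the usual diabetic-retinopathy grades 0-4.
model = mobilenet_v2(num_classes=5)

# AdamP tempers the effective step size on scale-invariant weights
# (those followed by BatchNorm), whose norms momentum would otherwise inflate.
optimizer = AdamP(model.parameters(), lr=1e-3, betas=(0.9, 0.999),
                  weight_decay=1e-2, nesterov=True)
```
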
Experimental Evaluation and Results

Empirical evaluation across multiple RD datasets (Messidor, RFMiD, APTOS, IDRiD, and the MMAC challenge dataset) demonstrated the effectiveness of nnMobileNet. The model consistently achieved competitive, and often superior, accuracy and area-under-the-curve (AUC) scores compared with state-of-the-art ViT variants and established CNN architectures.

Importantly, nnMobileNet reached peak performance without extensive external pre-training, demonstrating its practicality for resource-constrained environments. Its light weight also translates into faster inference, which matters in clinical settings that require real-time analysis.

Implications and Future Directions

The findings prompt a reconsideration of ViT dominance in RD applications and highlight opportunities to recalibrate simple yet powerful CNN models to meet or exceed current benchmarks in retinal imaging. nnMobileNet’s performance suggests that ViTs are not always superior, especially when smaller, high-quality datasets are all that is available.

Future research could explore hybrid models combining CNN locality strengths with ViT’s global context capabilities, potentially achieving a balance that maximizes lesion detection and classification accuracies across various RD scenarios. Moreover, integrating large-kernel convolutions could further enhance the capabilities of CNNs to capture more extensive image relationships typically emphasized by ViTs.

In conclusion, this paper reinforces the notion that with thoughtful modifications, CNNs like MobileNet can remain pivotal in RD research, providing efficient and effective diagnostic tools well-suited for diverse environments and data conditions.