
nnMobileNet: Rethinking CNN for Retinopathy Research (2306.01289v4)

Published 2 Jun 2023 in eess.IV and cs.CV

Abstract: Over the past few decades, convolutional neural networks (CNNs) have been at the forefront of the detection and tracking of various retinal diseases (RD). Despite their success, the emergence of vision transformers (ViT) in the 2020s has shifted the trajectory of RD model development. The leading-edge performance of ViT-based models in RD can be largely credited to their scalability: their ability to improve as more parameters are added. As a result, ViT-based models tend to outshine traditional CNNs in RD applications, albeit at the cost of increased data and computational demands. ViTs also differ from CNNs in their approach to processing images, working with patches rather than local regions, which can complicate the precise localization of small, variably presented lesions in RD. In our study, we revisited and updated the architecture of a CNN model, specifically MobileNet, to enhance its utility in RD diagnostics. We found that an optimized MobileNet, through selective modifications, can surpass ViT-based models in various RD benchmarks, including diabetic retinopathy grading, detection of multiple fundus diseases, and classification of diabetic macular edema. The code is available at https://github.com/Retinal-Research/NN-MOBILENET


Summary

  • The paper introduces nnMobileNet, a refined MobileNet CNN architecture that achieves competitive retinopathy diagnosis performance, challenging the dominance of Vision Transformers (ViTs), especially in limited-data regimes.
  • Key methodological enhancements, including refined channel configuration, advanced data augmentation, and spatial Dropout, enabled nnMobileNet to perform effectively across multiple datasets without extensive pre-training.
  • The results highlight that carefully modified CNNs can remain potent tools for retinal disease research, offering efficient and effective diagnostic capabilities suitable for diverse environments and data conditions.

Evaluation of nnMobileNet: Enhancing CNN Architectures for Retinopathy Analysis

The paper "nnMobileNet: Rethinking CNN for Retinopathy Research" presents an in-depth exploration of convolutional neural networks (CNNs) applied to retinal disease (RD) diagnosis. Despite the dominance of Vision Transformers (ViTs) in recent RD applications due to their scalability and strong performance across large datasets, this paper reconsiders the efficacy of CNNs, particularly the MobileNet architecture, in this domain.

Objectives and Motivation

CNNs have long served as robust frameworks for image analysis, including retinal imaging, where their capacity to capture spatial hierarchies and local patterns has produced notable successes. ViTs, which process images as sequences of patches and thereby model global context, now often yield superior results. They are, however, computationally intensive and data-hungry, typically requiring extensive pretraining on large datasets.

This paper reassesses the CNN architecture, specifically MobileNet, a lightweight model known for its efficiency, and updates it for RD detection without requiring extensive pretraining data. The central hypothesis challenges the assumption that ViTs have wholly superseded CNNs in RD tasks, especially where computational and data resources are limited.

Methodological Enhancements

Several strategic adjustments were made to the standard MobileNet model to tailor it for RD analysis:

  • Channel Configuration: Adjusting the stage-wise channel configuration redistributes capacity across the network's layers, improving feature expressiveness where RD diagnosis needs it most: capturing fine lesion detail (see the architecture sketch after this list).
  • Data Augmentation: Because RD lesions vary widely in appearance and are often confined to small regions of the image, aggressive data augmentation was applied. Contrary to the common assumption that heavy augmentation harms medical-image training, it improved outcomes by simulating a broader range of retinal pathologies (see the training-setup sketch below).
  • Dropout Utilization: Standard Dropout mitigates overfitting by randomly deactivating individual neurons; spatial Dropout instead drops entire feature maps, which preserves local spatial patterns in the surviving channels and better retains the lesion-specific features needed to grade RD severity.
  • Optimization Process: The AdamP optimizer tempers the effective step size on scale-invariant weights (those followed by normalization layers), whose norms momentum-based optimizers otherwise inflate. This yielded more stable convergence and contributed to the model's improved diagnostic accuracy.
  • Activation Functions: The paper compared several activation functions and ultimately selected ReLU6, which clips activations at 6 and keeps the dynamic range of the refined MobileNet layers bounded.
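
To make the architectural bullets concrete, here is a minimal PyTorch sketch of a MobileNetV2-style inverted-residual block with the modifications described: ReLU6 activations and spatial Dropout (nn.Dropout2d), plus an illustrative stage-wise channel list. The channel widths, expansion ratio, and dropout rate below are assumptions for illustration, not values taken from the paper.

```python
import torch
import torch.nn as nn

class InvertedResidual(nn.Module):
    """MobileNetV2-style block with two of the tweaks described above:
    ReLU6 activations and spatial (channel-wise) Dropout."""

    def __init__(self, in_ch, out_ch, stride=1, expand=6, p_drop=0.2):
        super().__init__()
        hidden = in_ch * expand
        self.use_residual = stride == 1 and in_ch == out_ch
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, hidden, 1, bias=False),        # 1x1 expansion
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),                         # clips activations at 6
            nn.Conv2d(hidden, hidden, 3, stride=stride,     # 3x3 depthwise conv
                      padding=1, groups=hidden, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            nn.Dropout2d(p_drop),                           # spatial Dropout: zeroes
                                                            # whole feature maps
            nn.Conv2d(hidden, out_ch, 1, bias=False),       # linear 1x1 projection
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_residual else out

# Hypothetical stage-wise channel widths; widening later stages shifts
# capacity toward the high-level features used for lesion grading.
stage_channels = [24, 32, 64, 96, 160, 320]
block = InvertedResidual(24, 24)
y = block(torch.randn(1, 24, 112, 112))   # -> torch.Size([1, 24, 112, 112])
```

Dropping whole feature maps rather than individual activations is the key design choice here: neighboring pixels within a channel are strongly correlated, so element-wise Dropout barely regularizes convolutional features, while channel-wise Dropout forces redundancy across channels without disrupting local lesion patterns.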

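The augmentation and optimizer bullets can likewise be sketched. The recipe below is a hedged illustration, not the paper's published configuration: the transform list, magnitudes, and hyperparameters are assumptions, and the stock torchvision MobileNetV2 stands in for the modified network. AdamP ships as a standalone package (pip install adamp).

```python
import torchvision.transforms as T
from torchvision.models import mobilenet_v2
from adamp import AdamP   # pip install adamp

# Aggressive augmentation: fundus orientation is arbitrary, so flips and
# full rotations are safe, while strong photometric jitter mimics the
# acquisition variability seen across clinics and devices.
train_transform = T.Compose([
    T.RandomResizedCrop(224, scale=(0.7, 1.0)),
    T.RandomHorizontalFlip(),
    T.RandomVerticalFlip(),
    T.RandomRotation(degrees=180),
    T.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    T.RandomErasing(p=0.25),   # occlusion-style regularization on tensors
])

# Stock MobileNetV2 as a stand-in for the modified architecture;
# 5 outputs correspond to the usual diabetic-retinopathy grades 0-4.
model = mobilenet_v2(num_classes=5)

# AdamP tempers the effective step size on scale-invariant weights
# (those followed by BatchNorm), whose norms momentum would otherwise inflate.
optimizer = AdamP(model.parameters(), lr=1e-3, betas=(0.9, 0.999),
                  weight_decay=1e-2, nesterov=True)
```
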
Experimental Evaluation and Results

Empirical evaluation across multiple RD datasets (Messidor, RFMiD, APTOS, IDRiD, and the MMAC challenge dataset) demonstrated the effectiveness of nnMobileNet. The model consistently achieved competitive, and often superior, accuracy and area-under-the-curve (AUC) scores compared with state-of-the-art ViT variants and established CNN architectures.

Importantly, nnMobileNet reached peak performance without extensive external pre-training, demonstrating its practicality for resource-constrained environments. Its light weight also translates into faster inference, which matters in clinical settings that require real-time analysis.

Implications and Future Directions

The findings prompt a reconsideration of ViT dominance in RD applications and highlight opportunities to recalibrate simple yet powerful CNN models to meet or exceed current benchmarks in retinal imaging. nnMobileNet’s performance suggests that ViTs are not always superior, especially when smaller, high-quality datasets are all that is available.

Future research could explore hybrid models combining CNN locality strengths with ViT’s global context capabilities, potentially achieving a balance that maximizes lesion detection and classification accuracies across various RD scenarios. Moreover, integrating large-kernel convolutions could further enhance the capabilities of CNNs to capture more extensive image relationships typically emphasized by ViTs.

In conclusion, this paper reinforces the notion that with thoughtful modifications, CNNs like MobileNet can remain pivotal in RD research, providing efficient and effective diagnostic tools well-suited for diverse environments and data conditions.