nnMobileNet: Rethinking CNN for Retinopathy Research (2306.01289v4)
Abstract: Over the past few decades, convolutional neural networks (CNNs) have been at the forefront of the detection and tracking of various retinal diseases (RD). Despite their success, the emergence of vision transformers (ViT) in the 2020s has shifted the trajectory of RD model development. The leading-edge performance of ViT-based models in RD can be largely credited to their scalability, that is, their ability to improve as more parameters are added. As a result, ViT-based models tend to outshine traditional CNNs in RD applications, albeit at the cost of increased data and computational demands. ViTs also differ from CNNs in how they process images: they work with patches rather than local regions, which can complicate the precise localization of small, variably presented lesions in RD. In our study, we revisited and updated the architecture of a CNN model, specifically MobileNet, to enhance its utility in RD diagnostics. We found that an optimized MobileNet, through selective modifications, can surpass ViT-based models in various RD benchmarks, including diabetic retinopathy grading, detection of multiple fundus diseases, and classification of diabetic macular edema. The code is available at https://github.com/Retinal-Research/NN-MOBILENET