Multi-scale Spatio-temporal Transformer-based Imbalanced Longitudinal Learning for Glaucoma Forecasting from Irregular Time Series Images (2402.13475v1)
Abstract: Glaucoma is a major eye disease that causes progressive optic nerve fiber damage and irreversible blindness, afflicting millions of individuals. Glaucoma forecasting supports early screening and intervention for at-risk patients, which helps prevent further deterioration of the disease. It takes a series of historical fundus images of an eye and forecasts the likelihood that glaucoma will occur in the future. However, irregular sampling and imbalanced class distribution are two challenges in developing disease forecasting approaches. To this end, we introduce the Multi-scale Spatio-temporal Transformer Network (MST-former), a transformer architecture tailored to sequential image inputs that learns representative semantic information along both the temporal and spatial dimensions. Specifically, we employ a multi-scale structure to extract features at various resolutions, exploiting the rich spatial information encoded in each image. In addition, we design a time distance matrix that scales time attention in a non-linear manner to handle irregularly sampled data. Furthermore, we introduce a temperature-controlled Balanced Softmax Cross-entropy loss to address class imbalance. Extensive experiments on the Sequential fundus Images for Glaucoma Forecast (SIGF) dataset demonstrate the superiority of MST-former, which achieves an AUC of 98.6% for glaucoma forecasting. Our method also generalizes well to the Alzheimer's Disease Neuroimaging Initiative (ADNI) MRI dataset, reaching an accuracy of 90.3% for mild cognitive impairment and Alzheimer's disease prediction and outperforming the compared method by a large margin.
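To illustrate the class-imbalance idea, the following is a minimal sketch of a Balanced Softmax cross-entropy with a temperature parameter, in plain Python. The abstract does not give the exact formulation, so the details here are assumptions: the function name `balanced_softmax_ce`, the use of raw training-set class counts as the prior, and in particular the placement of the temperature (dividing the logits before the log-prior shift) are illustrative choices, not necessarily the authors' definition. With `temperature=1.0` the sketch reduces to the standard Balanced Softmax form, in which each logit is shifted by the log of its class count.

```python
import math


def balanced_softmax_ce(logits, target, class_counts, temperature=1.0):
    """Balanced Softmax cross-entropy for a single sample.

    logits:       list of raw class scores z_j
    target:       index of the true class
    class_counts: number of training samples per class, n_j (all > 0)
    temperature:  ASSUMED placement, scales the logits before the
                  log-prior shift; 1.0 gives standard Balanced Softmax
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if len(logits) != len(class_counts):
        raise ValueError("logits and class_counts must have equal length")

    # Shift each (temperature-scaled) logit by the log class frequency.
    adjusted = [
        z / temperature + math.log(n)
        for z, n in zip(logits, class_counts)
    ]

    # Numerically stable log-sum-exp over the adjusted logits.
    peak = max(adjusted)
    log_norm = peak + math.log(sum(math.exp(a - peak) for a in adjusted))

    # Negative log-probability of the target class.
    return log_norm - adjusted[target]


# Hypothetical two-class example: 900 negative and 100 positive samples.
counts = [900, 100]
loss_minority = balanced_softmax_ce([0.0, 0.0], target=1, class_counts=counts)
loss_majority = balanced_softmax_ce([0.0, 0.0], target=0, class_counts=counts)
```

With equal logits, the minority-class sample receives a larger loss than the majority-class sample, so the model must assign the rare class a higher raw score to achieve the same loss. When all class counts are equal, the shift is a constant and the loss matches ordinary softmax cross-entropy.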