Benchmarking Image Transformers for Prostate Cancer Detection from Ultrasound Data (2403.18233v1)
Abstract: PURPOSE: Deep learning methods for classifying prostate cancer (PCa) in ultrasound images typically employ convolutional networks (CNNs) to detect cancer in small regions of interest (ROIs) along a needle trace region. However, this approach suffers from weak labelling, since the ground-truth histopathology labels do not describe the properties of individual ROIs. Recently, multi-scale approaches have sought to mitigate this issue by combining the context awareness of transformers with a CNN feature extractor to detect cancer from multiple ROIs using multiple-instance learning (MIL). In this work, we present a detailed study of several image transformer architectures for both ROI-scale and multi-scale classification, and a comparison of the performance of CNNs and transformers for ultrasound-based prostate cancer classification. We also design a novel multi-objective learning strategy that combines both ROI and core predictions to further mitigate label noise. METHODS: We evaluate three image transformers on ROI-scale cancer classification, then use the strongest model to tune a multi-scale classifier with MIL. We train our MIL models using our novel multi-objective learning strategy and compare our results to existing baselines. RESULTS: We find that for both ROI-scale and multi-scale PCa detection, image transformer backbones lag behind their CNN counterparts. This performance deficit is even more pronounced for larger models. With multi-objective learning, we improve the performance of MIL, reaching 77.9% AUROC, 75.9% sensitivity, and 66.3% specificity. CONCLUSION: Convolutional networks are better suited to modelling sparse prostate ultrasound datasets, producing more robust features than transformers for PCa detection. Multi-scale methods remain the best architecture for this task, with multi-objective learning presenting an effective way to improve performance.
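To make the multi-scale MIL setup and the multi-objective strategy concrete, below is a minimal PyTorch sketch. It is an illustration under stated assumptions, not the authors' implementation: the module names (`MultiScaleMIL`, `multi_objective_loss`), the toy CNN backbone, the feature dimension, the mean-pooled transformer aggregation, and the loss weight `alpha` are all hypothetical; the paper's key idea reflected here is that a CNN produces per-ROI features, a transformer aggregates them into a core-level prediction, and the loss combines ROI-level and core-level objectives against the same weak histopathology label.

```python
# Minimal sketch of multi-scale MIL with a multi-objective loss.
# All names, dimensions, and the weighting `alpha` are assumptions
# for illustration, not the authors' exact implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiScaleMIL(nn.Module):
    """A CNN extracts per-ROI features; a transformer aggregates the
    bag of ROI features into a core-level (bag-level) prediction."""

    def __init__(self, feat_dim=512, n_heads=8, n_layers=4):
        super().__init__()
        # Toy ROI-scale CNN backbone (stand-in for, e.g., a ResNet trunk).
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 64, 7, stride=2, padding=3),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        # ROI-scale head: a cancer logit per ROI (weakly supervised).
        self.roi_head = nn.Linear(feat_dim, 1)
        # Transformer aggregator over the bag of ROI features.
        layer = nn.TransformerEncoderLayer(
            d_model=feat_dim, nhead=n_heads, batch_first=True
        )
        self.aggregator = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.core_head = nn.Linear(feat_dim, 1)

    def forward(self, rois):
        # rois: (batch, n_rois, 1, H, W) patches along the needle trace.
        b, n = rois.shape[:2]
        feats = self.backbone(rois.flatten(0, 1)).view(b, n, -1)
        roi_logits = self.roi_head(feats).squeeze(-1)        # (b, n)
        core_logit = self.core_head(
            self.aggregator(feats).mean(dim=1)
        ).squeeze(-1)                                        # (b,)
        return roi_logits, core_logit


def multi_objective_loss(roi_logits, core_logit, core_label, alpha=0.5):
    """Combine the ROI-scale and core-scale objectives. The core label
    is broadcast to every ROI (the weak-label assumption); `alpha` is
    an assumed weighting hyperparameter."""
    roi_targets = core_label.unsqueeze(1).expand_as(roi_logits)
    roi_loss = F.binary_cross_entropy_with_logits(roi_logits, roi_targets)
    core_loss = F.binary_cross_entropy_with_logits(core_logit, core_label)
    return alpha * roi_loss + (1 - alpha) * core_loss
```

In this reading, the core-level term lets the transformer discount mislabelled ROIs during aggregation, while the ROI-level term keeps the CNN features discriminative; combining the two is one plausible way to realize the label-noise mitigation the abstract describes.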