CellViT: Vision Transformers for Precise Cell Segmentation and Classification (2306.15350v2)

Published 27 Jun 2023 in eess.IV, cs.CV, and cs.LG

Abstract: Nuclei detection and segmentation in hematoxylin and eosin-stained (H&E) tissue images are important clinical tasks and crucial for a wide range of applications. However, it is a challenging task due to nuclei variances in staining and size, overlapping boundaries, and nuclei clustering. While convolutional neural networks have been extensively used for this task, we explore the potential of Transformer-based networks in this domain. Therefore, we introduce a new method for automated instance segmentation of cell nuclei in digitized tissue samples using a deep learning architecture based on Vision Transformer called CellViT. CellViT is trained and evaluated on the PanNuke dataset, which is one of the most challenging nuclei instance segmentation datasets, consisting of nearly 200,000 nuclei annotated into 5 clinically important classes in 19 tissue types. We demonstrate the superiority of large-scale in-domain and out-of-domain pre-trained Vision Transformers by leveraging the recently published Segment Anything Model and a ViT-encoder pre-trained on 104 million histological image patches - achieving state-of-the-art nuclei detection and instance segmentation performance on the PanNuke dataset with a mean panoptic quality of 0.50 and an F1-detection score of 0.83. The code is publicly available at https://github.com/TIO-IKIM/CellViT

References (73)
  1. The global burden of cancer attributable to risk factors, 2010–19: a systematic analysis for the global burden of disease study 2019. The Lancet, 400(10352):563–591, August 2022. doi: 10.1016/s0140-6736(22)01438-6.
  2. Clinical significance of tumor-infiltrating lymphocytes in breast cancer. Journal for ImmunoTherapy of Cancer, 4(1), October 2016. doi: 10.1186/s40425-016-0165-6.
  3. Inflammation and cancer: Triggers, mechanisms, and consequences. Immunity, 51(1):27–41, July 2019. doi: 10.1016/j.immuni.2019.06.025.
  4. Spatially confined sub-tumor microenvironments in pancreatic cancer. Cell, 184(22):5577–5592.e18, October 2021. doi: 10.1016/j.cell.2021.09.022.
  5. Valuing vicinity: Memory attention framework for context-based semantic segmentation in histopathology. Computerized Medical Imaging and Graphics, 107:102238, July 2023. ISSN 08956111. doi: 10.1016/j.compmedimag.2023.102238.
  6. Data-efficient and weakly supervised computational pathology on whole-slide images. Nature Biomedical Engineering, 5(6):555–570, March 2021. doi: 10.1038/s41551-020-00682-w.
  7. Histology-based prediction of therapy response to neoadjuvant chemotherapy for esophageal and esophagogastric junction adenocarcinomas using deep learning. JCO Clinical Cancer Informatics, 2023. Forthcoming.
  8. Hover-net: Simultaneous segmentation and classification of nuclei in multi-tissue histology images. Medical Image Analysis, 58:101563, 2019. ISSN 1361-8415. doi: 10.1016/j.media.2019.101563.
  9. TSFD-Net: Tissue specific feature distillation network for nuclei segmentation and classification. Neural Networks, 151:1–15, July 2022. ISSN 0893-6080. doi: 10.1016/j.neunet.2022.02.020.
  10. One model is all you need: Multi-task learning enables simultaneous histology image segmentation and classification. Medical Image Analysis, 83:102685, 2023. ISSN 1361-8415. doi: 10.1016/j.media.2022.102685.
  11. Novel digital signatures of tissue phenotypes for predicting distant metastasis in colorectal cancer. Scientific Reports, 8(1), September 2018. doi: 10.1038/s41598-018-31799-3.
  12. Training a cell-level classifier for detecting basal-cell carcinoma by combining human visual attention maps with low-level handcrafted features. Journal of Medical Imaging, 4(2):021105, March 2017. doi: 10.1117/1.jmi.4.2.021105.
  13. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nature Medicine, 25(8):1301–1309, July 2019. doi: 10.1038/s41591-019-0508-1.
  14. CoNIC: Colon nuclei identification and counting challenge 2022. arXiv Preprint, November 2021. doi: 10.48550/arXiv.2111.14485.
  15. Extraction of informative cell features by segmentation of densely clustered tissue images. In 2009 Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE, September 2009. doi: 10.1109/iembs.2009.5333810.
  16. How does radiomics work? [Wie funktioniert Radiomics?]. Der Radiologe, 60(1):32–41, December 2019. doi: 10.1007/s00117-019-00617-w.
  17. PanNuke dataset extension, insights and baselines. arXiv Preprint, April 2020. doi: 10.48550/arXiv.2003.10778.
  18. Scaling vision transformers to gigapixel images via hierarchical self-supervised learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 16144–16155, June 2022. doi: 10.1109/CVPR52688.2022.01567.
  19. Segment anything. arXiv Preprint, April 2023. doi: 10.48550/arXiv.2304.02643.
  20. UNETR: Transformers for 3D medical image segmentation. arXiv Preprint, October 2021. doi: 10.48550/arXiv.2103.10504.
  21. Nuclei segmentation using marker-controlled watershed, tracking using mean-shift, and Kalman filter in time-lapse microscopy. IEEE Transactions on Circuits and Systems I: Regular Papers, 53(11):2405–2414, 2006. doi: 10.1109/TCSI.2006.884469.
  22. Applying watershed algorithms to the segmentation of clustered nuclei. Cytometry, 28(4):289–297, August 1997. doi: 10.1002/(sici)1097-0320(19970801)28:4<289::aid-cyto3>3.0.co;2-7.
  23. Multi-pass fast watershed for accurate segmentation of overlapping cervical cells. IEEE Transactions on Medical Imaging, 37(9):2044–2059, 2018. doi: 10.1109/TMI.2018.2815013.
  24. J. Cheng and J. C. Rajapakse. Segmentation of clustered nuclei with shape markers and marking function. IEEE Transactions on Biomedical Engineering, 56(3):741–748, 2009. doi: 10.1109/TBME.2008.2008635.
  25. Automatic nuclei segmentation in H&E-stained breast cancer histopathology images. PLOS ONE, 8(7):e70221, July 2013. doi: 10.1371/journal.pone.0070221.
  26. S. Ali and A. Madabhushi. An integrated region-, boundary-, shape-based active contour for multiple object overlap resolution in histological imagery. IEEE Transactions on Medical Imaging, 31(7):1448–1460, 2012. doi: 10.1109/TMI.2012.2190089.
  27. Detection and segmentation of cell nuclei in virtual microscopy images: A minimum-model approach. Scientific Reports, 2(1), July 2012. doi: 10.1038/srep00503.
  28. Automatic segmentation for cell images based on bottleneck detection and ellipse fitting. Neurocomputing, 173:615–622, January 2016. doi: 10.1016/j.neucom.2015.08.006.
  29. CPP-Net: Context-aware polygon proposal network for nucleus segmentation. IEEE Transactions on Image Processing, 32:980–994, 2023. ISSN 1941-0042. doi: 10.1109/TIP.2023.3237013.
  30. Image segmentation using deep learning: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(7):3523–3542, 2022. doi: 10.1109/TPAMI.2021.3059968.
  31. A guide to deep learning in healthcare. Nature Medicine, 25(1):24–29, January 2019. doi: 10.1038/s41591-018-0316-z.
  32. Deep learning. Nature, 521(7553):436–444, May 2015. doi: 10.1038/nature14539.
  33. U-net: Convolutional networks for biomedical image segmentation. In Lecture Notes in Computer Science, pages 234–241. Springer International Publishing, 2015. doi: 10.1007/978-3-319-24574-4_28.
  34. nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nature Methods, 18(2):203–211, December 2020. doi: 10.1038/s41592-020-01008-z.
  35. Radiology artificial intelligence: a systematic review and evaluation of methods (RAISE). European Radiology, 32(11):7998–8007, April 2022. doi: 10.1007/s00330-022-08784-6.
  36. U-net and its variants for medical image segmentation: A review of theory and applications. IEEE Access, 9:82031–82057, 2021. doi: 10.1109/ACCESS.2021.3086020.
  37. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), October 2017.
  38. R. Girshick. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), December 2015. doi: 10.1109/ICCV.2015.169.
  39. Nuclear instance segmentation using a proposal-free spatially aware deep learning framework. In Lecture Notes in Computer Science, pages 622–630. Springer International Publishing, 2019. doi: 10.1007/978-3-030-32239-7_69.
  40. Accurate cervical cell segmentation from overlapping clumps in pap smear images. IEEE Transactions on Medical Imaging, 36(1):288–300, 2017. doi: 10.1109/TMI.2016.2606380.
  41. Micro-net: A unified model for segmentation of various objects in microscopy images. Medical Image Analysis, 52:160–173, February 2019. doi: 10.1016/j.media.2018.12.003.
  42. Segmentation of nuclei in histopathology images by deep regression of the distance map. IEEE Transactions on Medical Imaging, 38(2):448–459, 2019. doi: 10.1109/TMI.2018.2865709.
  43. M. Weigert and U. Schmidt. Nuclei Instance Segmentation and Classification in Histopathology Images with Stardist. In 2022 IEEE International Symposium on Biomedical Imaging Challenges (ISBIC), pages 1–4, March 2022. doi: 10.1109/ISBIC56247.2022.9854534.
  44. Cell detection with star-convex polygons. In Medical Image Computing and Computer Assisted Intervention – MICCAI 2018, pages 265–273. Springer International Publishing, 2018. doi: 10.1007/978-3-030-00934-2_30.
  45. DCAN: Deep contour-aware networks for accurate gland segmentation. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2487–2496. IEEE Computer Society, June 2016. doi: 10.1109/CVPR.2016.273.
  46. Feature pyramid networks for object detection. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 936–944. IEEE Computer Society, July 2017. doi: 10.1109/CVPR.2017.106.
  47. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, pages 2980–2988, 2017. doi: 10.1109/tpami.2018.2858826.
  48. N. Abraham and N. M. Khan. A novel focal Tversky loss function with improved attention U-Net for lesion segmentation. In 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019), pages 683–687, 2019. doi: 10.1109/ISBI.2019.8759329.
  49. Attention is all you need. Advances in neural information processing systems, 30, 2017.
  50. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv Preprint, June 2021. doi: 10.48550/arXiv.2010.11929.
  51. Emerging properties in self-supervised vision transformers. In Proceedings of the IEEE/CVF international conference on computer vision, pages 9650–9660, 2021. doi: 10.1109/ICCV48922.2021.00951.
  52. Do vision transformers see like convolutional neural networks? Advances in Neural Information Processing Systems, 34:12116–12128, 2021.
  53. ViT-YOLO: Transformer-based YOLO for object detection. In 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), pages 2799–2808, 2021. doi: 10.1109/ICCVW54120.2021.00314.
  54. L. Y. Chen and Q. Yu. TransUNet: Transformers make strong encoders for medical image segmentation. arXiv, February 2021. doi: 10.48550/arXiv.2102.04306.
  55. Medical image segmentation using squeeze-and-expansion transformers. arXiv, May 2021. doi: 10.48550/arXiv.2105.09511.
  56. Swin UNETR: Swin transformers for semantic segmentation of brain tumors in MRI images. In Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries, pages 272–284. Springer International Publishing, 2022. doi: 10.1007/978-3-031-08999-2_22.
  57. SegFormer: Simple and efficient design for semantic segmentation with transformers. Advances in Neural Information Processing Systems, 34:12077–12090, 2021.
  58. Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 6881–6890, 2021. doi: 10.1109/CVPR46437.2021.00681.
  59. A simple framework for contrastive learning of visual representations. In International conference on machine learning, pages 1597–1607. PMLR, 2020.
  60. Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9729–9738, 2020. doi: 10.1109/CVPR42600.2020.00975.
  61. Unsupervised learning of visual features by contrasting cluster assignments. Advances in neural information processing systems, 33:9912–9924, 2020.
  62. Bootstrap your own latent: a new approach to self-supervised learning. Advances in Neural Information Processing Systems, 33:21271–21284, 2020.
  63. X. Chen and K. He. Exploring simple siamese representation learning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 15750–15758, 2021. doi: 10.1109/CVPR46437.2021.01549.
  64. On the opportunities and risks of foundation models. arXiv, August 2021. doi: 10.48550/arXiv.2108.07258.
  65. QuPath: Open source software for digital pathology image analysis. Scientific Reports, 7(1), December 2017. doi: 10.1038/s41598-017-17204-5.
  66. A multi-organ nucleus segmentation challenge. IEEE Transactions on Medical Imaging, 39(5):1380–1391, 2020. doi: 10.1109/TMI.2019.2947628.
  67. A dataset and a technique for generalized nuclear segmentation for computational pathology. IEEE transactions on medical imaging, 36(7):1550–1560, 2017. doi: 10.1109/TMI.2017.2677499.
  68. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
  69. Panoptic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2019. doi: 10.1109/CVPR.2019.00963.
  70. Locality sensitive deep learning for detection and classification of nuclei in routine colon cancer histology images. IEEE Transactions on Medical Imaging, 35(5):1196–1206, 2016. doi: 10.1109/TMI.2016.2525803.
  71. Albumentations: fast and flexible image augmentations. Information, 11(2):125, 2020.
  72. Okunator. okunator/cellseg_models.pytorch: v0.1.23, 2022.
  73. I. Loshchilov and F. Hutter. Decoupled weight decay regularization. arXiv, November 2017. doi: 10.48550/arXiv.1711.05101.

Summary

  • The paper introduces CellViT, a novel deep learning model that leverages Vision Transformers and large-scale pre-training for precise cell segmentation and classification in H&E-stained samples.
  • It employs a modified UNETR architecture with integrated skip connections and pre-trained ViT encoders to handle challenges like overlapping nuclei and variable staining.
  • Evaluated on the PanNuke and MoNuSeg datasets, CellViT achieves notable performance improvements, with a mean panoptic quality of 0.50 (the metric is defined below) and an F1-detection score of 0.83 on PanNuke, while larger inference patches reduce runtime on whole-slide images.
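
For context, the panoptic quality (PQ) reported above follows Kirillov et al. (reference 69 below) and factors into a segmentation term and a recognition term:

```latex
\mathrm{PQ}
  = \frac{\sum_{(p,g) \in \mathit{TP}} \mathrm{IoU}(p,g)}
         {|\mathit{TP}| + \tfrac{1}{2}|\mathit{FP}| + \tfrac{1}{2}|\mathit{FN}|}
  = \underbrace{\frac{\sum_{(p,g) \in \mathit{TP}} \mathrm{IoU}(p,g)}{|\mathit{TP}|}}_{\text{segmentation quality}}
    \times
    \underbrace{\frac{|\mathit{TP}|}{|\mathit{TP}| + \tfrac{1}{2}|\mathit{FP}| + \tfrac{1}{2}|\mathit{FN}|}}_{\text{recognition quality}}
```

A prediction counts as a true positive when it matches a ground-truth nucleus with IoU > 0.5; PanNuke's mean PQ averages this score over the five nucleus classes.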

Analysis of CellViT: Vision Transformers for Precise Cell Segmentation and Classification

The paper "CellViT: Vision Transformers for Precise Cell Segmentation and Classification" tackles a significant challenge in the domain of digital pathology: the automatic and precise segmentation and classification of cell nuclei in hematoxylin and eosin (H&E)-stained tissue samples. This task is pivotal, as it supports extensive analyses in cancer diagnosis and research.

Leveraging the benefits of Vision Transformer (ViT) models, the paper presents CellViT, an innovative deep learning architecture for nuclei instance segmentation. The model is trained and evaluated on the PanNuke dataset, noted for its complexity due to diverse cell types, inconsistent staining, and challenging nuclei clustering. The primary contributions lie in combining large-scale pre-training with ViTs: the authors adopt the image encoder of the recently published Segment Anything Model (SAM) and, alternatively, a ViT encoder pre-trained on 104 million histological image patches.
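
To make the transfer-learning recipe concrete, here is a minimal PyTorch sketch of transplanting pre-trained ViT encoder weights into a segmentation pipeline. The use of `timm` and the checkpoint path are illustrative assumptions, not the authors' code (their implementation lives in the linked repository):

```python
import torch
import timm

# Stand-in for the paper's pre-trained encoders (SAM's image encoder or the
# ViT pre-trained on 104M histology patches); the timm backbone is an
# assumption for illustration.
encoder = timm.create_model("vit_base_patch16_224", pretrained=False, num_classes=0)

# Transplant pre-trained weights; strict=False tolerates head mismatches.
# The checkpoint path below is hypothetical:
# state = torch.load("pretrained_vit_encoder.pth", map_location="cpu")
# missing, unexpected = encoder.load_state_dict(state, strict=False)

# The paper fine-tunes the encoder on PanNuke rather than freezing it,
# so gradients stay enabled.
for p in encoder.parameters():
    p.requires_grad = True

x = torch.randn(1, 3, 224, 224)       # one H&E image patch
tokens = encoder.forward_features(x)  # (1, 197, 768): CLS token + 14x14 patch tokens
print(tokens.shape)
```

From here, a segmentation decoder consumes the patch tokens; the whole network is then trained end-to-end on the annotated nuclei.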

Core Contributions and Methodology

  1. Architecture Design: CellViT adapts the UNETR architecture to two-dimensional histological images, retaining U-Net-style skip connections around a ViT-based encoder-decoder structure (see the sketch after this list). This design allows the model to retain high-resolution spatial information conducive to precise nuclei segmentation.
  2. Pre-training and Transfer Learning: The paper convincingly demonstrates the superiority of pre-trained ViT encoders over training from scratch. By plugging in SAM's image encoder and a histology-pre-trained ViT without any architectural modifications, the authors underscore the impact of transfer learning in histology-specific contexts.
  3. Instance Segmentation Challenges: Overlapping nuclei boundaries and intra-class variability present considerable obstacles in medical image analysis. CellViT capitalizes on the ViT's ability to capture long-range dependencies within images, improving the separation of clustered nuclei and overall segmentation accuracy.
  4. Performance and Generalization: Evaluated on the PanNuke and MoNuSeg datasets, CellViT outperforms existing state-of-the-art methods such as HoVer-Net and Micro-Net in both detection and segmentation accuracy. The quantitative advancements are evident in the reported mean panoptic quality of 0.50 and F1-detection score of 0.83 on PanNuke.
  5. Inference Efficiency: Using larger input patches during inference significantly decreases runtime, improving computational efficiency when processing gigapixel whole-slide images (WSIs). This advancement is crucial for practical deployment in clinical settings, where timely outputs matter.
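
The following sketch illustrates the UNETR-style decoding mentioned in item 1: patch tokens from intermediate Transformer layers are folded back into 2D feature maps and merged as skip connections while upsampling. The module and function names (`UNETRDecoder2D`, `tokens_to_map`) are hypothetical simplifications under stated assumptions, not the authors' implementation:

```python
import torch
import torch.nn as nn

def tokens_to_map(tokens, hw):
    # Drop the CLS token and fold the patch tokens back into a 2D grid:
    # (B, 1+N, C) -> (B, C, hw, hw), e.g. hw=14 for a /16 patch embedding.
    b, n, c = tokens.shape
    return tokens[:, 1:, :].transpose(1, 2).reshape(b, c, hw, hw)

class UNETRDecoder2D(nn.Module):
    """Minimal 2D UNETR-style decoder: upsamples the deepest token map and
    merges skip connections taken from shallower Transformer layers."""
    def __init__(self, embed_dim=768, out_ch=64):
        super().__init__()
        self.up = nn.ModuleList([
            nn.ConvTranspose2d(embed_dim, 256, 2, stride=2),  # /16 -> /8
            nn.ConvTranspose2d(256, 128, 2, stride=2),        # /8  -> /4
            nn.ConvTranspose2d(128, out_ch, 4, stride=4),     # /4  -> /1
        ])
        # 1x1 convs project skip token maps to the decoder widths.
        self.skip_proj = nn.ModuleList([
            nn.Conv2d(embed_dim, 256, 1),
            nn.Conv2d(embed_dim, 128, 1),
        ])
        self.fuse = nn.ModuleList([
            nn.Conv2d(512, 256, 3, padding=1),
            nn.Conv2d(256, 128, 3, padding=1),
        ])

    def forward(self, deep, skips, hw):
        x = tokens_to_map(deep, hw)          # deepest Transformer layer
        for i, up in enumerate(self.up):
            x = up(x)
            if i < len(self.skip_proj):      # merge a shallower layer
                s = nn.functional.interpolate(
                    tokens_to_map(skips[i], hw), scale_factor=2 ** (i + 1))
                x = self.fuse[i](torch.cat([x, self.skip_proj[i](s)], dim=1))
        return x                             # full-resolution feature map

# Shape check only: a 224x224 patch with /16 tokenization gives a 14x14 grid.
deep = torch.randn(1, 197, 768)
skips = [torch.randn(1, 197, 768), torch.randn(1, 197, 768)]
print(UNETRDecoder2D()(deep, skips, hw=14).shape)  # torch.Size([1, 64, 224, 224])
```

A full CellViT-style network would attach task-specific prediction heads to these full-resolution features and use richer convolutional blocks on the skip paths; the sketch only shows the token-to-map reshaping and skip-connection pattern.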

Implications and Future Directions

The findings have profound implications for the field of computational pathology. The enhancements in detection and classification accuracies foster more reliable automated diagnosis processes. The CellViT framework potentially sets a precedent for developing end-to-end interpretable models capable of integrating cell-level features with clinical insights, thereby enriching the computational pathology pipeline.

Future work could explore the application of CellViT's extracted nuclei embeddings in downstream tasks such as survival prediction or tissue-level disease classification. Additionally, the idea of using localizable, cell-level embeddings for predictive insights on histological images suggests intriguing possibilities for feature-driven, data-rich analytical approaches in pathology.

In conclusion, CellViT exemplifies a notable stride in the application of Vision Transformers to medical image analysis, particularly for tasks demanding precise segmentation and classification. The paper presents a compelling case for continuing and expanding Transformer-based approaches in the domain, fostering improved accuracy and efficiency in digital pathology applications.
