
FocusMAE: Gallbladder Cancer Detection from Ultrasound Videos with Focused Masked Autoencoders (2403.08848v2)

Published 13 Mar 2024 in eess.IV and cs.CV

Abstract: In recent years, automated Gallbladder Cancer (GBC) detection has gained the attention of researchers. Current state-of-the-art (SOTA) methods that rely on ultrasound sonography (US) images generalize poorly, which motivates a different approach. We observe that individual US frames may lack sufficient information to capture disease manifestation. This study therefore advocates a shift to video-based GBC detection, which exploits the inherent advantages of spatiotemporal representations. We use the Masked Autoencoder (MAE) for representation learning to address the shortcomings of conventional image-based methods. We propose a novel design called FocusMAE that systematically biases the selection of masking tokens from high-information regions, fostering a more refined representation of malignancy. Additionally, we contribute the most extensive US video dataset for GBC detection. We also note that this is the first study on US video-based GBC detection. We validate the proposed methods on the curated dataset and report a new SOTA accuracy of 96.4% for the GBC detection problem, against 84% for the current image-based SOTA methods, GBCNet and RadFormer, and 94.7% for the video-based SOTA, AdaMAE. We further demonstrate the generality of the proposed FocusMAE on a public CT-based Covid detection dataset, reporting a 3.3% improvement in accuracy over current baselines. The source code and pretrained models are available at: https://gbc-iitd.github.io/focusmae
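
The idea of biasing mask selection toward high-information regions can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how high-information regions are identified, so the per-token `scores`, the `temperature` parameter, and the function name `focused_mask` are all assumptions made for illustration. The sketch draws a fixed number of tokens without replacement, with sampling weights given by a softmax over the scores.

```python
import math
import random


def focused_mask(scores, mask_ratio, temperature=1.0, seed=0):
    """Pick token indices to mask, favouring tokens with high scores.

    scores      -- hypothetical per-token "information" scores (non-empty list)
    mask_ratio  -- fraction of tokens to mask, between 0 and 1
    temperature -- softmax temperature; larger values flatten the bias
    """
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must lie in [0, 1]")
    rng = random.Random(seed)
    num_masked = round(mask_ratio * len(scores))

    # Softmax-style weights, shifted by the max score for numerical stability.
    # The small floor keeps every weight strictly positive.
    top = max(scores)
    weights = [max(math.exp((s - top) / temperature), 1e-12) for s in scores]

    # Weighted sampling without replacement (Efraimidis-Spirakis):
    # give each token the key log(u) / w and keep the largest keys.
    keys = [
        (math.log(1.0 - rng.random()) / w, index)
        for index, w in enumerate(weights)
    ]
    keys.sort(reverse=True)
    return sorted(index for _, index in keys[:num_masked])


# Example: 16 tokens, the last 4 marked as high-information (assumed scores).
example_scores = [0.0] * 12 + [5.0] * 4
masked = focused_mask(example_scores, mask_ratio=0.25, seed=1)
```

With uniform scores the function reduces to ordinary random masking, so the same routine covers both the unbiased baseline and the biased variant.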
