Coupling AI and Citizen Science in Creation of Enhanced Training Dataset for Medical Image Segmentation (2409.03087v1)
Abstract: Recent advancements in medical imaging and AI have greatly enhanced diagnostic capabilities, but the development of effective deep learning (DL) models is still constrained by the lack of high-quality annotated datasets. The traditional manual annotation process by medical experts is time- and resource-intensive, limiting the scalability of these datasets. In this work, we introduce a robust and versatile framework that combines AI and crowdsourcing to improve both the quality and quantity of medical image datasets across different modalities. Our approach utilises a user-friendly online platform that enables a diverse group of crowd annotators to label medical images efficiently. By integrating the MedSAM segmentation AI with this platform, we accelerate the annotation process while maintaining expert-level quality through an algorithm that merges crowd-labelled images. Additionally, we employ pix2pixGAN, a generative AI model, to expand the training dataset with synthetic images that capture realistic morphological features. These methods are combined into a cohesive framework designed to produce an enhanced dataset, which can serve as a universal pre-processing pipeline to boost the training of any medical deep learning segmentation model. Our results demonstrate that this framework significantly improves model performance, especially when training data is limited.
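The abstract does not describe how the crowd-labelled images are merged, so the following is only a minimal sketch of one common approach, assumed here for illustration: a pixel-wise majority vote over binary masks, followed by a Dice overlap check against an expert mask. The names `merge_masks`, `dice`, and the `threshold` parameter are hypothetical and do not come from the paper.

```python
from typing import List

Mask = List[List[int]]  # binary mask: 1 = foreground, 0 = background


def merge_masks(masks: List[Mask], threshold: float = 0.5) -> Mask:
    """Pixel-wise vote: keep a pixel if more than `threshold` of annotators marked it.

    Assumption: simple majority voting, not the paper's actual merging algorithm.
    """
    if not masks:
        raise ValueError("need at least one mask")
    n = len(masks)
    rows, cols = len(masks[0]), len(masks[0][0])
    merged: Mask = []
    for r in range(rows):
        row = []
        for c in range(cols):
            votes = sum(m[r][c] for m in masks)
            row.append(1 if votes / n > threshold else 0)
        merged.append(row)
    return merged


def dice(a: Mask, b: Mask) -> float:
    """Dice overlap 2|A∩B| / (|A| + |B|); defined as 1.0 when both masks are empty."""
    inter = sum(x & y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    total = sum(map(sum, a)) + sum(map(sum, b))
    return 1.0 if total == 0 else 2.0 * inter / total


# Three made-up crowd annotations of the same 2x3 image, plus a made-up expert mask.
crowd = [
    [[1, 1, 0], [0, 1, 0]],
    [[1, 0, 0], [0, 1, 1]],
    [[1, 1, 0], [0, 1, 0]],
]
expert = [[1, 1, 0], [0, 1, 1]]

consensus = merge_masks(crowd)
score = dice(consensus, expert)
```

A vote threshold above 0.5 makes the consensus more conservative (fewer false-positive pixels), while a lower one favours recall; which is preferable depends on the annotation task.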