
Efficient Large Scale Medical Image Dataset Preparation for Machine Learning Applications (2309.17285v1)

Published 29 Sep 2023 in cs.CV

Abstract: In the rapidly evolving field of medical imaging, machine learning algorithms have become indispensable for enhancing diagnostic accuracy. However, the effectiveness of these algorithms depends on the availability and organization of high-quality medical imaging datasets. Traditional Digital Imaging and Communications in Medicine (DICOM) data management systems are inadequate for handling the scale and complexity of the data that machine learning algorithms require. This paper introduces an innovative data curation tool, developed as part of the Kaapana open-source toolkit, aimed at streamlining the organization, management, and processing of large-scale medical imaging datasets. The tool is specifically tailored to the needs of radiologists and machine learning researchers. It incorporates advanced search, auto-annotation, and efficient tagging functionality for improved data curation. Additionally, the tool facilitates quality control and review, enabling researchers to validate image and segmentation quality in large datasets. It also plays a critical role in uncovering potential biases in datasets by aggregating and visualizing metadata, which is essential for developing robust machine learning models. Furthermore, Kaapana is integrated into the Radiological Cooperative Network (RACOON), a pioneering initiative to create a comprehensive national infrastructure for the aggregation, transmission, and consolidation of radiological data across all university clinics in Germany. A supplementary video showcasing the tool's functionality is available at https://bit.ly/MICCAI-DEMI2023.
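
The abstract states that the tool helps uncover dataset biases by aggregating metadata. As a rough illustration of that general idea only (this is not Kaapana's implementation, which the abstract does not describe), the Python sketch below counts the values of a few DICOM attribute keywords (Modality, Manufacturer, PatientSex) across a set of series and flags any field dominated by a single value, such as a dataset acquired almost entirely on one vendor's scanners. The records, the function names `aggregate_metadata`, `dominant_share`, and `flag_imbalanced`, and the 0.75 threshold are all hypothetical; real metadata would be read from DICOM headers rather than hard-coded.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping

# Hypothetical per-series metadata, keyed by DICOM attribute keywords.
# Hard-coded so the sketch runs without any DICOM files; the last record
# deliberately lacks PatientSex to show how missing values are handled.
SERIES = [
    {"Modality": "CT", "Manufacturer": "SIEMENS", "PatientSex": "M"},
    {"Modality": "CT", "Manufacturer": "SIEMENS", "PatientSex": "F"},
    {"Modality": "CT", "Manufacturer": "SIEMENS", "PatientSex": "M"},
    {"Modality": "MR", "Manufacturer": "SIEMENS", "PatientSex": "F"},
    {"Modality": "CT", "Manufacturer": "GE MEDICAL SYSTEMS"},
]


def aggregate_metadata(records: Iterable[Mapping[str, str]], key: str) -> Counter:
    """Count each value of one metadata field; absent fields count as 'UNKNOWN'."""
    return Counter(str(r.get(key, "UNKNOWN")) for r in records)


def dominant_share(counts: Counter) -> float:
    """Fraction of records holding the most common value (0.0 for no records)."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return counts.most_common(1)[0][1] / total


def flag_imbalanced(
    records: Iterable[Mapping[str, str]],
    keys: Iterable[str],
    threshold: float = 0.75,  # hypothetical cut-off, not taken from the paper
) -> Dict[str, float]:
    """Return fields whose most common value covers at least `threshold` of records."""
    records = list(records)  # allow multiple passes over the input
    flagged = {}
    for key in keys:
        share = dominant_share(aggregate_metadata(records, key))
        if share >= threshold:
            flagged[key] = share
    return flagged


if __name__ == "__main__":
    fields = ["Modality", "Manufacturer", "PatientSex"]
    for field in fields:
        print(field, dict(aggregate_metadata(SERIES, field)))
    # Modality and Manufacturer are each 4/5 one value, so both are flagged;
    # PatientSex splits 2/2/1 and is not.
    print(flag_imbalanced(SERIES, fields))
```

A dominant value is not proof of bias; it only marks a field worth reviewing before training, for example a scanner-vendor imbalance that a model might learn as a shortcut.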
