A large annotated medical image dataset for the development and evaluation of segmentation algorithms

Published 25 Feb 2019 in cs.CV and eess.IV | (1902.09063v1)

Abstract: Semantic segmentation of medical images aims to associate a pixel with a label in a medical image without human initialization. The success of semantic segmentation algorithms is contingent on the availability of high-quality imaging data with corresponding labels provided by experts. We sought to create a large collection of annotated medical image datasets of various clinically relevant anatomies available under open source license to facilitate the development of semantic segmentation algorithms. Such a resource would allow: 1) objective assessment of general-purpose segmentation methods through comprehensive benchmarking and 2) open and free access to medical image data for any researcher interested in the problem domain. Through a multi-institutional effort, we generated a large, curated dataset representative of several highly variable segmentation tasks that was used in a crowd-sourced challenge - the Medical Segmentation Decathlon held during the 2018 Medical Image Computing and Computer Aided Interventions Conference in Granada, Spain. Here, we describe these ten labeled image datasets so that these data may be effectively reused by the research community.

Summary

The paper provides a large annotated dataset to objectively benchmark medical image segmentation algorithms.
It details the composition of 2,633 3D images across ten tasks, covering structures like brain tumors, cardiac anatomy, and liver lesions.
The work highlights the dataset’s role in advancing generalizable segmentation models and its use in the Medical Segmentation Decathlon (MSD) challenge.

A Large Annotated Medical Image Dataset for the Development and Evaluation of Segmentation Algorithms

The paper "A large annotated medical image dataset for the development and evaluation of segmentation algorithms" describes a multi-institutional effort to build a diverse, annotated dataset spanning several anatomical structures and imaging modalities. Its purpose is to support research and development on semantic segmentation algorithms.

Context and Objectives

Medical image segmentation plays a crucial role in clinical practice: it delineates anatomical regions of interest, which is fundamental for tasks such as treatment planning and disease assessment. Despite long-standing interest in segmentation algorithms, their clinical integration remains limited, and clinical practice still relies heavily on manual delineation by clinicians. This work aims to narrow that gap by providing a high-quality, thoroughly annotated dataset that encourages the development of general-purpose semantic segmentation algorithms with broad applicability.

Two key objectives underpin this initiative:

Enabling objective assessment of segmentation methods through comprehensive benchmarking.
Democratizing access to medical image data for researchers, thereby promoting innovation.

Methods and Dataset Composition

The dataset comprises 2,633 three-dimensional images across ten unique segmentation tasks, sourced from multiple institutions and modalities, including MRI and CT scans. Key anatomies covered in the dataset include brain tumors, cardiac structures, liver lesions, hippocampus, prostate, lung tumors, pancreas, hepatic vessels, spleen, and colon tumors. Each subset is curated to reflect real-world clinical scenarios, providing a robust foundation for developing adaptable and generalizable segmentation algorithms.

The datasets were meticulously de-identified following institutional review board policies and reformatted to NIfTI, an open format, enhancing usability across the research community. The inclusion of manually refined annotations by domain experts ensures the high quality of segmentation labels, a crucial factor for training and evaluating machine learning models.
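As a rough illustration of what the NIfTI reformatting makes possible, the sketch below reads the image dimensions and voxel spacing from a NIfTI-1 header using only the Python standard library. It assumes single-file NIfTI-1 data (optionally gzip-compressed) and relies on the published NIfTI-1 header layout; the names `NiftiSummary`, `parse_nifti1_header`, and `read_nifti1_summary` are hypothetical and do not come from the paper. In practice, a dedicated library such as nibabel is a more common choice.

```python
import gzip
import struct
from typing import NamedTuple, Tuple

NIFTI1_HEADER_SIZE = 348  # fixed header size defined by the NIfTI-1 format


class NiftiSummary(NamedTuple):
    """Hypothetical container for a few header fields."""
    shape: Tuple[int, ...]         # voxels along each axis
    voxel_size: Tuple[float, ...]  # spacing along each axis (units given by the header)
    datatype_code: int             # NIfTI datatype code (e.g. 4 = int16)


def parse_nifti1_header(header: bytes) -> NiftiSummary:
    """Extract shape, voxel size, and datatype from raw NIfTI-1 header bytes."""
    if len(header) < NIFTI1_HEADER_SIZE:
        raise ValueError("header is shorter than 348 bytes")

    # sizeof_hdr (offset 0) must equal 348; use it to detect byte order.
    if struct.unpack_from("<i", header, 0)[0] == NIFTI1_HEADER_SIZE:
        order = "<"
    elif struct.unpack_from(">i", header, 0)[0] == NIFTI1_HEADER_SIZE:
        order = ">"
    else:
        raise ValueError("sizeof_hdr is not 348; not a NIfTI-1 header")

    # The magic string (offset 344) is "n+1" for single-file, "ni1" for paired files.
    if header[344:348] not in (b"n+1\x00", b"ni1\x00"):
        raise ValueError("missing NIfTI-1 magic string")

    dim = struct.unpack_from(order + "8h", header, 40)     # dim[0] = number of axes
    datatype = struct.unpack_from(order + "h", header, 70)[0]
    pixdim = struct.unpack_from(order + "8f", header, 76)  # pixdim[1..] = spacing

    ndim = dim[0]
    if not 1 <= ndim <= 7:
        raise ValueError("invalid number of dimensions: %d" % ndim)

    return NiftiSummary(
        shape=tuple(dim[1:ndim + 1]),
        voxel_size=tuple(pixdim[1:ndim + 1]),
        datatype_code=datatype,
    )


def read_nifti1_summary(path: str) -> NiftiSummary:
    """Read only the header of a .nii or .nii.gz file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as handle:
        return parse_nifti1_header(handle.read(NIFTI1_HEADER_SIZE))
```

Calling `read_nifti1_summary("some_volume.nii.gz")` (a placeholder path) returns the volume shape without loading the voxel data, which can be handy for a quick inventory of image sizes across tasks.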

Benchmarking and Community Involvement

The Medical Segmentation Decathlon (MSD), held during the 2018 Medical Image Computing and Computer Assisted Intervention (MICCAI) conference, showcased the utility of this dataset. In this crowd-sourced challenge, participants tested the generalizability and robustness of their segmentation algorithms across all ten tasks, which exposed the strengths and limitations of different approaches.
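To make the idea of cross-task benchmarking concrete, here is a minimal sketch of one overlap metric often used for segmentation, the Dice coefficient, averaged per task. The text above does not state which metrics the challenge used, so the choice of Dice is an illustrative assumption, as are the function names `dice_coefficient` and `mean_dice_per_task` and the convention that two empty masks score 1.0. Masks are represented as flat sequences of integer labels to keep the example dependency-free.

```python
from typing import Dict, List, Sequence, Tuple

Mask = Sequence[int]  # flattened label map: one integer label per voxel


def dice_coefficient(pred: Mask, truth: Mask, label: int = 1) -> float:
    """Dice = 2 * |P intersect T| / (|P| + |T|) for a single label."""
    if len(pred) != len(truth):
        raise ValueError("prediction and ground truth differ in size")

    pred_count = sum(1 for p in pred if p == label)
    truth_count = sum(1 for t in truth if t == label)
    overlap = sum(1 for p, t in zip(pred, truth) if p == label and t == label)

    denominator = pred_count + truth_count
    if denominator == 0:
        # Assumed convention: label absent from both masks counts as a perfect match.
        return 1.0
    return 2.0 * overlap / denominator


def mean_dice_per_task(
    results: Dict[str, List[Tuple[Mask, Mask]]], label: int = 1
) -> Dict[str, float]:
    """Average Dice over all (prediction, ground truth) pairs of each task."""
    summary = {}
    for task, pairs in results.items():
        if not pairs:
            raise ValueError("task %r has no cases" % task)
        scores = [dice_coefficient(p, t, label) for p, t in pairs]
        summary[task] = sum(scores) / len(scores)
    return summary
```

Reporting one score per task, rather than a single pooled number, keeps a method that does well on one anatomy from hiding weak results on another.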

Implications and Future Directions

The dataset has both practical and methodological implications. Practically, it is a resource for developing segmentation systems, including AutoML-based ones, that handle various anatomical structures and imaging modalities without task-specific tuning. Methodologically, it supports comparative analysis and benchmarking on common data, which makes it easier to tell whether a reported gain reflects a real advance in segmentation performance.

Looking forward, the dataset paves the way for future research in several directions:

Development of more sophisticated and generalizable segmentation algorithms using advanced deep learning architectures.
Exploration of transfer learning techniques to further enhance model performance on previously unseen tasks.
Integration of segmentation algorithms into clinical workflows, supported by demonstrated gains in accuracy and reliability over traditional methods.

In conclusion, this large annotated medical image dataset substantially expands the resources available for medical image segmentation research. By providing detailed annotations across diverse tasks and modalities, it supports the continued development of semantic segmentation algorithms intended for clinical application.
