AMOS: A Large-Scale Abdominal Multi-Organ Benchmark for Versatile Medical Image Segmentation (2206.08023v3)

Published 16 Jun 2022 in eess.IV, cs.CV, and cs.LG

Abstract: Despite the considerable progress in automatic abdominal multi-organ segmentation from CT/MRI scans in recent years, a comprehensive evaluation of the models' capabilities is hampered by the lack of a large-scale benchmark from diverse clinical scenarios. Constrained by the high cost of collecting and labeling 3D medical data, most of the deep learning models to date are driven by datasets with a limited number of organs of interest or samples, which still limits the power of modern deep models and makes it difficult to provide a fully comprehensive and fair estimate of various methods. To mitigate the limitations, we present AMOS, a large-scale, diverse, clinical dataset for abdominal organ segmentation. AMOS provides 500 CT and 100 MRI scans collected from multi-center, multi-vendor, multi-modality, multi-phase, multi-disease patients, each with voxel-level annotations of 15 abdominal organs, providing challenging examples and a test-bed for studying robust segmentation algorithms under diverse targets and scenarios. We further benchmark several state-of-the-art medical segmentation models to evaluate the status of the existing methods on this new challenging dataset. We have made our datasets, benchmark servers, and baselines publicly available, and hope to inspire future research. Information can be found at https://amos22.grand-challenge.org.

Authors (11)

Yuanfeng Ji
Haotian Bai
Jie Yang
Chongjian Ge
Ye Zhu
Ruimao Zhang
Zhen Li
Lingyan Zhang
Wanling Ma
Xiang Wan
Ping Luo


Summary

An Overview of AMOS: A Large-Scale Abdominal Multi-Organ Benchmark

The paper introduces AMOS, a large and varied abdominal multi-organ benchmark dataset designed to advance research in medical image segmentation. The work addresses limitations of existing datasets in scale, diversity, and clinical representativeness. AMOS comprises 600 scans (500 CT and 100 MRI), each with voxel-level annotations for 15 abdominal organs. The paper presents the dataset as the largest and most diverse of its kind, offering a comprehensive resource for benchmarking segmentation algorithms across imaging modalities.

Key Contributions of AMOS

Scale and Diversity: AMOS is designed to overcome the typical constraints of previous segmentation datasets, which often lack either the volume of data or diversity. With over 74,000 annotated slices, AMOS is significantly larger than existing benchmarks like BTCV, which offers only 50 CT scans. The dataset includes scans from multiple scanners and centers, incorporating patients with various abdominal diseases, thereby simulating real-world clinical conditions more accurately than single-center datasets.
Clinical Representativeness: By sourcing data from actual clinical settings with diverse imaging protocols and disease representations, AMOS aims to provide a robust test-bed for evaluating algorithm performance against the variability encountered in practice. This is crucial for developing models capable of generalizing across different imaging circumstances.
Benchmarking and Evaluation: The authors have included extensive benchmarking of state-of-the-art segmentation models on the AMOS dataset. Models like UNet and nnFormer were evaluated, showing that existing algorithms struggle to deliver satisfactory performance, particularly on smaller organs such as the adrenal glands and duodenum. This highlights the dataset's challenge and suggests a need for more advanced algorithms to handle the complexity in AMOS.
Multi-purpose Usability: Beyond segmentation, AMOS is positioned as a versatile dataset suitable for explorations in Out-of-Distribution (OOD) generalization, cross-modality learning, and transfer learning. Because it contains both CT and MRI scans annotated with the same 15 organs, the dataset supports studying generalization across modalities, which matters for developing robust, clinically useful AI models.
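
The per-organ comparison described under Benchmarking and Evaluation can be illustrated with an overlap score computed separately for each label. The summary does not name the metric used, so the Dice coefficient is assumed here as a common choice for segmentation; the function name, the label IDs, and the toy volumes are all hypothetical and are not AMOS data. This is a minimal sketch, not the benchmark's evaluation code.

```python
import numpy as np


def dice_per_label(pred, truth, labels):
    """Return {label: Dice score} for two integer label volumes of equal shape."""
    scores = {}
    for label in labels:
        p = pred == label
        t = truth == label
        denom = p.sum() + t.sum()
        # Convention assumed here: a label absent from both volumes scores 1.0.
        if denom == 0:
            scores[label] = 1.0
        else:
            scores[label] = float(2.0 * np.logical_and(p, t).sum() / denom)
    return scores


# Toy 3D volumes (hypothetical labels: 1 = a large structure, 2 = a small one).
truth = np.zeros((4, 8, 8), dtype=np.int64)
truth[:, 0:6, 0:6] = 1        # 144 voxels
truth[1:3, 6:8, 6:8] = 2      # 8 voxels

pred = truth.copy()
pred[0, 0, 0:4] = 0           # miss 4 voxels of label 1
pred[1, 6:8, 6:8] = 0         # miss 4 voxels of label 2

scores = dice_per_label(pred, truth, labels=[1, 2])
print(scores)
```

In this toy case both structures lose the same four voxels, yet the large one scores about 0.986 and the small one about 0.667. That arithmetic is one plausible reason small organs are harder to score well on, though the summary itself does not attribute the reported difficulty to it.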

Implications and Future Directions

The release of AMOS sets a new standard for abdominal organ segmentation datasets, emphasizing the importance of size and diversity in developing clinical-grade AI models. Its scale facilitates more robust training of deep learning models, which is crucial for capturing the variability in organ appearance across different patients and imaging conditions.

The diversity and breadth of AMOS suggest that it could significantly influence the direction of medical imaging research. The dataset is potentially useful not only for testing segmentation algorithms but also for improving transfer learning techniques and cross-modality robustness. This can lead to more efficient training paradigms and models that generalize well to unseen domains, a crucial consideration for medical AI systems intended for real-world deployment.

Furthermore, the benchmark's design enables detailed evaluation of algorithms on a range of criteria, from overall segmentation accuracy to boundary precision. It encourages the development of methods that go beyond raw voxel-level accuracy to improve the clinical applicability of AI solutions, aiming for fine-grained, precise, and reliable segmentations.

Conclusion

In summary, the AMOS dataset is a substantial addition to the resources available for medical image analysis research, particularly for abdominal organ segmentation. By offering a large-scale, diverse, and clinically relevant dataset, AMOS provides a robust foundation for developing and benchmarking advanced segmentation algorithms. The work may catalyze further innovation in medical image computing and support efforts to move AI technologies from research settings into clinical practice. A dataset of this kind enables the large-scale, multi-center studies that the field needs in order to mature.
