AbdomenAtlas-8K: Annotating 8,000 CT Volumes for Multi-Organ Segmentation in Three Weeks

Published 16 May 2023 in eess.IV, cs.CV, and cs.LG | (2305.09666v2)

Abstract: Annotating medical images, particularly for organ segmentation, is laborious and time-consuming. For example, annotating an abdominal organ requires an estimated rate of 30-60 minutes per CT volume based on the expertise of an annotator and the size, visibility, and complexity of the organ. Therefore, publicly available datasets for multi-organ segmentation are often limited in data size and organ diversity. This paper proposes an active learning method to expedite the annotation process for organ segmentation and creates the largest multi-organ dataset (by far) with the spleen, liver, kidneys, stomach, gallbladder, pancreas, aorta, and IVC annotated in 8,448 CT volumes, equating to 3.2 million slices. The conventional annotation methods would take an experienced annotator up to 1,600 weeks (or roughly 30.8 years) to complete this task. In contrast, our annotation method has accomplished this task in three weeks (based on an 8-hour workday, five days a week) while maintaining a similar or even better annotation quality. This achievement is attributed to three unique properties of our method: (1) label bias reduction using multiple pre-trained segmentation models, (2) effective error detection in the model predictions, and (3) attention guidance for annotators to make corrections on the most salient errors. Furthermore, we summarize the taxonomy of common errors made by AI algorithms and annotators. This allows for continuous revision of both AI and annotations and significantly reduces the annotation costs required to create large-scale datasets for a wider variety of medical imaging tasks.

Summary

The paper demonstrates an active learning framework that annotates 8,448 CT volumes in 3 weeks, reducing traditional annotation time by a factor of 533.
It employs multiple segmentation models and error-detection attention maps to minimize label bias and enhance annotation precision.
The method yields high Dice scores and robust multi-organ segmentation, facilitating improved AI-driven medical diagnosis and clinical care.

An Analysis of "AbdomenAtlas-8K: Annotating 8,000 CT Volumes for Multi-Organ Segmentation in Three Weeks"

Annotating medical images, particularly for multi-organ segmentation in CT volumes, is a major bottleneck in medical imaging research because it is labor-intensive and time-consuming. The paper "AbdomenAtlas-8K" addresses this bottleneck with an active learning procedure that annotates eight abdominal organs in 8,448 CT volumes within three weeks. The method combines machine learning algorithms with human expertise to create what the authors describe as the largest annotated multi-organ dataset to date, while maintaining high annotation quality.

Active Learning Approach

The paper introduces an active learning strategy with three key properties: reduction of label bias via multiple pre-trained segmentation models, error detection in model predictions, and attention guidance that directs annotators to the most salient errors. Together, these components make annotation more efficient.

Label Bias Reduction: Instead of relying on a single AI model, the study employs multiple segmentation models (Swin UNETR, nnU-Net, and U-Net). This makes disagreements between their predictions easier to detect and reduces bias toward any single model architecture.
Error Detection and Attention Mapping: The study introduces a method for generating attention maps that highlight potential error regions using criteria such as prediction inconsistency, uncertainty, and overlap in predicted organ boundaries. This reduces the time radiologists spend locating and correcting errors and improves the precision of manual revisions.
Iterative Annotation and Model Refinement: The active learning process is iterative. The AI models are fine-tuned with human-revised labels, which progressively improves the models' predictions and the label accuracy in subsequent iterations.
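
The error-detection idea above can be illustrated with a small sketch. It is not the paper's implementation: the function name attention_scores, the equal weighting of the two criteria, the 0.5 flagging threshold, and the toy probabilities are all assumptions made for illustration, and the overlap criterion is left out. Volumes are represented as flat lists of per-voxel foreground probabilities so that the example needs only the Python standard library.

```python
import math
from statistics import pstdev


def attention_scores(probs_per_model):
    """Return one attention score in [0, 1] per voxel.

    probs_per_model: one list per model, each holding the foreground
    probability of every voxel in the same (flattened) order.
    """
    scores = []
    for voxel_probs in zip(*probs_per_model):
        # Inconsistency: spread of the models' predictions for this voxel.
        # The population std of values in [0, 1] is at most 0.5.
        inconsistency = pstdev(voxel_probs) / 0.5

        # Uncertainty: binary entropy of the mean prediction, scaled by ln 2.
        p = sum(voxel_probs) / len(voxel_probs)
        p = min(max(p, 1e-7), 1 - 1e-7)  # avoid log(0)
        entropy = -(p * math.log(p) + (1 - p) * math.log(1 - p))
        uncertainty = entropy / math.log(2)

        # Assumption: the two criteria are weighted equally.
        scores.append(0.5 * (inconsistency + uncertainty))
    return scores


# Hypothetical predictions of three models over four voxels.
model_a = [0.95, 0.90, 0.10, 0.05]
model_b = [0.90, 0.20, 0.10, 0.05]
model_c = [0.95, 0.80, 0.90, 0.05]

scores = attention_scores([model_a, model_b, model_c])

# Voxels whose score exceeds an (assumed) threshold are shown to the annotator.
flagged = [i for i, s in enumerate(scores) if s > 0.5]
print(flagged)
```

In this toy input the models agree on the first and last voxels and disagree on the two in the middle, so only the middle voxels are flagged for review. A real pipeline would apply the same per-voxel computation to 3D arrays and choose the weighting and threshold empirically.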

Numerical Results and Quality Assurance

According to the study, conventional annotation would have required up to 1,600 weeks (roughly 30.8 years) to annotate the full dataset. The proposed method compresses this into three weeks, a speedup by a factor of about 533. For quality assurance, the attention maps were validated on the JHH dataset, where they showed high sensitivity and precision in identifying prediction errors.
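
The figures quoted from the abstract can be checked with a few lines of arithmetic. The only assumption added here is a 52-week year; the variable names are illustrative.

```python
# Figures quoted in the abstract.
conventional_weeks = 1600  # upper estimate for conventional annotation
proposed_weeks = 3         # reported duration with the active learning method

# Assumption: one year is counted as 52 weeks.
weeks_per_year = 52

conventional_years = conventional_weeks / weeks_per_year
speedup = conventional_weeks / proposed_weeks

print(f"{conventional_years:.1f} years")  # 30.8 years
print(f"{speedup:.0f}x speedup")          # 533x speedup
```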

The methodology's impact is evident in the high Dice Similarity Coefficient (DSC) scores reported for multiple organs, which indicate that the labels revised through active learning offer better segmentation quality. Additionally, because the labeling approach scales to a large and diverse collection of CT volumes, the resulting dataset can help models generalize across the imaging protocols of different medical centers.
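
For readers unfamiliar with the metric, the DSC between a predicted mask A and a reference mask B is 2|A ∩ B| / (|A| + |B|). The following is a minimal sketch of that definition on flattened binary masks; the function name and the toy masks are illustrative, and returning 1.0 when both masks are empty is one common convention rather than something the paper specifies.

```python
def dice_coefficient(pred, truth):
    """Dice similarity coefficient of two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")

    # Voxels marked as foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    # Total foreground size of the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)

    if total == 0:
        # Convention assumed here: two empty masks agree perfectly.
        return 1.0
    return 2 * intersection / total


# Toy masks over six voxels.
predicted = [0, 1, 1, 1, 0, 0]
reference = [0, 1, 1, 0, 0, 0]

dsc = dice_coefficient(predicted, reference)
print(dsc)  # 0.8
```

Here the masks share two foreground voxels and contain three and two foreground voxels respectively, giving 2 * 2 / 5 = 0.8.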

Implications and Future Directions

The implications of this methodology are far-reaching, particularly in the burgeoning field of AI-driven medical diagnosis and treatment planning. The AbdomenAtlas-8K dataset provides a foundation for developing more robust AI models that can generalize well to new data and support accurate multi-organ segmentation necessary for clinical workflows, such as radiotherapy planning and surgical navigation.

The success of this active learning framework suggests several future research directions and applications. It opens avenues for extending similar methodologies to other medical imaging domains and modalities, expanding the dataset with additional organ annotations, and integrating this dataset into the development of foundation models tailored for healthcare.

Further work on reducing false positives in tumor annotation, potentially by leveraging synthetic data to overcome the limitations of current annotation practices, could also extend the dataset's utility to other clinical applications.

In conclusion, "AbdomenAtlas-8K" demonstrates that combining AI with human expertise can make medical image annotation far more efficient, paving the way for larger datasets created at lower cost. The dataset is a valuable resource for improving the scope and accuracy of medical imaging AI in both research and clinical care.
