MedBank-100k: Multi-Task Segmentation Dataset
- MedBank-100k is a comprehensive dataset offering 122,594 image-mask pairs across 7 medical imaging modalities for multi-task segmentation research.
- It employs standardized preprocessing—including frame filtering and aspect-ratio checks—to harmonize heterogeneous annotation formats for robust evaluation.
- Benchmarking reveals SAMed-2 achieving superior Dice scores compared to baselines, underscoring the dataset’s value for reproducible, multi-domain segmentation studies.
MedBank-100k is a comprehensive, large-scale medical image segmentation dataset curated for benchmarking and training multi-task foundation models. Comprising 122,594 frame–mask pairs sourced from publicly released datasets, it covers seven principal medical imaging modalities and includes 21 distinct segmentation tasks. Designed to support robust evaluation of segmentation architectures such as SAMed-2, MedBank-100k addresses the complexity of heterogeneous sources and annotation formats through standardized preprocessing, enabling large-scale, multi-domain segmentation research (Yan et al., 4 Jul 2025).
1. Composition and Modalities
MedBank-100k consists of 122,594 images, each paired with a corresponding segmentation mask. The dataset amalgamates data from a diverse range of public medical segmentation corpora, though the number of unique patients and 3D volumes is not specified. The modalities and task distribution are as follows:
| Modality | # Tasks | # Images |
|---|---|---|
| Fundus | 1 | 559 |
| Dermoscopy | 1 | 2,621 |
| X-Ray | 1 | 23,822 |
| CT | 10 | 34,521 |
| MR | 6 | 19,522 |
| Colonoscopy | 1 | 3,838 |
| Echocardiography | 1 | 1,800 |
| Others | – | 35,911 |
The relative proportion of each modality $m$ is defined as $p_m = N_m / N$, where $N_m$ is the image count for that modality and $N = 122{,}594$ is the total; e.g., $p_{\text{CT}} = 34{,}521 / 122{,}594 \approx 28.2\%$ and $p_{\text{X-Ray}} = 23{,}822 / 122{,}594 \approx 19.4\%$.
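The per-modality proportions follow directly from the counts in the composition table; a minimal sketch of the computation (counts taken verbatim from the table above):

```python
# Per-modality image counts from the MedBank-100k composition table.
counts = {
    "Fundus": 559, "Dermoscopy": 2621, "X-Ray": 23822, "CT": 34521,
    "MR": 19522, "Colonoscopy": 3838, "Echocardiography": 1800,
    "Others": 35911,
}

total = sum(counts.values())  # 122,594 images in total
proportions = {m: n / total for m, n in counts.items()}  # p_m = N_m / N
```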
The dataset covers a broad range of anatomical structures and lesion types, with multi-class masks separated by class during preprocessing, resulting in binarized masks per channel. However, no explicit class counts per task or class distributions are provided.
2. Data Partitioning and Splits
The dataset is divided at the image level into 90% training and 10% test splits:
- Training set: $0.9 \times 122{,}594 \approx 110{,}334$ images
- Test set: $122{,}594 - 110{,}334 = 12{,}260$ images
No separate validation set is mentioned, nor are k-fold cross-validation partitions present. External zero-shot evaluations are performed on 10 distinct datasets, inheriting their respective splits rather than those of MedBank-100k itself.
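An image-level 90/10 partition of this kind can be sketched as follows; the seeded shuffle is an assumption for reproducibility, not a documented detail of the dataset:

```python
import random

def split_images(image_ids, train_frac=0.9, seed=0):
    """Shuffle image IDs with a fixed seed and split them at the image level."""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)
    n_train = int(len(ids) * train_frac)
    return ids[:n_train], ids[n_train:]

train_ids, test_ids = split_images(range(122_594))
```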
3. Annotation Sources and Protocols
All segmentation masks originate from existing public datasets (e.g., MS-Decathlon, ISIC, Drishti-GS), with no new manual annotation conducted for MedBank-100k. The paper does not specify annotation guidelines, detailed protocols, or measures of inter-observer agreement such as Cohen’s $\kappa$. There is no harmonization of annotation standards across tasks and modalities, and no reported quality control procedures.
A plausible implication is that heterogeneity in annotation may introduce variable mask quality and class definitions across modalities and tasks.
4. Preprocessing and Data Standardization
To manage the heterogeneity of input sources, MedBank-100k undergoes four documented preprocessing procedures:
- Video data: Drop frames where the segmentation mask sums to zero (i.e., no object labeled).
- 2D images: Randomly shuffled; temporal or volumetric order is preserved for video sequences and 3D slice stacks.
- Aspect-ratio filter: Remove images where the shorter edge is less than half the length of the longer edge, mitigating distortion from resizing.
- Multi-class separation: Any mask with multiple classes is split into one binary mask per class channel.
Details concerning intensity normalization protocols, voxel-wise standardization, histogram equalization, or additional noise-handling procedures are not specified beyond reference to “standardized and normalized” images.
The only explicit exclusion criteria are the zero-label rule and aspect-ratio filtering.
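Three of the four documented steps (zero-label filtering, aspect-ratio filtering, and multi-class separation) can be sketched as simple predicates and transforms; this is an illustrative reconstruction, not the repository's actual implementation:

```python
import numpy as np

def keep_frame(mask):
    """Zero-label rule: drop frames whose mask labels no pixels at all."""
    return mask.sum() > 0

def keep_aspect(height, width):
    """Aspect-ratio filter: the shorter edge must be at least half the longer edge."""
    return min(height, width) >= max(height, width) / 2

def split_classes(mask):
    """Separate a multi-class mask into one binary mask per class channel
    (background label 0 is excluded)."""
    classes = [c for c in np.unique(mask) if c != 0]
    return {int(c): (mask == c).astype(np.uint8) for c in classes}
```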
5. Benchmarking and Evaluation Framework
MedBank-100k primarily serves as a benchmarking substrate for segmentation architectures, including the SAMed-2 selective memory model. The Dice Similarity Coefficient (DSC) is used as the sole metric for quantitative evaluation across all internal and external benchmarks.
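For reference, the DSC between two binary masks is $2|A \cap B| / (|A| + |B|)$; a minimal sketch (the epsilon smoothing term is a common convention, assumed here rather than taken from the paper):

```python
import numpy as np

def dice(pred, target, eps=1e-7):
    """Dice Similarity Coefficient between two binary masks: 2|A∩B| / (|A| + |B|)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)
```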
Average DSC is reported both across all 21 internal tasks and on the 10 external zero-shot tasks. The external zero-shot evaluation yields the following averages:
| Model | Avg DSC |
|---|---|
| SAMed-2 | 0.6938 |
| MedSAM-2 | 0.5796 |
| SAM2 | 0.4375 |
| MedSAM | 0.6277 |
| SAM | 0.5958 |
| U-Net | 0.6879 |
These results indicate superior zero-shot multi-task performance for SAMed-2 (0.6938) relative to all compared baselines, with U-Net (0.6879) the closest competitor.
6. Code Availability and Implementation
All scripts and pre-trained checkpoints for MedBank-100k, as well as for training and inference using SAMed-2, are publicly available at https://github.com/ZhilingYan/Medical-SAM-Bench. The repository includes utilities for downloading the dataset, implementing the four-step preprocessing workflow, executing split partitioning, and integrating MedBank-100k into PyTorch pipelines.
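Integration into a PyTorch pipeline amounts to exposing image/mask pairs through the `__len__`/`__getitem__` protocol that `torch.utils.data.DataLoader` expects. The sketch below assumes a hypothetical `images/`–`masks/` directory layout; the class name and layout are illustrative, not taken from the repository:

```python
from pathlib import Path

class MedBankPairs:
    """Minimal image/mask pair index implementing the __len__/__getitem__
    protocol consumed by torch.utils.data.DataLoader.

    Assumes a hypothetical layout: root/images/<id>.png, root/masks/<id>.png.
    """

    def __init__(self, root, ids):
        self.root = Path(root)
        self.ids = list(ids)

    def __len__(self):
        return len(self.ids)

    def __getitem__(self, i):
        name = f"{self.ids[i]}.png"
        # Return paths here; a real pipeline would load and transform the arrays.
        return self.root / "images" / name, self.root / "masks" / name
```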
A plausible implication is that MedBank-100k enables reproducible research and extensible benchmarking across a range of segmentation models and modalities.
7. Significance and Limitations
MedBank-100k provides a scale and diversity of medical segmentation data well-suited for multi-modal, multi-task learning, addressing major challenges in continual learning and noisy annotation environments. Its assembly facilitates comparison across foundation models and classic CNNs for medical image segmentation.
However, limitations include absent reporting of patient-level statistics, lack of annotation harmonization, missing per-task class counts, and unspecified intensity normalization protocols. These gaps may affect generalizability and consistency but are partly offset by the extensive public codebase and clear benchmarking methodology (Yan et al., 4 Jul 2025).
MedBank-100k thus represents a significant resource for multi-domain segmentation research, with potential for extension and systematic analysis contingent upon future improvements in annotation and statistical reporting.