Mycetoma Database (MyData)

Updated 1 January 2026
  • Mycetoma Database (MyData) is an open-access, expertly curated histopathology image resource facilitating automated detection, segmentation, and species-level classification of mycetoma.
  • It comprises 864 images from 142 patients with detailed annotations, metadata, and standardized acquisition protocols for benchmarking AI-based diagnostics.
  • Controlled imaging protocols, precise grain-level segmentation, and ethical compliance support reproducible research and clinical translation in resource-limited settings.

Mycetoma Database (MyData) is an open-access, expertly curated histopathology image resource designed to advance automated detection, segmentation, and classification of mycetoma—an inflammatory, granulomatous disease caused by fungi (eumycetoma) or bacteria (actinomycetoma). Accurate species identification is essential for clinical management, and histopathology provides the most practical diagnostic approach in endemic, resource-constrained environments. MyData comprises systematically acquired and annotated microscopic images with exhaustive acquisition metadata and carefully controlled labeling protocols, supporting reproducible research in medical image analysis and digital pathology (Ali et al., 2024, Ali et al., 25 Dec 2025).

1. Origin, Objectives, and Scope

MyData was established to address the difficulty of diagnosing mycetoma in regions where expertise in histopathological identification is limited. The database captures the epidemiological spectrum of mycetoma in Sudan’s “mycetoma belt” by sourcing tissue from patients with clinically and microbiologically confirmed disease. Its primary objectives include:

  • Providing the first open repository of histopathological images specifically for mycetoma tissue, facilitating the development and benchmarking of automated image-analysis algorithms.
  • Enabling grain-level segmentation and pathogen classification tasks by delivering images annotated with binary masks for grain detection.
  • Supplying full acquisition protocols to ensure experimental reproducibility and facilitate clinical translation.
  • Supporting AI-based diagnostics that reduce dependence on scarce expert pathologists, with the aim of improving clinical outcomes in endemic regions (Ali et al., 2024).

2. Dataset Composition and Class Distribution

The MyData corpus comprises 864 light-microscopy RGB images from 142 patients (80 eumycetoma, 62 actinomycetoma, averaging approximately six images per patient) acquired under standardized conditions. Each sample includes image-level and pixel-level labels, with grain boundaries annotated by domain experts.

| Type | Patients | Images | % of total images |
|---|---|---|---|
| Eumycetoma | 80 | 471 | 54.5% |
| Actinomycetoma | 62 | 393 | 45.5% |
| Total | 142 | 864 | 100% |

Species-level annotation further divides eumycetoma into Madurella spp. (Mspp), Madurella mycetomatis positive (MM+), Madurella mycetomatis negative (MM–), Aspergillus spp. (Aspp), and Fusarium spp. (Fspp), and actinomycetoma into Actinomadura pelletieri (AMP), Actinomadura madurae (AMM), and Streptomyces somaliensis (SS). Demographic metadata covers age (10–70 years), sex (M:F ≈ 1.7:1), infection site (hands, feet, other), lesion size, and disease duration (Ali et al., 2024, Ali et al., 25 Dec 2025).
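The two-level label hierarchy can be written down as a small lookup table. The sketch below is illustrative only: the abbreviations come from the text, but the dictionary layout, the ASCII key `MM-` (standing in for MM–), and the helper name `mycetoma_type` are assumptions, not part of the dataset's distribution.

```python
# Hypothetical label map: abbreviation -> (full label, mycetoma type).
# Abbreviations follow the text; the structure itself is an assumption.
SPECIES = {
    "Mspp": ("Madurella spp.", "eumycetoma"),
    "MM+": ("Madurella mycetomatis positive", "eumycetoma"),
    "MM-": ("Madurella mycetomatis negative", "eumycetoma"),
    "Aspp": ("Aspergillus spp.", "eumycetoma"),
    "Fspp": ("Fusarium spp.", "eumycetoma"),
    "AMP": ("Actinomadura pelletieri", "actinomycetoma"),
    "AMM": ("Actinomadura madurae", "actinomycetoma"),
    "SS": ("Streptomyces somaliensis", "actinomycetoma"),
}


def mycetoma_type(abbreviation: str) -> str:
    """Collapse a species-level label to the binary fungal/bacterial task."""
    return SPECIES[abbreviation][1]
```

Keeping the species-to-type mapping in one place lets the same annotations serve both the binary task and the finer species-level task.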

3. Histological Preparation, Imaging, and Annotation Protocols

All tissue samples (fine-needle aspiration (FNA), tru-cut, and surgical biopsies) were formalin-fixed, paraffin-embedded, sectioned at 3–5 µm, and stained using standard Bancroft H&E procedures. Images were acquired on a Nikon Eclipse 80i microscope (10× objective, numerical aperture ≈ 0.30), with brightness and diaphragm adjusted for grain contrast and with colour enhancement and auto white balance enabled.

Manual annotation of regions of interest (ROIs), specifically mycetoma grains, was performed in ImageJ via polygonal segmentation, generating binary TIFF masks where 1 = grain and 0 = background. The protocol required delineating every grain boundary, including internal voids, and excluding non-grain structures unless they adhered to the grain. Only one grain per image was cropped for annotation, and each image shares a base name with its mask file (e.g., FM3_4.jpg with FM3_4_mask.tif). Each image was annotated by a single expert; inter-annotator agreement was not reported. Classification into eumycetoma or actinomycetoma was conducted by senior histopathologists (Ali et al., 2024, Ali et al., 25 Dec 2025).
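The naming convention makes it straightforward to pair each image with its mask. The following is a minimal sketch that assumes only the `<name>.jpg` / `<name>_mask.tif` pattern given in the example above; the function names and the idea of scanning a root folder recursively are illustrative choices, not a documented loader.

```python
from pathlib import Path


def mask_path_for(image_path) -> Path:
    """Derive the mask path from an image path, e.g. FM3_4.jpg -> FM3_4_mask.tif."""
    p = Path(image_path)
    return p.with_name(f"{p.stem}_mask.tif")


def find_pairs(root) -> list[tuple[Path, Path]]:
    """Collect (image, mask) pairs under root, skipping images without a mask."""
    pairs = []
    for image in sorted(Path(root).rglob("*.jpg")):
        mask = mask_path_for(image)
        if mask.exists():
            pairs.append((image, mask))
    return pairs
```

Skipping unpaired images silently is a design choice made here for brevity; a stricter loader might raise an error instead so that missing masks are noticed early.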

4. File Structure, Metadata, and Preprocessing

MyData is organized into species-based folders and further subdivided by patient. Each subdirectory contains paired microscope images (JPEG, 800×600, 24-bit RGB) and corresponding ground-truth masks (TIFF, binary, same resolution). Comprehensive metadata is provided in CSV files, including image/mask name, patient ID, species label, grain number, sex, age, infection site, disease duration, lesion size, imaging parameters, and acquisition settings. The directory structure enables patient-wise splitting to avoid data leakage across training, validation, and test sets (standard split: 65%/15%/20%).
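A patient-wise split can be sketched as follows. This is one possible approach under stated assumptions: records are `(patient_id, image_name)` tuples such as might be read from the metadata CSV, the 65%/15%/20% fractions are applied to patients rather than images, and no stratification by species is attempted. The function name and signature are hypothetical.

```python
import random
from collections import defaultdict


def patient_wise_split(records, fractions=(0.65, 0.15, 0.20), seed=0):
    """Split (patient_id, image_name) records so no patient spans two subsets."""
    by_patient = defaultdict(list)
    for patient_id, image_name in records:
        by_patient[patient_id].append(image_name)

    # Shuffle patients, not images, so all images of a patient stay together.
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)

    n_train = round(fractions[0] * len(patients))
    n_val = round(fractions[1] * len(patients))
    groups = {
        "train": patients[:n_train],
        "val": patients[n_train:n_train + n_val],
        "test": patients[n_train + n_val:],  # remainder goes to the test set
    }
    return {
        name: [(p, img) for p in ids for img in by_patient[p]]
        for name, ids in groups.items()
    }
```

Because patients contribute different numbers of images, the image-level proportions will only approximate the patient-level fractions.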

Recommended preprocessing steps are:

  1. Stain normalization (e.g., Macenko) to harmonize colour profiles across slides.
  2. Intensity scaling (min–max/z-score) for standardization.
  3. Region-of-interest cropping or patch extraction for deep learning models.
  4. Data augmentation (rotation, flipping, elastic deformation) to mitigate overfitting and improve model generalization (Ali et al., 2024, Ali et al., 25 Dec 2025).
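Steps 2 and 4 above can be illustrated with a short NumPy sketch. It assumes images are already loaded as height × width × 3 arrays with matching 2-D masks; the function names are hypothetical, and stain normalization and elastic deformation are left out because they need dedicated tooling.

```python
import numpy as np


def min_max_scale(image):
    """Rescale intensities to [0, 1]; a constant image maps to all zeros."""
    image = image.astype(np.float32)
    lo, hi = image.min(), image.max()
    if hi == lo:
        return np.zeros_like(image)
    return (image - lo) / (hi - lo)


def z_score(image, eps=1e-8):
    """Standardize each colour channel to zero mean and unit variance."""
    image = image.astype(np.float32)
    mean = image.mean(axis=(0, 1), keepdims=True)
    std = image.std(axis=(0, 1), keepdims=True)
    return (image - mean) / (std + eps)


def augment_pair(image, mask, rng):
    """Apply the same random 90-degree rotation and horizontal flip to both arrays."""
    k = int(rng.integers(0, 4))  # number of quarter turns
    image = np.rot90(image, k, axes=(0, 1))
    mask = np.rot90(mask, k, axes=(0, 1))
    if rng.random() < 0.5:
        image = np.flip(image, axis=1)
        mask = np.flip(mask, axis=1)
    return np.ascontiguousarray(image), np.ascontiguousarray(mask)
```

A generator such as `np.random.default_rng(0)` can be passed as `rng`. The key point is that geometric transforms are applied identically to image and mask, while intensity transforms touch the image only.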

5. Licensing, Access, and Ethical Compliance

MyData is released under Creative Commons Attribution (CC BY) and complies with FAIR data principles. Data access requires a formal request through the AfricAI XNAT portal or Zenodo (DOI 10.5281/zenodo.13655082), with adherence to CC BY terms and citation of the dataset paper. All data is fully anonymized with no identifying information beyond assigned patient codes.

Ethical approval for data collection was obtained from Soba University Hospital (No. SUH 05/01/2019), with written informed consent from all patients. No conflicts of interest are declared. Users must acknowledge the Mycetoma Research Centre and Soba University Hospital Ethical Committee in scholarly usage (Ali et al., 2024, Ali et al., 25 Dec 2025).

6. Use Cases, Benchmarking, and Evaluation Metrics

Intended uses of MyData include:

  • Grain detection (binary segmentation of grains versus background).
  • Grain classification (fungal vs. bacterial; finer species-level categorization).
  • Multi-class semantic segmentation (differentiating morphologically distinct grains).
  • Radiomic feature extraction for machine learning–based diagnosis.

Standard evaluation metrics are defined as follows, where A and B denote the predicted and ground-truth sets of grain pixels:

Dice Coefficient:

$$\text{Dice}(A, B) = \frac{2\,|A \cap B|}{|A| + |B|}$$

Intersection over Union (IoU):

$$\text{IoU}(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
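Both metrics are easy to compute on binary masks. The sketch below is a minimal NumPy version, not a reference implementation from the dataset authors; returning 1.0 when both masks are empty is a convention assumed here, and other conventions exist.

```python
import numpy as np


def dice(pred, target):
    """Dice coefficient between two binary masks (nonzero = grain)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    total = pred.sum() + target.sum()
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return float(2.0 * np.logical_and(pred, target).sum() / total)


def iou(pred, target):
    """Intersection over Union between two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        return 1.0  # same empty-mask convention as dice()
    return float(np.logical_and(pred, target).sum() / union)


pred = [[1, 1, 0], [0, 1, 0]]
target = [[1, 0, 0], [0, 1, 1]]
print(dice(pred, target), iou(pred, target))  # intersection 2, sizes 3 and 3, union 4
```

For this toy pair the Dice coefficient is 4/6 ≈ 0.667 and the IoU is 2/4 = 0.5, consistent with the identity Dice = 2·IoU / (1 + IoU), so either metric can be recovered from the other for a single mask pair.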

A baseline combining radiomic features with a Partial Least Squares classifier achieved 91.89% accuracy in distinguishing eumycetoma from actinomycetoma. Ali et al. (2024) do not report segmentation baselines for Dice or IoU, leaving these for users to establish. The mAIcetoma MICCAI Challenge used MyData to benchmark deep learning architectures; all finalist models are reported to have achieved high segmentation accuracy for grain detection and strong performance in mycetoma type classification (Ali et al., 25 Dec 2025).

7. Significance and Research Integration

MyData is the first publicly available, expertly annotated histopathology image resource for mycetoma. It is designed to support the development and objective evaluation of automated detection, segmentation, and classification algorithms. Through detailed annotations, structured metadata, and reproducible acquisition protocols, the dataset enables research in medical AI and computational pathology, particularly in low-resource contexts where specialist expertise is scarce. Its use in the MICCAI mAIcetoma Challenge demonstrates its utility for global benchmarking and provides a standardized foundation for future method development, comparative studies, and clinical translation (Ali et al., 2024, Ali et al., 25 Dec 2025).
